ViTPose++: Vision Transformer for Generic Body Pose Estimation Paper • 2212.04246 • Published Dec 7, 2022 • 1
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5, 2024 • 21
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising Paper • 2402.18842 • Published Feb 29, 2024 • 15