Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published 14 days ago • 3
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 11 days ago • 22
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 11 days ago • 22
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published 24 days ago • 39
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 79