Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 45
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 27
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9, 2024 • 16
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 49
HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks Paper • 2410.12381 • Published Oct 16, 2024 • 45
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published Jul 15, 2024 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 613
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5, 2024 • 47
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 35
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 47
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 98
Kosmos-2: Grounding Multimodal Large Language Models to the World Paper • 2306.14824 • Published Jun 26, 2023 • 34