Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 2024 • 137
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 43
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs Paper • 2410.16144 • Published Oct 21, 2024 • 3
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 25
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9, 2024 • 16
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 48
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks Paper • 2410.12381 • Published Oct 16, 2024 • 43