Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131
Large Action Models: From Inception to Implementation Paper • 2412.10047 • Published 12 days ago • 28
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations Paper • 2412.05994 • Published 17 days ago • 17
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 13 days ago • 90
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views Paper • 2412.06767 • Published 16 days ago • 6
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published 20 days ago • 104
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published 22 days ago • 109
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 21 days ago • 118
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing Paper • 2411.16781 • Published about 1 month ago • 10
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset Paper • 2411.15640 • Published Nov 23 • 4
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 29 days ago • 47
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22 • 42
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5 • 64
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 89
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23 • 49
Aligning Large Language Models via Self-Steering Optimization Paper • 2410.17131 • Published Oct 22 • 21
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Paper • 2410.10626 • Published Oct 14 • 37
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14 • 6