Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives • Paper • 2501.04003 • Published 5 days ago • 20
An Empirical Study of Autoregressive Pre-training from Videos • Paper • 2501.05453 • Published 3 days ago • 29
The GAN is dead; long live the GAN! A Modern GAN Baseline • Paper • 2501.05441 • Published 3 days ago • 54
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • Published 26 days ago • 13
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers • Paper • 2412.12571 • Published 27 days ago • 8
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces • Paper • 2412.14171 • Published 25 days ago • 24
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities • Paper • 2412.14123 • Published 25 days ago • 11
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning • Paper • 2412.12953 • Published 26 days ago • 11
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN • Paper • 2412.13795 • Published 25 days ago • 18
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • Paper • 2412.14161 • Published 25 days ago • 49
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published 25 days ago • 121
No More Adam: Learning Rate Scaling at Initialization is All You Need • Paper • 2412.11768 • Published 27 days ago • 41
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published about 1 month ago • 86
Causal Diffusion Transformers for Generative Modeling • Paper • 2412.12095 • Published 27 days ago • 23
Smaller Language Models Are Better Instruction Evolvers • Paper • 2412.11231 • Published 28 days ago • 27
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models • Paper • 2412.09645 • Published Dec 10, 2024 • 35
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation • Paper • 2412.11919 • Published 27 days ago • 33
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance • Paper • 2412.06673 • Published Dec 9, 2024 • 11