Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 5 days ago • 20
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 94
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 137
Running 547 547 Scaling test-time compute 📈 Enhance math problem solving by scaling test-time compute
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024 • 114
TinyFusion: Diffusion Transformers Learned Shallow Paper • 2412.01199 • Published Dec 2, 2024 • 14
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 82
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 94
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 150
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 68