ECO: Quantized Training without Full-Precision Master Weights Paper • 2601.22101 • Published 7 days ago
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion Paper • 2601.22143 • Published 7 days ago
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning Paper • 2601.19001 • Published 10 days ago
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation Paper • 2601.21406 • Published 7 days ago
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale Paper • 2601.22146 • Published 7 days ago
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts Paper • 2601.22156 • Published 7 days ago
Self-Improving Pretraining: Using Post-Trained Models to Pretrain Better Models Paper • 2601.21343 • Published 7 days ago
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices Paper • 2601.21579 • Published 7 days ago
Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units Paper • 2601.21996 • Published 7 days ago
MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources Paper • 2601.22054 • Published 7 days ago
One-Step Latent-Free Image Generation with Pixel Mean Flows Paper • 2601.22158 • Published 7 days ago
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents Paper • 2601.20975 • Published 8 days ago
Beyond Imitation: Reinforcement Learning for Active Latent Planning Paper • 2601.21598 • Published 7 days ago
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning Paper • 2601.22069 • Published 7 days ago
Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report Paper • 2601.21051 • Published 8 days ago
MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models Paper • 2601.21181 • Published 7 days ago
Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models Paper • 2601.18129 • Published 10 days ago
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening Paper • 2601.21590 • Published 7 days ago
Language-Based Trial and Error Falls Behind in the Era of Experience Paper • 2601.21754 • Published 7 days ago