iFSQ: Improving FSQ for Image Generation with 1 Line of Code • Paper • 2601.17124 • Published 9 days ago • 31
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation • Paper • 2601.21420 • Published 4 days ago • 34
Scaling Embeddings Outperforms Scaling Experts in Language Models • Paper • 2601.21204 • Published 4 days ago • 95
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models • Paper • 2601.14152 • Published 13 days ago • 5
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models • Paper • 2601.15165 • Published 12 days ago • 68
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers • Paper • 2601.17367 • Published 9 days ago • 33
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published 25 days ago • 41
Nested Learning: The Illusion of Deep Learning Architectures • Paper • 2512.24695 • Published Dec 31, 2025 • 42