V_1: Unifying Generation and Self-Verification for Parallel Reasoners Paper • 2603.04304 • Published 2 days ago • 11
SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published 21 days ago • 53
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 21 days ago • 43
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization Paper • 2602.02958 • Published Feb 3 • 33
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Paper • 2601.14243 • Published Jan 20 • 23
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Paper • 2512.05033 • Published Dec 4, 2025 • 17
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs Paper • 2510.04767 • Published Oct 6, 2025 • 28
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published Oct 22, 2025 • 61
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28, 2025 • 118
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16, 2025 • 75