SIGMA: An AI-Empowered Training Stack on Early-Life Hardware Paper • 2512.13488 • Published 14 days ago
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published 25 days ago • 149
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data Paper • 2510.25804 • Published Oct 29 • 1
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection Paper • 2510.18909 • Published Oct 21 • 4
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training Paper • 2510.08008 • Published Oct 9 • 5
Behind RoPE: How Does Causal Mask Encode Positional Information? Paper • 2509.21042 • Published Sep 25 • 8
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Paper • 2507.15640 • Published Jul 21 • 4
TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression Paper • 2506.02678 • Published Jun 3 • 5