MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 3 days ago • 12
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth Paper • 2605.25052 • Published 11 days ago • 14
DCAgent3/dev_set_v2_rl__24GPU_base_excl_timeouts__exp_rpt_pymethods2test_large__GLM_4_7_c2148a8d • Updated 8 days ago • 296 • 62 • 1
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering Paper • 2605.17526 • Published 18 days ago • 7
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 28 days ago • 233
Forge-UGC: FX optimization and register-graph engine for universal graph compiler Paper • 2604.16498 • Published Apr 14 • 5
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326