Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published 19 days ago • 52
DNA-R1 Collection Reasoning model distilled from DeepSeek-R1, enhanced with GRPO using supplementary reasoning datasets. • 1 item • Updated 5 days ago • 2
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 21 items • Updated 7 days ago • 128
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 143