Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 68
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 27 days ago • 32
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 20 days ago • 49
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 14 items • Updated 1 day ago • 99