rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 4 days ago • 190
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 9 days ago • 72
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 77
Article Fine-tune ModernBERT for text classification using synthetic data By davidberenstein1957 • 13 days ago • 22
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 19 days ago • 35
DeTikZify Collection Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ • 11 items • Updated Dec 4, 2024 • 7
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 74
Article Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation By Jaward • Dec 2, 2024 • 5
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 124
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • Dec 4, 2024 • 76
Cut Your Losses in Large-Vocabulary Language Models Paper • 2411.09009 • Published Nov 13, 2024 • 43
Thinking LLMs: General Instruction Following with Thought Generation Paper • 2410.10630 • Published Oct 14, 2024 • 18
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning Paper • 2410.02089 • Published Oct 2, 2024 • 12
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 53
Article Decoding Strategies in Large Language Models By mlabonne • Oct 29, 2024 • 38