title: README
emoji: π
colorFrom: yellow
colorTo: purple
sdk: static
pinned: false
# Welcome to Open-R1 🐳🤗

Open-R1 is an open initiative to replicate and extend the techniques behind DeepSeek-R1, a state-of-the-art reasoning model, in a fully transparent and collaborative way: https://github.com/huggingface/open-r1

This organization is dedicated to:

- Sharing datasets and models built on the path to replicating DeepSeek-R1.
- Fostering meaningful discussions and collaboration in the **[Community tab](https://huggingface.co/spaces/open-r1/README/discussions)**.

By working together, we aim to create a robust foundation for reasoning models that the entire research and industry community can leverage.

# Plan of attack

We are using the DeepSeek-R1 tech report as a guide to recreate their pipeline. The work can be broken down into three main steps:

- Replicate R1-Distill: Distill a high-quality reasoning corpus from DeepSeek-R1 to create the R1-Distill models.
- Recreate the pure RL pipeline: Reproduce the reinforcement learning process that DeepSeek used to train R1-Zero. This will likely require curating new, large-scale datasets for math, reasoning, and code.
- Demonstrate end-to-end training: Show that we can go from a base model to RL-tuned reasoning capabilities through a multi-stage training approach, combining supervised fine-tuning (SFT) and reinforcement learning (RL).
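
To make the pure RL step more concrete: the DeepSeek-R1 tech report describes rule-based rewards for R1-Zero, namely an accuracy reward for verifiable answers and a format reward for keeping the reasoning inside think tags. The snippet below is only a minimal sketch of that idea, not the Open-R1 implementation. The function names, the `\boxed{...}` answer convention, the exact-match comparison, and the equal weighting of the two rewards are all assumptions made for illustration.

```python
import re

# Assumed completion layout (hypothetical): reasoning inside <think>...</think>,
# followed by a final answer that contains \boxed{...}.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*.+$", re.DOTALL)
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion is a non-empty think block followed by more text."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last boxed answer equals the reference after trimming whitespace."""
    answers = BOXED_PATTERN.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum both rewards with equal weight (an assumption, not a documented choice)."""
    return accuracy_reward(completion, reference) + format_reward(completion)


if __name__ == "__main__":
    sample = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
    print(total_reward(sample, "4"))
```

Exact string matching is the simplest possible check. A real math verifier would likely need to normalize equivalent forms such as `0.5` and `1/2`, which this sketch does not attempt.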
# How to contribute

This project thrives on community participation! Here are some ways you can contribute:

- Join the discussion: Share ideas, ask questions, and collaborate with others in the **[Community tab](https://huggingface.co/spaces/open-r1/README/discussions)**.
- Contribute code or datasets: Submit pull requests with datasets, models, or improvements to the pipeline.
- Experiment and share results: Try out different approaches and share your findings with the community.

Let's build something impactful together. 🚀