netease-youdao
/

Confucius-o1-14B

@@ -13,14 +13,30 @@ library_name: transformers
 ## Introduction
 **Confucius-o1-14B** is a o1-like reasoning model developed by the NetEase Youdao Team, it can be easily deployed on a single GPU without quantization. This model is based on the Qwen2.5-14B-Instruct model and adopts a two-stage learning strategy, enabling the lightweight 14B model to possess thinking abilities similar to those of o1. What sets it apart is that after generating the chain of thought, it can summarize a step-by-step problem-solving process from the chain of thought on its own. This can prevent users from getting bogged down in the complex chain of thought and allows them to easily obtain the correct problem-solving ideas and answers.
 However, there are some limitations that must be stated in advance:
 1. **Scenario Limitations**: Our optimization is only carried out on data from the K12 mathematics scenario, and the effectiveness has only been verified in math-related benchmark tests. The performance of the model in non-mathematical scenarios has not been tested, so we cannot guarantee its quality and effectiveness in other fields.
 2. **Language-related Issues**: In the “summary” block, the model has a stronger tendency to generate Chinese content. In the “thinking” block, the model may reason in an unexpected language environment or even present a mixture of languages. However, this does not affect the actual reasoning ability of the model. This indicates that the chain of thought itself may not have independent value, it is merely an easier-to-learn path leading to a correct summary.
 3. **Invalid Results**: The model may sometimes fall into circular reasoning. Since we use explicit identifiers to divide the thinking and summary parts, when the model enters this mode, it may generate invalid results that cannot be parsed.
 4. **Safety and Ethics**: This model has not undergone optimization and testing for alignment at the safety and ethical levels. Any output generated by the model does not represent the official positions, views, or attitudes of our company. When using this model, users should independently judge and evaluate the rationality and applicability of the output content and comply with relevant laws, regulations, and social ethics.
-For more detailed information, please refer to our [blog]().
 ## Quickstart
 The environmental requirements for running it are exactly the same as those of the [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model. Therefore, you can easily use Transformers or vLLM to load and run the model for inference, and deploy your services.

 ## Introduction
 **Confucius-o1-14B** is a o1-like reasoning model developed by the NetEase Youdao Team, it can be easily deployed on a single GPU without quantization. This model is based on the Qwen2.5-14B-Instruct model and adopts a two-stage learning strategy, enabling the lightweight 14B model to possess thinking abilities similar to those of o1. What sets it apart is that after generating the chain of thought, it can summarize a step-by-step problem-solving process from the chain of thought on its own. This can prevent users from getting bogged down in the complex chain of thought and allows them to easily obtain the correct problem-solving ideas and answers.
+## Optimization Methods
+**Selection of the Base Model**: Our open-source model is based on the Qwen2.5-14B-Instruct model. We chose this model as the starting point for optimization because it can be deployed on a single GPU and has powerful basic capabilities. Our internal experiments show that starting from a base model with stronger mathematical capabilities and going through the same optimization process can result in an o1-like model with stronger reasoning capabilities.
+**Two-stage Learning**: Our optimization process is divided into two stages. In the first stage, the model learns from a larger o1-like teacher model. This is the most effective way to enable a small model to efficiently master the o1 thinking pattern. In the second stage, the model conducts self-iterative learning to further enhance its reasoning ability. On our internal evaluation dataset, these two stages bring about a performance improvement of approximately 10 points and 6 points respectively.
+**Data Formatting**: Different from general o1-like models, our model is designed for applications in the education field. Therefore, we expect the model not only to output the final answer but also to provide a step-by-step problem-solving process based on the correct thinking process in the chain of thought. To this end, we standardize the output format of the model as follows: the chain-of-thought process is output in the <thinking></thinking> block, and then the step-by-step problem-solving process is summarized in the <summary></summary> block.
+**More Stringent Data filtering**: To ensure the quality of the learning data, we not only evaluate the correctness of the final answer but also examine the accuracy of the explanation process in the entire summary. This is achieved through an automated evaluation methods developed internally, which can effectively prevent the model from learning false positives.
+**Selection of Training Instructions**: The training instruction data we used was sampled from an internal training dataset, with 6,000 samples mainly covering non-graphic math problems in the K12 scenario. It has no intersection with the training data of the benchmark test set. We made such a data selection because our optimization is mainly for applications in the education field.
+## Evaluation and Results
+![alt text](image.png)
+> Note: The results marked with * are directly obtained from the data provided by the respective model/interface provider, and the other results are from our evaluation.
+## Limitations
 However, there are some limitations that must be stated in advance:
 1. **Scenario Limitations**: Our optimization is only carried out on data from the K12 mathematics scenario, and the effectiveness has only been verified in math-related benchmark tests. The performance of the model in non-mathematical scenarios has not been tested, so we cannot guarantee its quality and effectiveness in other fields.
 2. **Language-related Issues**: In the “summary” block, the model has a stronger tendency to generate Chinese content. In the “thinking” block, the model may reason in an unexpected language environment or even present a mixture of languages. However, this does not affect the actual reasoning ability of the model. This indicates that the chain of thought itself may not have independent value, it is merely an easier-to-learn path leading to a correct summary.
 3. **Invalid Results**: The model may sometimes fall into circular reasoning. Since we use explicit identifiers to divide the thinking and summary parts, when the model enters this mode, it may generate invalid results that cannot be parsed.
 4. **Safety and Ethics**: This model has not undergone optimization and testing for alignment at the safety and ethical levels. Any output generated by the model does not represent the official positions, views, or attitudes of our company. When using this model, users should independently judge and evaluate the rationality and applicability of the output content and comply with relevant laws, regulations, and social ethics.
 ## Quickstart
 The environmental requirements for running it are exactly the same as those of the [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model. Therefore, you can easily use Transformers or vLLM to load and run the model for inference, and deploy your services.

image.png ADDED Viewed