Elliott (nielsr, HF Staff) committed
Commit 2227833 · verified · 1 Parent(s): d1611e6

Correct pipeline tag and add Github link (#1)

- Correct pipeline tag and add Github link (e97f882612575df02f1e77be308cecccee308a76)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
README.md (+8, -5)
README.md CHANGED

@@ -1,13 +1,14 @@
 ---
+base_model:
+- Qwen/Qwen2.5-Math-7B
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
 tags:
 - reasoning
 - Zero-RL
-license: mit
-base_model:
-- Qwen/Qwen2.5-Math-7B
-pipeline_tag: text-generation
 ---
+
 # 📖Introduction
 
 ![Github](https://img.shields.io/badge/LUFFY-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)
@@ -74,9 +75,9 @@ LUFFY also generalizes well to out-of-distribution tasks, with over +6.2 average
 | Qwen2.5-Math-7B-Base | 18.2 | 11.1 | 16.9 | 15.4 |
 | Qwen2.5-Math-7B-Instruct | 70.3 | 24.7 | 34.1 | 43.0 |
 | SimpleRL-Zero | 30.2 | 23.2 | 34.5 | 29.3 |
-| OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
 | PRIME-Zero | 73.3 | 18.2 | 32.7 | 41.4 |
 | Oat-Zero | 70.1 | 23.7 | 41.7 | 45.2 |
+| OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
 | **LUFFY** | _80.5_ | _39.9_ | **53.0** | **57.8** |
 
 ---
@@ -85,6 +86,8 @@ LUFFY also generalizes well to out-of-distribution tasks, with over +6.2 average
 
 LUFFY builds upon [veRL](https://github.com/volcengine/verl) and [deepscaler](https://github.com/agentica-project/rllm), and utilizes [vLLM](https://github.com/vllm-project/vllm) for inference. We utilize [Math-Verify](https://github.com/huggingface/Math-Verify) for math reasoning evaluation. We thank the open-source community for datasets and backbones, including [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT), [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k), [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math), and [DeepSeek-R1](https://github.com/deepseek-ai/deepseek-r1) model.
 
+Code: https://github.com/ElliottYan/LUFFY
+
 # Citation
 If you find our model, data, or evaluation code useful, please kindly cite our paper:
 ```bib
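After this commit, the README front matter carries the metadata the Hugging Face Hub reads (`pipeline_tag`, `base_model`, `license`, `tags`). A minimal, stdlib-only sketch of extracting those keys: the `card` string reproduces the corrected YAML block from the diff above, and the parser is a hypothetical illustration for this flat structure, not a full YAML implementation.

```python
# The corrected model-card front matter, as it appears after this commit.
card = """---
base_model:
- Qwen/Qwen2.5-Math-7B
library_name: transformers
license: mit
pipeline_tag: text-generation
tags:
- reasoning
- Zero-RL
---
"""

def front_matter(text):
    """Parse the ----delimited header into a dict.

    Scalar lines ("key: value") map to strings; "key:" followed by
    "- item" lines maps to a list. Only handles this flat shape.
    """
    body = text.split("---")[1]
    meta, key = {}, None
    for line in body.strip().splitlines():
        if line.startswith("- "):
            meta.setdefault(key, []).append(line[2:])
        else:
            key, _, value = line.partition(":")
            meta[key] = value.strip() if value.strip() else []
    return meta

meta = front_matter(card)
print(meta["pipeline_tag"])  # text-generation
print(meta["base_model"])    # ['Qwen/Qwen2.5-Math-7B']
```

This is the ordering and key set the diff settles on: `pipeline_tag: text-generation` is what makes the Hub show the model under text generation rather than the default tag it had before.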