Elliott committed on
Commit d1611e6 · verified · 1 Parent(s): 9d3e41a

Update README.md

Files changed (1)
  1. README.md +11 -15
README.md CHANGED
@@ -1,13 +1,13 @@
- ---
- library_name: transformers
- tags:
- - reasoning
- - Zero-RL
- license: mit
- base_model:
- - Qwen/Qwen2.5-Math-7B
- pipeline_tag: text-generation
- ---
  # 📖Introduction
 
  ![Github](https://img.shields.io/badge/LUFFY-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)
@@ -60,8 +60,6 @@ LUFFY is evaluated on six competition-level benchmarks, achieving state-of-the-a
  | OpenReasoner-Zero | 17.2 | 15.0 | 52.3 | 84.6 | 33.8 | 47.1 | 41.7 |
  | PRIME-Zero | 17.9 | 14.7 | 55.2 | 79.4 | **38.2** | 42.2 | 41.3 |
  | Oat-Zero | **31.7** | 11.0 | 61.6 | 79.2 | 29.8 | 42.5 | 42.6 |
- | SFT (Our repication) | 28.6 | **23.5** | 59.0 | 86.0 | 37.5 | 51.1 | 47.6 |
- | On-Policy RL (Our repication) | 24.6 | 15.7 | 61.3 | 84.6 | 34.9 | 47.9 | 44.8 |
  | **LUFFY** | 29.5 | 23.2 | **66.1**| **88.4** | 33.8 | **56.4** | **49.6** |
 
  ---
@@ -77,10 +75,8 @@ LUFFY also generalizes well to out-of-distribution tasks, with over +6.2 average
  | Qwen2.5-Math-7B-Instruct | 70.3 | 24.7 | 34.1 | 43.0 |
  | SimpleRL-Zero | 30.2 | 23.2 | 34.5 | 29.3 |
  | OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
- | PRIME-Zero | **73.3** | 18.2 | 32.7 | 41.4 |
  | Oat-Zero | 70.1 | 23.7 | 41.7 | 45.2 |
- | SFT (Our repication) | 75.2 | 24.7 | 42.7 | 47.5 |
- | On-Policy RL (Our repication) | **82.3** | **40.4** | _49.3_ | _57.3_ |
  | **LUFFY** | _80.5_ | _39.9_ | **53.0** | **57.8** |
 
  ---
 
+ ---
+ library_name: transformers
+ tags:
+ - reasoning
+ - Zero-RL
+ license: mit
+ base_model:
+ - Qwen/Qwen2.5-Math-7B
+ pipeline_tag: text-generation
+ ---
  # 📖Introduction
 
  ![Github](https://img.shields.io/badge/LUFFY-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)
 
  | OpenReasoner-Zero | 17.2 | 15.0 | 52.3 | 84.6 | 33.8 | 47.1 | 41.7 |
  | PRIME-Zero | 17.9 | 14.7 | 55.2 | 79.4 | **38.2** | 42.2 | 41.3 |
  | Oat-Zero | **31.7** | 11.0 | 61.6 | 79.2 | 29.8 | 42.5 | 42.6 |
  | **LUFFY** | 29.5 | 23.2 | **66.1**| **88.4** | 33.8 | **56.4** | **49.6** |
 
  ---
 
  | Qwen2.5-Math-7B-Instruct | 70.3 | 24.7 | 34.1 | 43.0 |
  | SimpleRL-Zero | 30.2 | 23.2 | 34.5 | 29.3 |
  | OpenReasoner-Zero | 66.2 | 29.8 | 58.7 | 51.6 |
+ | PRIME-Zero | 73.3 | 18.2 | 32.7 | 41.4 |
  | Oat-Zero | 70.1 | 23.7 | 41.7 | 45.2 |
  | **LUFFY** | _80.5_ | _39.9_ | **53.0** | **57.8** |
 
  ---