GitBag committed
Commit: 2bfcbb6
Parent: 704482d

Update README.md

Files changed (1): README.md (+16, -27)
README.md CHANGED
@@ -4,13 +4,14 @@ datasets:
 - berkeley-nest/Nectar
 language:
 - en
+base_model: openchat/openchat_3.5
 ---
 This is a model released for our paper: [REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767).
 
-# REBEL-Llama-3
+# REBEL-OpenChat-3.5
 
 This model is developed with REBEL based on [OpenChat-3.5](https://huggingface.co/openchat/openchat_3.5), with [Starling-RM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-RM-7B-alpha) as the reward model and the [Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar) dataset.
-The training code is available at https://github.com/ZhaolinGao/REBEL.
+The training code is available at https://github.com/ZhaolinGao/REBEL. We collect online generations during each iteration with a batch size of 32.
 
 ### Links to Other Models
 
@@ -18,27 +19,22 @@ The training code is available at https://github.com/ZhaolinGao/REBEL.
 
 [REBEL-Llama-3-epoch_2](https://huggingface.co/Cornell-AGI/REBEL-Llama-3-epoch_2)
 
-### AlpacaEval 2.0 Evaluations
-
-| Model | AlpacaEval 2.0<br>LC Win Rate | AlpacaEval 2.0<br>Win Rate |
-| :--------: | :--------: | :--------: |
-| REBEL-OpenChat-3.5 | 17.3 | 12.8 |
-| REBEL-Llama-3 | 30.1 | 32.6 |
-| REBEL-Llama-3-epoch_2 | 31.33 | 34.22 |
-
-### MT-Bench Evaluations
-
-| Model | MT-Bench<br>1st Turn | MT-Bench<br>2nd Turn | MT-Bench<br>Average |
-| :--------: | :--------: | :--------: | :--------: |
-| REBEL-OpenChat-3.5 | 8.54 | 7.58 | 8.06 |
-| REBEL-Llama-3 | 8.63 | 7.69 | 8.16 |
-
-### Open LLM Leaderboard Evaluations
-
-| Model | MMLU<br>(5-shot) | GSM8K<br>(5-shot) | ARC<br>(25-shot) | Winogrande<br>(5-shot) | TruthfulQA<br>(0-shot) | HellaSwag<br>(10-shot) | Average |
-| :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
-| REBEL-OpenChat-3.5 | 63.7 | 68.8 | 64.3 | 80.4 | 48.2 | 85.0 | 68.4 |
-| REBEL-Llama-3 | 65.8 | 75.6 | 61.7 | 75.8 | 51.7 | 78.8 | 68.2 |
+[REBEL-Llama-3-Armo-iter_1](https://huggingface.co/Cornell-AGI/REBEL-Llama-3-Armo-iter_1)
+
+[REBEL-Llama-3-Armo-iter_2](https://huggingface.co/Cornell-AGI/REBEL-Llama-3-Armo-iter_2)
+
+[REBEL-Llama-3-Armo-iter_3](https://huggingface.co/Cornell-AGI/REBEL-Llama-3-Armo-iter_3)
+
+### Evaluations
+
+| Model | AlpacaEval 2.0<br>LC Win Rate | AlpacaEval 2.0<br>Win Rate | MT-Bench<br>Average | MMLU<br>(5-shot) | GSM8K<br>(5-shot) |
+| :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
+| REBEL-OpenChat-3.5 | 17.3 | 12.8 | 8.06 | 63.7 | 68.8 |
+| REBEL-Llama-3 | 30.1 | 32.6 | 8.16 | 65.8 | 75.6 |
+| REBEL-Llama-3-epoch_2 | 31.3 | 34.2 | 7.83 | 65.4 | 75.4 |
+| REBEL-Llama-3-Armo-iter_1 | 48.3 | 41.8 | 8.13 | 66.3 | 75.8 |
+| REBEL-Llama-3-Armo-iter_2 | 50.0 | 48.5 | 8.07 | 65.9 | 75.4 |
+| REBEL-Llama-3-Armo-iter_3 | 49.7 | 48.1 | 8.01 | 66.0 | 75.7 |
 
 ## Citation
 Please cite our paper if you use this model in your own work:
@@ -51,11 +47,4 @@ Please cite our paper if you use this model in your own work:
   archivePrefix={arXiv},
   primaryClass={cs.LG}
 }
-```
-
-
-
-
-
-
-
+```
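The updated README describes the method only in prose, so a schematic sketch of the REBEL objective may help readers of this commit. This is a hedged reading of the regression loss in the paper linked above (https://arxiv.org/abs/2404.16767), not code from the ZhaolinGao/REBEL repository; the function name, tensor layout, and the default `eta` value are illustrative assumptions.

```python
import torch

def rebel_loss(logp_new_a, logp_new_b, logp_old_a, logp_old_b,
               reward_a, reward_b, eta=1.0):
    """Schematic REBEL update: regress the gap in log-probability
    ratios of two responses (a, b) onto their reward gap.

    All arguments are 1-D tensors over a batch of prompts; `eta` is the
    scaling coefficient from the paper (its value here is an assumption).
    """
    # Log-ratio of the current policy to the previous iterate, per response.
    log_ratio_a = logp_new_a - logp_old_a
    log_ratio_b = logp_new_b - logp_old_b
    # Least-squares regression of (1/eta) * (log-ratio gap) onto the reward gap.
    pred = (log_ratio_a - log_ratio_b) / eta
    target = reward_a - reward_b
    return torch.mean((pred - target) ** 2)
```

Under this reading, each iteration would score a fresh batch of 32 online generations (per the updated README) with the reward model and take gradient steps on this squared-error objective.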
 
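The README itself carries no inference snippet, so here is a minimal usage sketch with the standard Hugging Face transformers API. The repo id is copied from the links in the diff above; the prompt, generation settings, and the assumption that the checkpoint ships a chat template are all illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id copied from the links in the README; any of the listed
# checkpoints should load the same way.
model_id = "Cornell-AGI/REBEL-Llama-3-Armo-iter_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt; assumes the checkpoint provides a chat template.
messages = [{"role": "user", "content": "Summarize the REBEL training objective."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```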