Update README.md
# gemma-2-27b-it-SimPO-37K Model Card
## Implementation Details
We first followed the [SimPO](https://github.com/princeton-nlp/SimPO) framework to apply [On-Policy Preference Data Generation](https://github.com/princeton-nlp/SimPO/tree/main/on_policy_data_gen) to the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset using the [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) model, with [RLHFlow/ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the reward model to annotate responses. We then selected prompts where the chosen reward was at least 0.01 higher than the rejected reward, resulting in 37,040 training data points.
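The margin filter itself is simple; a minimal sketch follows, in which the file path and the score field names (`chosen_score`, `rejected_score`) are assumptions rather than the exact schema produced by the annotation step.

```python
# Minimal sketch of the reward-margin filter described above.
# The data file path and the score field names are assumptions,
# not the exact schema produced by the on-policy generation step.
from datasets import load_dataset

annotated = load_dataset("json", data_files="armorm_annotated_pairs.jsonl", split="train")

# Keep only pairs where the chosen response out-scores the rejected one
# by at least 0.01 under the ArmoRM reward model.
filtered = annotated.filter(lambda ex: ex["chosen_score"] - ex["rejected_score"] >= 0.01)
print(f"{len(filtered)} preference pairs retained")  # 37,040 in this run
```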
Model training was conducted on 8x80GB A800 GPUs, leveraging the [SimPO](https://github.com/princeton-nlp/SimPO) and [alignment-handbook](https://github.com/huggingface/alignment-handbook) libraries. We used `deepspeed_zero_stage3` with optimizer offloading to the CPU. The training configs (the `SimPOTrainer` arguments and the accelerate/DeepSpeed config) were as follows:
```yaml
# SimPOTrainer arguments
bf16: true
beta: 10
# ...
warmup_ratio: 0.1
save_only_model: true
```
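For reference, `SimPOTrainer` optimizes the length-normalized SimPO objective, in which `beta` scales the average log-probability margin between the chosen and rejected responses. A minimal sketch follows; it is not the repo's implementation, and the target-margin term `gamma` is not visible in the excerpt above, so its value here is only a placeholder.

```python
# Sketch of the SimPO objective (Meng et al., 2024):
#   reward(y) = beta * average token log-probability of y,
#   loss      = -log sigmoid(reward(chosen) - reward(rejected) - gamma).
# Not the SimPOTrainer implementation; gamma below is a placeholder value.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta: float = 10.0, gamma: float = 1.0) -> torch.Tensor:
    chosen_reward = beta * chosen_logps / chosen_len        # length-normalized
    rejected_reward = beta * rejected_logps / rejected_len  # length-normalized
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```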
```yaml
# deepspeed_zero3_offload_optimizer.yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
main_process_port: 2390
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
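With these two files, training is typically launched through `accelerate`, e.g. `ACCELERATE_LOG_LEVEL=info accelerate launch --config_file deepspeed_zero3_offload_optimizer.yaml scripts/run_simpo.py <simpo_trainer_config>.yaml`; the `scripts/run_simpo.py` path follows the SimPO repository's layout, and the exact paths used for this run may differ.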
## Citation
gemma model: