luzimu committed on
Commit
39d5582
1 Parent(s): a1a8d9b

modify readme

Files changed (2)
  1. README.md +15 -14
  2. eval.png +0 -0
README.md CHANGED
@@ -1,22 +1,23 @@
  ---
- base_model: /mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce
+ base_model: MathGenie/Mistral-7B-Ours-SFT
  tags:
- - alignment-handbook
- - generated_from_trainer
- datasets:
- - /mnt/cache/luzimu/rlhf_math/data/controled_steps_math_gsm8k_lce_dpo_ascend_lim2_lim3_add_dpo1x1
+ - math
  model-index:
- - name: Mistral-7B-v0.1-lce_controled_steps_dpo_ascend_lim2_lim3_add_dpo1x1
+ - name: Mistral-7B-Ours-SFT-SCDPO
  results: []
+ license: apache-2.0
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: text-generation
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
+ # Mistral-7B-Ours-SFT-SCDPO

- # Mistral-7B-v0.1-lce_controled_steps_dpo_ascend_lim2_lim3_add_dpo1x1
-
- This model is a fine-tuned version of [/mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce](https://huggingface.co//mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce) on the /mnt/cache/luzimu/rlhf_math/data/controled_steps_math_gsm8k_lce_dpo_ascend_lim2_lim3_add_dpo1x1 dataset.
+ This model is a fine-tuned version of MathGenie/Mistral-7B-Ours-SFT.
  It achieves the following results on the evaluation set:
+
  - Loss: 0.1793
  - Rewards/chosen: 0.2587
  - Rewards/rejected: -7.0301
@@ -29,15 +30,15 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed
+ This is a model fine-tuned for mathematical problem-solving.

  ## Intended uses & limitations

- More information needed
+ The model is intended for solving math problems.

  ## Training and evaluation data

- More information needed
+ ![eval](../Mistral-7B-Ours-SFT/eval.png)

  ## Training procedure

eval.png ADDED
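
Note on the evaluation numbers quoted in the README diff: `Rewards/chosen` and `Rewards/rejected` look like the mean implicit rewards that a DPO-style trainer logs for preferred and dispreferred completions. That reading is an assumption based on the `dpo` dataset name and the `alignment-handbook` tag in the old metadata; the diff itself does not define the metrics. Under that assumption, the gap between the two values is the mean reward margin. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
# Values copied from the README diff; their interpretation as mean DPO
# implicit rewards is an assumption, not something the diff states.
REWARDS_CHOSEN = 0.2587
REWARDS_REJECTED = -7.0301


def reward_margin(chosen: float, rejected: float) -> float:
    """Hypothetical helper: mean chosen reward minus mean rejected reward."""
    return chosen - rejected


margin = reward_margin(REWARDS_CHOSEN, REWARDS_REJECTED)
print(f"reward margin: {margin:.4f}")
```

A large positive margin would indicate that, on the evaluation set, the model scores preferred solutions well above rejected ones. It says nothing by itself about accuracy on math benchmarks.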