Jan Majkutewicz committed
Commit d601388 · Parent(s): 08b1d78

Model save

Files changed:
- README.md +109 -0
- adapter_model.safetensors +1 -1
- all_results.json +9 -0
- train_results.json +9 -0
- trainer_state.json +0 -0
README.md
ADDED
@@ -0,0 +1,109 @@
---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
model-index:
- name: zephyr-7b-dpo-lora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-lora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5893
- Rewards/chosen: -0.2740
- Rewards/rejected: -0.6023
- Rewards/accuracies: 0.7025
- Rewards/margins: 0.3283
- Logps/rejected: -321.6666
- Logps/chosen: -310.1333
- Logits/rejected: -2.7525
- Logits/chosen: -2.7742
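In DPO, each reward column is β times the policy-to-reference log-probability ratio for that response, and Rewards/margins is simply Rewards/chosen minus Rewards/rejected; the per-pair loss is −log σ(margin). A minimal sketch using the final evaluation numbers above (the helper name is illustrative; note that the loss of the *mean* margin comes out below the reported mean loss of 0.5893, as expected when a convex loss is averaged per pair):

```python
import math

def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO loss for one preference pair: -log(sigmoid(margin)).
    The rewards are assumed to already include the beta scaling."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))  # numerically stable -log(sigmoid(margin))

# Final evaluation rewards reported above:
margin = -0.2740 - (-0.6023)
print(round(margin, 4))                            # 0.3283, the reported Rewards/margins
print(round(dpo_pair_loss(-0.2740, -0.6023), 4))   # 0.5424
```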
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
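With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate ramps linearly over the first 10% of steps and then decays to zero on a half-cosine, as in transformers' `get_cosine_schedule_with_warmup`. A self-contained sketch of the multiplier; the total of ~3,821 optimizer steps is an assumption inferred from 61,134 training samples at an effective batch size of 16:

```python
import math

def lr_lambda(step: int, num_warmup: int, num_training: int) -> float:
    """Multiplier applied to the base LR (5e-7 here): linear warmup,
    then cosine decay to 0."""
    if step < num_warmup:
        return step / max(1, num_warmup)
    progress = (step - num_warmup) / max(1, num_training - num_warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

base_lr = 5e-7
total_steps = 3821               # assumption: ceil(61134 / 16), 1 epoch
warmup = int(0.1 * total_steps)  # warmup_ratio 0.1

print(lr_lambda(0, warmup, total_steps) * base_lr)       # 0.0
print(lr_lambda(warmup, warmup, total_steps) * base_lr)  # 5e-07 (peak)
```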
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929 | 0.0262 | 100 | 0.6930 | -0.0001 | -0.0004 | 0.5250 | 0.0003 | -261.4788 | -282.7496 | -2.8388 | -2.8661 |
| 0.6923 | 0.0523 | 200 | 0.6923 | 0.0008 | -0.0009 | 0.6050 | 0.0017 | -261.5316 | -282.6624 | -2.8380 | -2.8653 |
| 0.6898 | 0.0785 | 300 | 0.6903 | 0.0035 | -0.0024 | 0.6640 | 0.0058 | -261.6760 | -282.3918 | -2.8350 | -2.8623 |
| 0.6872 | 0.1047 | 400 | 0.6862 | 0.0165 | 0.0021 | 0.6670 | 0.0144 | -261.2256 | -281.0900 | -2.8308 | -2.8577 |
| 0.6783 | 0.1309 | 500 | 0.6804 | 0.0209 | -0.0059 | 0.6835 | 0.0267 | -262.0230 | -280.6481 | -2.8215 | -2.8486 |
| 0.6729 | 0.1570 | 600 | 0.6733 | 0.0154 | -0.0272 | 0.6840 | 0.0426 | -264.1608 | -281.1958 | -2.8138 | -2.8410 |
| 0.6665 | 0.1832 | 700 | 0.6638 | -0.0035 | -0.0689 | 0.6755 | 0.0654 | -268.3266 | -283.0863 | -2.8060 | -2.8327 |
| 0.6427 | 0.2094 | 800 | 0.6546 | -0.0214 | -0.1104 | 0.6815 | 0.0889 | -272.4747 | -284.8825 | -2.8020 | -2.8283 |
| 0.6428 | 0.2355 | 900 | 0.6458 | -0.0247 | -0.1383 | 0.6770 | 0.1136 | -275.2685 | -285.2050 | -2.7942 | -2.8199 |
| 0.6381 | 0.2617 | 1000 | 0.6358 | -0.0638 | -0.2074 | 0.6785 | 0.1436 | -282.1761 | -289.1206 | -2.7887 | -2.8138 |
| 0.6488 | 0.2879 | 1100 | 0.6284 | -0.1378 | -0.3055 | 0.6790 | 0.1677 | -291.9890 | -296.5138 | -2.7826 | -2.8071 |
| 0.6427 | 0.3141 | 1200 | 0.6223 | -0.1104 | -0.2986 | 0.6835 | 0.1882 | -291.3028 | -293.7785 | -2.7931 | -2.8165 |
| 0.6131 | 0.3402 | 1300 | 0.6172 | -0.1466 | -0.3514 | 0.6865 | 0.2049 | -296.5806 | -297.3945 | -2.7951 | -2.8180 |
| 0.6326 | 0.3664 | 1400 | 0.6155 | -0.1752 | -0.3896 | 0.6860 | 0.2144 | -300.3966 | -300.2597 | -2.7920 | -2.8147 |
| 0.6128 | 0.3926 | 1500 | 0.6180 | -0.0630 | -0.2687 | 0.6890 | 0.2057 | -288.3090 | -289.0369 | -2.7980 | -2.8198 |
| 0.6223 | 0.4187 | 1600 | 0.6088 | -0.1688 | -0.4097 | 0.6945 | 0.2409 | -302.4074 | -299.6220 | -2.7926 | -2.8148 |
| 0.6338 | 0.4449 | 1700 | 0.6061 | -0.2152 | -0.4665 | 0.6925 | 0.2513 | -308.0869 | -304.2535 | -2.7961 | -2.8181 |
| 0.585 | 0.4711 | 1800 | 0.6050 | -0.1327 | -0.3850 | 0.6915 | 0.2523 | -299.9368 | -296.0054 | -2.7949 | -2.8174 |
| 0.577 | 0.4973 | 1900 | 0.6013 | -0.2170 | -0.4883 | 0.6965 | 0.2713 | -310.2670 | -304.4333 | -2.7954 | -2.8176 |
| 0.5945 | 0.5234 | 2000 | 0.5992 | -0.2107 | -0.4899 | 0.6995 | 0.2793 | -310.4293 | -303.8028 | -2.7903 | -2.8122 |
| 0.5913 | 0.5496 | 2100 | 0.5981 | -0.2373 | -0.5251 | 0.7025 | 0.2879 | -313.9529 | -306.4641 | -2.7863 | -2.8085 |
| 0.5816 | 0.5758 | 2200 | 0.5989 | -0.2688 | -0.5570 | 0.6970 | 0.2883 | -317.1411 | -309.6146 | -2.7849 | -2.8070 |
| 0.5824 | 0.6019 | 2300 | 0.5961 | -0.2227 | -0.5189 | 0.6955 | 0.2961 | -313.3233 | -305.0098 | -2.7821 | -2.8037 |
| 0.602 | 0.6281 | 2400 | 0.5969 | -0.2683 | -0.5669 | 0.6990 | 0.2986 | -318.1251 | -309.5652 | -2.7744 | -2.7961 |
| 0.5792 | 0.6543 | 2500 | 0.5963 | -0.2102 | -0.5041 | 0.6975 | 0.2938 | -311.8429 | -303.7615 | -2.7763 | -2.7980 |
| 0.6028 | 0.6805 | 2600 | 0.5974 | -0.1896 | -0.4790 | 0.6920 | 0.2895 | -309.3417 | -301.6964 | -2.7717 | -2.7933 |
| 0.5854 | 0.7066 | 2700 | 0.5930 | -0.2517 | -0.5615 | 0.7020 | 0.3098 | -317.5864 | -307.9027 | -2.7676 | -2.7892 |
| 0.5994 | 0.7328 | 2800 | 0.5920 | -0.2607 | -0.5775 | 0.7045 | 0.3167 | -319.1838 | -308.8107 | -2.7636 | -2.7851 |
| 0.5837 | 0.7590 | 2900 | 0.5913 | -0.2540 | -0.5721 | 0.7055 | 0.3181 | -318.6511 | -308.1379 | -2.7619 | -2.7834 |
| 0.5858 | 0.7851 | 3000 | 0.5910 | -0.2625 | -0.5835 | 0.7055 | 0.3210 | -319.7853 | -308.9898 | -2.7605 | -2.7819 |
| 0.5685 | 0.8113 | 3100 | 0.5914 | -0.2383 | -0.5571 | 0.7040 | 0.3188 | -317.1507 | -306.5707 | -2.7558 | -2.7777 |
| 0.5753 | 0.8375 | 3200 | 0.5903 | -0.2623 | -0.5868 | 0.7020 | 0.3246 | -320.1224 | -308.9666 | -2.7567 | -2.7783 |
| 0.5769 | 0.8636 | 3300 | 0.5900 | -0.2673 | -0.5934 | 0.7030 | 0.3260 | -320.7757 | -309.4716 | -2.7555 | -2.7771 |
| 0.5608 | 0.8898 | 3400 | 0.5896 | -0.2716 | -0.5988 | 0.7020 | 0.3273 | -321.3196 | -309.8930 | -2.7520 | -2.7739 |
| 0.6008 | 0.9160 | 3500 | 0.5895 | -0.2716 | -0.5994 | 0.7035 | 0.3277 | -321.3745 | -309.9000 | -2.7539 | -2.7755 |
| 0.585 | 0.9422 | 3600 | 0.5895 | -0.2722 | -0.6000 | 0.7020 | 0.3279 | -321.4418 | -309.9531 | -2.7549 | -2.7764 |
| 0.567 | 0.9683 | 3700 | 0.5893 | -0.2738 | -0.6022 | 0.7015 | 0.3284 | -321.6555 | -310.1171 | -2.7539 | -2.7755 |
| 0.5834 | 0.9945 | 3800 | 0.5893 | -0.2740 | -0.6023 | 0.7025 | 0.3283 | -321.6666 | -310.1333 | -2.7525 | -2.7742 |


### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0
- Pytorch 2.2.0
- Datasets 2.16.1
- Tokenizers 0.19.1
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7824785b77388bdacbd438b6940d2e36888c73f044b90f65f3e52ea1d3c98100
 size 1342238560
all_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.0,
    "total_flos": 0.0,
    "train_loss": 0.6164219083351729,
    "train_runtime": 73481.1174,
    "train_samples": 61134,
    "train_samples_per_second": 0.832,
    "train_steps_per_second": 0.052
}
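The throughput figures here are mutually consistent: `train_samples_per_second` is `train_samples / train_runtime`, and `train_steps_per_second` follows once you account for the effective batch size of 16 (8 per device × 2 gradient-accumulation steps). A quick arithmetic check (the optimizer-step count is a derived estimate, not a value from the file):

```python
import math

train_samples = 61134
train_runtime = 73481.1174  # seconds, roughly 20.4 hours
effective_batch = 8 * 2     # train_batch_size * gradient_accumulation_steps

samples_per_second = train_samples / train_runtime
optimizer_steps = math.ceil(train_samples / effective_batch)
steps_per_second = optimizer_steps / train_runtime

print(round(samples_per_second, 3))  # 0.832
print(optimizer_steps)               # 3821
print(round(steps_per_second, 3))    # 0.052
```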
train_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.0,
    "total_flos": 0.0,
    "train_loss": 0.6164219083351729,
    "train_runtime": 73481.1174,
    "train_samples": 61134,
    "train_samples_per_second": 0.832,
    "train_steps_per_second": 0.052
}
trainer_state.json
ADDED
The diff for this file is too large to render.