Jan Majkutewicz committed on
Commit d601388
1 Parent(s): 08b1d78

Model save
README.md ADDED
@@ -0,0 +1,109 @@
---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
model-index:
- name: zephyr-7b-dpo-lora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-lora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset.
It achieves the following results on the evaluation set (a note on how these rewards are defined follows the list):
- Loss: 0.5893
- Rewards/chosen: -0.2740
- Rewards/rejected: -0.6023
- Rewards/accuracies: 0.7025
- Rewards/margins: 0.3283
- Logps/rejected: -321.6666
- Logps/chosen: -310.1333
- Logits/rejected: -2.7525
- Logits/chosen: -2.7742
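For context beyond the auto-generated card: in DPO the reward model is implicit, derived from the log-probability ratio between the trained policy and the frozen reference (here, the SFT base model), so the Rewards/* metrics above follow the standard DPO formulation:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

where $y_w$ and $y_l$ are the chosen and rejected completions and $\beta$ is the DPO temperature (its value is not recorded in this card). Rewards/margins is the mean of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over evaluation pairs, and Rewards/accuracies is the fraction of pairs in which the chosen completion receives the higher implicit reward.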
## Model description

More information needed

## Intended uses & limitations

More information needed
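Since this section is still a placeholder, here is a minimal, hypothetical inference sketch. It assumes only what the card's metadata states: the base model is `alignment-handbook/zephyr-7b-sft-full` and this repository provides a PEFT LoRA adapter. The adapter path and generation settings are placeholders.

```python
# Minimal sketch: attach this LoRA adapter to its SFT base model for inference.
# `adapter_path` is a placeholder for this repository's local path or Hub id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"  # from the card's base_model field
adapter_path = "path/to/zephyr-7b-dpo-lora"        # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)  # load the DPO-trained LoRA weights
model.eval()

prompt = "Explain what direct preference optimization does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```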
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
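As a rough reconstruction, the hyperparameters above might be wired into TRL's `DPOTrainer` as below. The LoRA shape, the DPO `beta`, and the dataset are assumptions the card does not record, and TRL argument names vary across releases; treat this as a sketch, not the actual training script. The listed Adam betas and epsilon match the Trainer defaults, so no explicit optimizer setup is needed.

```python
# Sketch: map the card's hyperparameters onto TRL's DPOTrainer with a PEFT LoRA
# config. LoRA shape, beta, and the dataset are assumptions (not in the card).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

peft_config = LoraConfig(  # assumed shape; the card omits the LoRA settings
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = DPOConfig(  # the values below come from the card's hyperparameter list
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 * 2 = total_train_batch_size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumed; the card does not record the DPO beta
)

# A preference dataset with "prompt"/"chosen"/"rejected" columns is required;
# the card does not say which one was used, so this path is a placeholder.
train_ds = load_dataset("path/to/preference_dataset", split="train")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```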
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929 | 0.0262 | 100 | 0.6930 | -0.0001 | -0.0004 | 0.5250 | 0.0003 | -261.4788 | -282.7496 | -2.8388 | -2.8661 |
| 0.6923 | 0.0523 | 200 | 0.6923 | 0.0008 | -0.0009 | 0.6050 | 0.0017 | -261.5316 | -282.6624 | -2.8380 | -2.8653 |
| 0.6898 | 0.0785 | 300 | 0.6903 | 0.0035 | -0.0024 | 0.6640 | 0.0058 | -261.6760 | -282.3918 | -2.8350 | -2.8623 |
| 0.6872 | 0.1047 | 400 | 0.6862 | 0.0165 | 0.0021 | 0.6670 | 0.0144 | -261.2256 | -281.0900 | -2.8308 | -2.8577 |
| 0.6783 | 0.1309 | 500 | 0.6804 | 0.0209 | -0.0059 | 0.6835 | 0.0267 | -262.0230 | -280.6481 | -2.8215 | -2.8486 |
| 0.6729 | 0.1570 | 600 | 0.6733 | 0.0154 | -0.0272 | 0.6840 | 0.0426 | -264.1608 | -281.1958 | -2.8138 | -2.8410 |
| 0.6665 | 0.1832 | 700 | 0.6638 | -0.0035 | -0.0689 | 0.6755 | 0.0654 | -268.3266 | -283.0863 | -2.8060 | -2.8327 |
| 0.6427 | 0.2094 | 800 | 0.6546 | -0.0214 | -0.1104 | 0.6815 | 0.0889 | -272.4747 | -284.8825 | -2.8020 | -2.8283 |
| 0.6428 | 0.2355 | 900 | 0.6458 | -0.0247 | -0.1383 | 0.6770 | 0.1136 | -275.2685 | -285.2050 | -2.7942 | -2.8199 |
| 0.6381 | 0.2617 | 1000 | 0.6358 | -0.0638 | -0.2074 | 0.6785 | 0.1436 | -282.1761 | -289.1206 | -2.7887 | -2.8138 |
| 0.6488 | 0.2879 | 1100 | 0.6284 | -0.1378 | -0.3055 | 0.6790 | 0.1677 | -291.9890 | -296.5138 | -2.7826 | -2.8071 |
| 0.6427 | 0.3141 | 1200 | 0.6223 | -0.1104 | -0.2986 | 0.6835 | 0.1882 | -291.3028 | -293.7785 | -2.7931 | -2.8165 |
| 0.6131 | 0.3402 | 1300 | 0.6172 | -0.1466 | -0.3514 | 0.6865 | 0.2049 | -296.5806 | -297.3945 | -2.7951 | -2.8180 |
| 0.6326 | 0.3664 | 1400 | 0.6155 | -0.1752 | -0.3896 | 0.6860 | 0.2144 | -300.3966 | -300.2597 | -2.7920 | -2.8147 |
| 0.6128 | 0.3926 | 1500 | 0.6180 | -0.0630 | -0.2687 | 0.6890 | 0.2057 | -288.3090 | -289.0369 | -2.7980 | -2.8198 |
| 0.6223 | 0.4187 | 1600 | 0.6088 | -0.1688 | -0.4097 | 0.6945 | 0.2409 | -302.4074 | -299.6220 | -2.7926 | -2.8148 |
| 0.6338 | 0.4449 | 1700 | 0.6061 | -0.2152 | -0.4665 | 0.6925 | 0.2513 | -308.0869 | -304.2535 | -2.7961 | -2.8181 |
| 0.585 | 0.4711 | 1800 | 0.6050 | -0.1327 | -0.3850 | 0.6915 | 0.2523 | -299.9368 | -296.0054 | -2.7949 | -2.8174 |
| 0.577 | 0.4973 | 1900 | 0.6013 | -0.2170 | -0.4883 | 0.6965 | 0.2713 | -310.2670 | -304.4333 | -2.7954 | -2.8176 |
| 0.5945 | 0.5234 | 2000 | 0.5992 | -0.2107 | -0.4899 | 0.6995 | 0.2793 | -310.4293 | -303.8028 | -2.7903 | -2.8122 |
| 0.5913 | 0.5496 | 2100 | 0.5981 | -0.2373 | -0.5251 | 0.7025 | 0.2879 | -313.9529 | -306.4641 | -2.7863 | -2.8085 |
| 0.5816 | 0.5758 | 2200 | 0.5989 | -0.2688 | -0.5570 | 0.6970 | 0.2883 | -317.1411 | -309.6146 | -2.7849 | -2.8070 |
| 0.5824 | 0.6019 | 2300 | 0.5961 | -0.2227 | -0.5189 | 0.6955 | 0.2961 | -313.3233 | -305.0098 | -2.7821 | -2.8037 |
| 0.602 | 0.6281 | 2400 | 0.5969 | -0.2683 | -0.5669 | 0.6990 | 0.2986 | -318.1251 | -309.5652 | -2.7744 | -2.7961 |
| 0.5792 | 0.6543 | 2500 | 0.5963 | -0.2102 | -0.5041 | 0.6975 | 0.2938 | -311.8429 | -303.7615 | -2.7763 | -2.7980 |
| 0.6028 | 0.6805 | 2600 | 0.5974 | -0.1896 | -0.4790 | 0.6920 | 0.2895 | -309.3417 | -301.6964 | -2.7717 | -2.7933 |
| 0.5854 | 0.7066 | 2700 | 0.5930 | -0.2517 | -0.5615 | 0.7020 | 0.3098 | -317.5864 | -307.9027 | -2.7676 | -2.7892 |
| 0.5994 | 0.7328 | 2800 | 0.5920 | -0.2607 | -0.5775 | 0.7045 | 0.3167 | -319.1838 | -308.8107 | -2.7636 | -2.7851 |
| 0.5837 | 0.7590 | 2900 | 0.5913 | -0.2540 | -0.5721 | 0.7055 | 0.3181 | -318.6511 | -308.1379 | -2.7619 | -2.7834 |
| 0.5858 | 0.7851 | 3000 | 0.5910 | -0.2625 | -0.5835 | 0.7055 | 0.3210 | -319.7853 | -308.9898 | -2.7605 | -2.7819 |
| 0.5685 | 0.8113 | 3100 | 0.5914 | -0.2383 | -0.5571 | 0.7040 | 0.3188 | -317.1507 | -306.5707 | -2.7558 | -2.7777 |
| 0.5753 | 0.8375 | 3200 | 0.5903 | -0.2623 | -0.5868 | 0.7020 | 0.3246 | -320.1224 | -308.9666 | -2.7567 | -2.7783 |
| 0.5769 | 0.8636 | 3300 | 0.5900 | -0.2673 | -0.5934 | 0.7030 | 0.3260 | -320.7757 | -309.4716 | -2.7555 | -2.7771 |
| 0.5608 | 0.8898 | 3400 | 0.5896 | -0.2716 | -0.5988 | 0.7020 | 0.3273 | -321.3196 | -309.8930 | -2.7520 | -2.7739 |
| 0.6008 | 0.9160 | 3500 | 0.5895 | -0.2716 | -0.5994 | 0.7035 | 0.3277 | -321.3745 | -309.9000 | -2.7539 | -2.7755 |
| 0.585 | 0.9422 | 3600 | 0.5895 | -0.2722 | -0.6000 | 0.7020 | 0.3279 | -321.4418 | -309.9531 | -2.7549 | -2.7764 |
| 0.567 | 0.9683 | 3700 | 0.5893 | -0.2738 | -0.6022 | 0.7015 | 0.3284 | -321.6555 | -310.1171 | -2.7539 | -2.7755 |
| 0.5834 | 0.9945 | 3800 | 0.5893 | -0.2740 | -0.6023 | 0.7025 | 0.3283 | -321.6666 | -310.1333 | -2.7525 | -2.7742 |


### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0
- Pytorch 2.2.0
- Datasets 2.16.1
- Tokenizers 0.19.1
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1818b92cbd208058d804d1f94c779b3a7f08d2aade39d0e96b4524b7d518431a
+oid sha256:7824785b77388bdacbd438b6940d2e36888c73f044b90f65f3e52ea1d3c98100
 size 1342238560
all_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 1.0,
  "total_flos": 0.0,
  "train_loss": 0.6164219083351729,
  "train_runtime": 73481.1174,
  "train_samples": 61134,
  "train_samples_per_second": 0.832,
  "train_steps_per_second": 0.052
}
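As a quick consistency check on these figures: 61,134 samples processed in 73,481.1 s gives 61134 / 73481.1 ≈ 0.832 samples per second, and dividing by the effective batch size of 16 gives 0.832 / 16 ≈ 0.052 optimizer steps per second, matching the reported values.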
train_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 1.0,
  "total_flos": 0.0,
  "train_loss": 0.6164219083351729,
  "train_runtime": 73481.1174,
  "train_samples": 61134,
  "train_samples_per_second": 0.832,
  "train_steps_per_second": 0.052
}
trainer_state.json ADDED
The diff for this file is too large to render.