do test run on scitas with ref_model
README.md CHANGED
@@ -1,31 +1,31 @@
 ---
 license: mit
-base_model:
+base_model: openai-community/gpt2
 tags:
 - trl
 - dpo
 - generated_from_trainer
 model-index:
-- name: gpt2-dpo-
+- name: gpt2-dpo-from_base_gpt2
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# gpt2-dpo-
+# gpt2-dpo-from_base_gpt2
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: 1.
-- Rewards/rejected:
-- Rewards/accuracies: 0.
-- Rewards/margins: 0.
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -
-- Logits/chosen: -
+- Loss: 0.6406
+- Rewards/chosen: 1.1312
+- Rewards/rejected: 0.9208
+- Rewards/accuracies: 0.6373
+- Rewards/margins: 0.2103
+- Logps/rejected: -429.5498
+- Logps/chosen: -508.5024
+- Logits/rejected: -96.1598
+- Logits/chosen: -94.9073
 
 ## Model description
 
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.6679 | 0.9993 | 668 | 0.6728 | 0.2747 | 0.2209 | 0.625 | 0.0538 | -436.5490 | -517.0669 | -96.0258 | -94.8005 |
+| 0.6697 | 2.0 | 1337 | 0.6545 | 0.6507 | 0.5283 | 0.6295 | 0.1224 | -433.4745 | -513.3065 | -96.0560 | -94.8147 |
+| 0.6516 | 2.9993 | 2005 | 0.6467 | 0.8424 | 0.6867 | 0.6336 | 0.1557 | -431.8912 | -511.3903 | -96.1361 | -94.8919 |
+| 0.6264 | 4.0 | 2674 | 0.6436 | 0.9803 | 0.7989 | 0.6336 | 0.1814 | -430.7686 | -510.0109 | -96.1278 | -94.8762 |
+| 0.6114 | 4.9993 | 3342 | 0.6420 | 1.0453 | 0.8518 | 0.6377 | 0.1935 | -430.2403 | -509.3612 | -96.1435 | -94.8917 |
+| 0.6016 | 6.0 | 4011 | 0.6412 | 1.0870 | 0.8859 | 0.6377 | 0.2011 | -429.8991 | -508.9442 | -96.1471 | -94.8941 |
+| 0.6115 | 6.9993 | 4679 | 0.6408 | 1.1137 | 0.9071 | 0.6384 | 0.2066 | -429.6871 | -508.6768 | -96.1587 | -94.9064 |
+| 0.6079 | 8.0 | 5348 | 0.6406 | 1.1274 | 0.9178 | 0.6388 | 0.2096 | -429.5802 | -508.5403 | -96.1573 | -94.9046 |
+| 0.6066 | 8.9993 | 6016 | 0.6406 | 1.1310 | 0.9207 | 0.6373 | 0.2103 | -429.5507 | -508.5036 | -96.1593 | -94.9068 |
+| 0.5968 | 9.9925 | 6680 | 0.6406 | 1.1312 | 0.9208 | 0.6373 | 0.2103 | -429.5498 | -508.5024 | -96.1598 | -94.9073 |
 
 
 ### Framework versions
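
The card above is generated by TRL's `DPOTrainer` (tags `trl`, `dpo`, `generated_from_trainer`), and the commit message indicates the run was launched with an explicit `ref_model`. As a rough sketch only, a comparable run could look like the code below, assuming a recent TRL release (where the tokenizer is passed as `processing_class`; older releases used `tokenizer=`). The preference dataset id, batch size, and logging settings are placeholders, not values recorded in this card.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "openai-community/gpt2"
policy = AutoModelForCausalLM.from_pretrained(model_id)      # model being optimized
ref_model = AutoModelForCausalLM.from_pretrained(model_id)   # frozen reference for the DPO loss
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token                    # GPT-2 ships without a pad token

# Hypothetical dataset id: any preference dataset with plain-text
# "prompt", "chosen", and "rejected" columns works; the card only
# records "None dataset", so the actual one is unknown.
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

training_args = DPOConfig(
    output_dir="gpt2-dpo-from_base_gpt2",  # matches the card's model name
    num_train_epochs=10,                   # the card's table runs to roughly 10 epochs
    per_device_train_batch_size=8,         # placeholder, not taken from the card
    logging_steps=50,                      # placeholder, not taken from the card
    beta=0.1,                              # TRL's default DPO beta
)

trainer = DPOTrainer(
    model=policy,
    ref_model=ref_model,        # passed explicitly instead of letting TRL clone the policy
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model()
```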
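Once trained, the checkpoint loads like any other GPT-2 model. The path below is the local output directory from the sketch above (or a Hub repo id if the checkpoint has been pushed), not a published identifier from this card.

```python
from transformers import pipeline

# Hypothetical path to the DPO-tuned checkpoint saved above.
generator = pipeline("text-generation", model="gpt2-dpo-from_base_gpt2")
print(generator("The movie was", max_new_tokens=40)[0]["generated_text"])
```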