---
library_name: transformers
tags:
- trl
- cpo
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-CPO
  results: []
---

# OpenELM-1_1B-CPO

This model appears to be a CPO (Contrastive Preference Optimization) fine-tune in the OpenELM-1_1B family, trained with TRL; the base checkpoint and training dataset were not recorded by the trainer.
It achieves the following results on the evaluation set:
- Loss: 2.1904
- Rewards/chosen: -3.6406
- Rewards/rejected: -4.4375
- Rewards/accuracies: 0.5918
- Rewards/margins: 0.8008
- Logps/rejected: -444.0
- Logps/chosen: -364.0
- Logits/rejected: -7.5312
- Logits/chosen: -8.875
- Nll Loss: 1.1719
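
Following TRL's CPO reporting conventions, `Rewards/chosen` and `Rewards/rejected` are the β-scaled policy log-probabilities of the chosen and rejected completions, `Rewards/margins` is their difference (0.8008 ≈ −3.6406 − (−4.4375)), and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen completion receives the higher reward.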

## Model description

OpenELM-1_1B-CPO appears to be a preference-optimized variant of OpenELM-1_1B trained with TRL's `CPOTrainer`. CPO optimizes the policy directly on chosen/rejected response pairs without a separate reference model, and adds a negative log-likelihood (NLL) term on the chosen responses, which is why the evaluation results above report both reward margins and an NLL loss.
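
A minimal inference sketch follows. The hub repository id is a placeholder; OpenELM checkpoints require `trust_remote_code=True`, and per Apple's OpenELM model card they reuse the Llama-2 tokenizer rather than shipping their own:

```python
# Hypothetical usage example; the repo id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "your-org/OpenELM-1_1B-CPO",   # placeholder repo id
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,        # OpenELM uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("The key idea behind preference optimization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```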

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
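
As a reference point, here is a hypothetical sketch of how these settings map onto TRL's `CPOConfig`/`CPOTrainer`. The base checkpoint and dataset below are assumptions, not recorded values; only the hyperparameters listed above come from this card:

```python
# Hypothetical reconstruction of the training setup; dataset and base
# checkpoint names are placeholders since they were not recorded.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B", trust_remote_code=True  # assumed base checkpoint
)
# OpenELM checkpoints do not ship a tokenizer; Apple's card uses Llama-2's.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token

# Placeholder dataset; CPOTrainer expects "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

# Per-device batch 8 on 4 GPUs with 2 accumulation steps = total batch 64.
args = CPOConfig(
    output_dir="OpenELM-1_1B-CPO",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # renamed to `processing_class` in newer TRL releases
)
trainer.train()
```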

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|
| 2.4271        | 0.1047 | 100  | 2.2959          | -3.3594        | -3.2812          | 0.4980             | -0.0850         | -328.0         | -336.0       | -12.125         | -12.3125      | 1.0859   |
| 2.2538        | 0.2093 | 200  | 2.1836          | -3.3906        | -3.4531          | 0.5234             | 0.0640          | -346.0         | -338.0       | -9.5            | -9.875        | 1.0938   |
| 2.1253        | 0.3140 | 300  | 2.1307          | -3.4531        | -3.5938          | 0.5176             | 0.1416          | -360.0         | -346.0       | -11.0           | -11.4375      | 1.1172   |
| 2.0609        | 0.4186 | 400  | 2.1359          | -3.3281        | -3.4375          | 0.5293             | 0.1187          | -344.0         | -332.0       | -10.625         | -11.125       | 1.0703   |
| 2.1905        | 0.5233 | 500  | 2.1286          | -3.375         | -3.5156          | 0.5254             | 0.1357          | -352.0         | -338.0       | -8.5            | -9.3125       | 1.0859   |
| 2.1304        | 0.6279 | 600  | 2.1410          | -3.6094        | -3.9688          | 0.5723             | 0.3672          | -398.0         | -360.0       | -9.625          | -10.625       | 1.1562   |
| 2.2554        | 0.7326 | 700  | 2.1848          | -3.7344        | -4.1562          | 0.5664             | 0.4258          | -416.0         | -374.0       | -8.5625         | -9.6875       | 1.2031   |
| 2.0796        | 0.8373 | 800  | 2.1224          | -3.4531        | -3.75            | 0.5469             | 0.2852          | -374.0         | -346.0       | -7.0312         | -7.8438       | 1.1172   |
| 2.1021        | 0.9419 | 900  | 2.1099          | -3.5           | -3.9062          | 0.5723             | 0.4062          | -390.0         | -350.0       | -5.2812         | -6.2812       | 1.1328   |
| 1.5182        | 1.0471 | 1000 | 2.1662        | -3.5            | -3.8594      | 0.5664         | 0.3633          | -386.0   | -350.0             | -9.375         | -10.625         | 1.125            |
| 1.4917        | 1.1518 | 1100 | 2.1588        | -3.5625         | -4.0         | 0.5703         | 0.4395          | -400.0   | -356.0             | -6.4688        | -7.875          | 1.1484           |
| 1.5219        | 1.2564 | 1200 | 2.1449        | -3.625          | -4.1875      | 0.5938         | 0.5586          | -420.0   | -364.0             | -6.6562        | -7.7812         | 1.1719           |
| 1.5292        | 1.3611 | 1300 | 2.1489        | -3.5312         | -4.0         | 0.5742         | 0.4785          | -402.0   | -354.0             | -7.75          | -8.875          | 1.1406           |
| 1.4257        | 1.4657 | 1400 | 2.1193        | -3.5781         | -4.0938      | 0.5801         | 0.5156          | -410.0   | -358.0             | -7.7188        | -9.25           | 1.1562           |
| 1.4366        | 1.5704 | 1500 | 2.0983        | -3.5938         | -4.1562      | 0.5898         | 0.5586          | -416.0   | -358.0             | -7.6875        | -8.9375         | 1.1562           |
| 1.5246        | 1.6750 | 1600 | 2.1191        | -3.5781         | -4.2188      | 0.5938         | 0.625           | -420.0   | -358.0             | -5.4688        | -6.9062         | 1.1562           |
| 1.4534        | 1.7797 | 1700 | 2.0829        | -3.4688         | -4.0312      | 0.5762         | 0.5625          | -404.0   | -348.0             | -9.0625        | -10.0625        | 1.1172           |
| 1.4551        | 1.8844 | 1800 | 2.1033        | -3.5625         | -4.1562      | 0.5898         | 0.6016          | -416.0   | -356.0             | -6.8438        | -8.1875         | 1.1484           |
| 1.4969        | 1.9890 | 1900 | 2.1046        | -3.5312         | -4.125       | 0.5762         | 0.5938          | -412.0   | -354.0             | -8.125         | -9.3125         | 1.1406           |
| 0.9984        | 2.0937 | 2000 | 2.1806        | -3.6406         | -4.2812      | 0.5781         | 0.6367          | -428.0   | -364.0             | -7.9375        | -9.1875         | 1.1719           |
| 0.9885        | 2.1983 | 2100 | 2.1927        | -3.6875         | -4.5         | 0.5801         | 0.7930          | -448.0   | -370.0             | -7.4062        | -8.6875         | 1.1875           |
| 0.9814        | 2.3030 | 2200 | 2.1867        | -3.625          | -4.3438      | 0.5742         | 0.7266          | -436.0   | -362.0             | -7.5           | -8.8125         | 1.1719           |
| 0.9844        | 2.4076 | 2300 | 2.1905        | -3.6875         | -4.5312      | 0.5996         | 0.8438          | -452.0   | -368.0             | -7.125         | -8.375          | 1.1875           |
| 0.9931        | 2.5123 | 2400 | 2.1843        | -3.6406         | -4.4375      | 0.5820         | 0.7930          | -442.0   | -364.0             | -7.375         | -8.6875         | 1.1719           |
| 0.9537        | 2.6170 | 2500 | 2.1907        | -3.6406         | -4.4688      | 0.5898         | 0.8125          | -446.0   | -364.0             | -7.5           | -8.8125         | 1.1719           |
| 0.9512        | 2.7216 | 2600 | 2.1918        | -3.6406         | -4.4375      | 0.5898         | 0.8086          | -446.0   | -364.0             | -7.5           | -8.8125         | 1.1719           |
| 0.9604        | 2.8263 | 2700 | 2.1906        | -3.6406         | -4.4375      | 0.5879         | 0.7969          | -442.0   | -364.0             | -7.5312        | -8.875          | 1.1719           |
| 1.0208        | 2.9309 | 2800 | 2.1904        | -3.6406         | -4.4375      | 0.5918         | 0.8008          | -444.0   | -364.0             | -7.5312        | -8.875          | 1.1719           |


### Framework versions

- Transformers 4.44.2
- Pytorch 2.3.0
- Datasets 3.0.0
- Tokenizers 0.19.1