---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e7rate_05beta_cSFTDPO
  results: []
---


# IE_L3_1000steps_1e7rate_05beta_cSFTDPO

This model is a DPO fine-tune (trained with TRL) of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT). The training dataset is not documented.
It achieves the following results on the evaluation set:
- Loss: 0.1802
- Rewards/chosen: -1.1386
- Rewards/rejected: -10.9339
- Rewards/accuracies: 0.7400
- Rewards/margins: 9.7954
- Logps/rejected: -97.4951
- Logps/chosen: -85.0749
- Logits/rejected: -0.7939
- Logits/chosen: -0.7200
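
The reward figures above are related by simple arithmetic, which the following minimal sketch checks. It assumes TRL's default sigmoid DPO loss, which this card does not state explicitly; under that assumption the per-pair loss is `-log(sigmoid(margin))`. The helper name `dpo_sigmoid_loss` is hypothetical and used only for illustration.

```python
import math

# Final evaluation metrics copied from the list above.
reward_chosen = -1.1386
reward_rejected = -10.9339
reported_margin = 9.7954
reported_accuracy = 0.7400
reported_loss = 0.1802

# The margin is the gap between the chosen and rejected rewards.
# It matches the reported value up to rounding of the averaged metrics.
margin = reward_chosen - reward_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin})")


def dpo_sigmoid_loss(pair_margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)); ASSUMES the sigmoid DPO loss."""
    # Split by sign so exp() never receives a large positive argument.
    if pair_margin >= 0:
        return math.log1p(math.exp(-pair_margin))
    return -pair_margin + math.log1p(math.exp(pair_margin))


# A pair with zero margin contributes ln(2) to the loss.
tie_loss = dpo_sigmoid_loss(0.0)

# Hypothetical scenario: every pair not counted as accurate has zero margin
# and every other pair has negligible loss.
scenario_loss = (1.0 - reported_accuracy) * tie_loss
print(f"scenario loss: {scenario_loss:.4f} (reported: {reported_loss})")
```

The reported loss is a mean over evaluation pairs, so it cannot be recovered by passing the mean margin through `dpo_sigmoid_loss`. The scenario in the last lines is only a numerical observation: 0.26 × ln 2 is close to the reported loss, which would be consistent with roughly a quarter of the evaluation pairs having near-zero margin, but the card does not confirm this.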

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
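
To make these settings concrete, the sketch below derives the effective batch size and traces the learning rate over training. It assumes a single training device (the card does not list the device count) and the usual linear-warmup, half-cosine-decay shape associated with the `cosine` scheduler type. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-07
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# ASSUMPTION: one device, so the effective batch size is the product below.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    # Half a cosine period: peak at progress 0, zero at progress 1.
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-07 at step 100 and decays to zero by step 1000.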

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4416        | 0.4   | 50   | 0.3457          | -0.0969        | -1.1506          | 0.7400             | 1.0537          | -77.9284       | -82.9916     | -0.7954         | -0.7373       |
| 0.1388        | 0.8   | 100  | 0.1803          | -0.7835        | -7.7662          | 0.7400             | 6.9827          | -91.1596       | -84.3647     | -0.7936         | -0.7251       |
| 0.1387        | 1.2   | 150  | 0.1802          | -0.9415        | -9.2178          | 0.7400             | 8.2763          | -94.0629       | -84.6808     | -0.7940         | -0.7226       |
| 0.1733        | 1.6   | 200  | 0.1802          | -0.9618        | -9.5890          | 0.7400             | 8.6272          | -94.8052       | -84.7213     | -0.7940         | -0.7227       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.0365        | -9.8116          | 0.7400             | 8.7750          | -95.2504       | -84.8709     | -0.7938         | -0.7219       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.0393        | -10.0428         | 0.7400             | 9.0035          | -95.7128       | -84.8764     | -0.7938         | -0.7216       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.0477        | -10.3216         | 0.7400             | 9.2739          | -96.2705       | -84.8933     | -0.7934         | -0.7207       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.0921        | -10.5149         | 0.7400             | 9.4228          | -96.6571       | -84.9820     | -0.7947         | -0.7217       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.0970        | -10.5317         | 0.7400             | 9.4347          | -96.6906       | -84.9917     | -0.7945         | -0.7214       |
| 0.208         | 4.0   | 500  | 0.1802          | -1.1136        | -10.7153         | 0.7400             | 9.6017          | -97.0578       | -85.0249     | -0.7951         | -0.7219       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.1237        | -10.8074         | 0.7400             | 9.6837          | -97.2419       | -85.0451     | -0.7948         | -0.7214       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.1291        | -10.8336         | 0.7400             | 9.7045          | -97.2944       | -85.0559     | -0.7943         | -0.7205       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.1297        | -10.8980         | 0.7400             | 9.7683          | -97.4233       | -85.0572     | -0.7939         | -0.7202       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.1277        | -10.8859         | 0.7400             | 9.7582          | -97.3990       | -85.0531     | -0.7953         | -0.7215       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.1398        | -10.9204         | 0.7400             | 9.7806          | -97.4681       | -85.0774     | -0.7944         | -0.7204       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.1496        | -10.9309         | 0.7400             | 9.7813          | -97.4891       | -85.0970     | -0.7947         | -0.7207       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.1208        | -10.9075         | 0.7400             | 9.7867          | -97.4422       | -85.0394     | -0.7944         | -0.7204       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.1302        | -10.9173         | 0.7400             | 9.7871          | -97.4618       | -85.0581     | -0.7939         | -0.7201       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.1386        | -10.9339         | 0.7400             | 9.7954          | -97.4951       | -85.0749     | -0.7939         | -0.7200       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.1386        | -10.9339         | 0.7400             | 9.7954          | -97.4951       | -85.0749     | -0.7939         | -0.7200       |
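
The Epoch and Step columns also give a rough estimate of the training-set size, which the card otherwise leaves undocumented. The sketch below is an approximation only: it assumes the total train batch size of 4 from the hyperparameters, a single device, and that the Epoch column is exact rather than rounded.

```python
# Values read from the last table row and the hyperparameter list.
final_step = 1000
final_epoch = 8.0
total_train_batch_size = 4

# Optimizer steps needed to pass over the training data once.
steps_per_epoch = final_step / final_epoch

# ASSUMPTION: each optimizer step consumes exactly one full effective batch.
approx_train_pairs = steps_per_epoch * total_train_batch_size

print(f"{steps_per_epoch:.0f} optimizer steps per epoch")
print(f"about {approx_train_pairs:.0f} preference pairs per epoch")
```

This is consistent with the evaluation interval in the table, where 50 steps correspond to 0.4 epochs.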


### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1