---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e6rate_03beta_cSFTDPO
  results: []
---


# IE_L3_1000steps_1e6rate_03beta_cSFTDPO

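This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT), trained with DPO (Direct Preference Optimization) using TRL. The training dataset was not recorded.
At the final training step (step 1000), it achieves the following results on the evaluation set: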
- Loss: 0.1802
- Rewards/chosen: -1.3199
- Rewards/rejected: -13.3530
- Rewards/accuracies: 0.7400
- Rewards/margins: 12.0331
- Logps/rejected: -120.1372
- Logps/chosen: -87.1973
- Logits/rejected: -0.8052
- Logits/chosen: -0.7124
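
The reward metrics above are related by simple arithmetic, which the following sketch checks. It assumes the sigmoid form of the DPO loss (TRL's default; the card does not state which loss type was used), and the helper name `dpo_sigmoid_loss` is introduced here for illustration only. The last lines show one reading that happens to reproduce the reported loss: if every pair not counted as accurate had a margin of exactly zero, each would contribute `ln 2` to the mean. This is an observation about the numbers, not a confirmed explanation.

```python
import math

# Final-step evaluation metrics copied from the list above.
rewards_chosen = -1.3199
rewards_rejected = -13.3530
reported_margin = 12.0331
reported_accuracy = 0.7400
reported_loss = 0.1802

# Rewards/margins is the mean of (chosen reward - rejected reward),
# which equals the difference of the two reported means.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(m: float) -> float:
    """Per-pair loss -log(sigmoid(m)), computed stably as softplus(-m)."""
    return math.log1p(math.exp(-abs(m))) + max(-m, 0.0)


# A pair at the mean margin contributes almost nothing to the loss.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

# A pair with zero margin contributes ln 2.
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

# Assumption (not stated in the card): all pairs that are not counted as
# accurate sit at exactly zero margin, and the rest contribute ~0 loss.
tied_fraction = 1.0 - reported_accuracy
loss_if_ties = tied_fraction * loss_at_zero_margin

print(f"margin = {margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.2e}")
print(f"loss under the zero-margin assumption = {loss_if_ties:.4f}")
```

Because the reported loss is far larger than the loss at the mean margin, the mean margin alone says little about how individual evaluation pairs are distributed.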

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
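
As a rough illustration of how these settings combine, the sketch below recomputes the total batch size and the learning rate at a given step. It assumes a single training device (the card does not state the device count) and a linear warmup followed by a half-cycle cosine decay to zero, which is the usual shape of a `cosine` schedule with warmup; `lr_at` is a hypothetical helper, not a library function.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-06
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: the device count is not stated in the card


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Examples consumed per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate peaks at `1e-06` at step 100, is half that at step 550, and reaches zero at step 1000.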

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1907        | 0.4   | 50   | 0.1802          | -1.0923        | -10.4680         | 0.7400             | 9.3757          | -110.5205      | -86.4386     | -0.7963         | -0.7114       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.2190        | -11.5716         | 0.7400             | 10.3526         | -114.1993      | -86.8611     | -0.7960         | -0.7088       |
| 0.1386        | 1.2   | 150  | 0.1802          | -1.2269        | -11.8797         | 0.7400             | 10.6528         | -115.2263      | -86.8875     | -0.7973         | -0.7092       |
| 0.1733        | 1.6   | 200  | 0.1802          | -1.2628        | -12.4562         | 0.7400             | 11.1934         | -117.1479      | -87.0072     | -0.7983         | -0.7088       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.2811        | -12.6109         | 0.7400             | 11.3298         | -117.6637      | -87.0682     | -0.8005         | -0.7100       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.2819        | -12.6821         | 0.7400             | 11.4002         | -117.9011      | -87.0709     | -0.8009         | -0.7104       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.2857        | -12.9252         | 0.7400             | 11.6395         | -118.7114      | -87.0834     | -0.8024         | -0.7110       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.2904        | -12.9929         | 0.7400             | 11.7024         | -118.9368      | -87.0992     | -0.8026         | -0.7109       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.2935        | -13.0320         | 0.7400             | 11.7385         | -119.0673      | -87.1095     | -0.8030         | -0.7112       |
| 0.2079        | 4.0   | 500  | 0.1802          | -1.3034        | -13.1728         | 0.7400             | 11.8694         | -119.5364      | -87.1423     | -0.8047         | -0.7126       |
| 0.1560        | 4.4   | 550  | 0.1802          | -1.3085        | -13.2242         | 0.7400             | 11.9157         | -119.7078      | -87.1593     | -0.8035         | -0.7118       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.2992        | -13.2411         | 0.7400             | 11.9418         | -119.7642      | -87.1285     | -0.8054         | -0.7131       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.3144        | -13.3156         | 0.7400             | 12.0011         | -120.0125      | -87.1792     | -0.8048         | -0.7117       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.2925        | -13.3031         | 0.7400             | 12.0106         | -119.9710      | -87.1061     | -0.8043         | -0.7117       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.3084        | -13.3298         | 0.7400             | 12.0213         | -120.0597      | -87.1592     | -0.8052         | -0.7126       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.3118        | -13.3477         | 0.7400             | 12.0359         | -120.1197      | -87.1704     | -0.8039         | -0.7116       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.3228        | -13.3620         | 0.7400             | 12.0392         | -120.1673      | -87.2071     | -0.8052         | -0.7125       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.3137        | -13.3379         | 0.7400             | 12.0242         | -120.0870      | -87.1768     | -0.8052         | -0.7125       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.3070        | -13.3530         | 0.7400             | 12.0460         | -120.1374      | -87.1545     | -0.8053         | -0.7127       |
| 0.1560        | 8.0   | 1000 | 0.1802          | -1.3199        | -13.3530         | 0.7400             | 12.0331         | -120.1372      | -87.1973     | -0.8052         | -0.7124       |
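
The Epoch and Step columns give a rough sense of the training set size, which the card does not report. The sketch below is an estimate only: it assumes the Epoch values are exact enough to divide by and that every optimizer step consumed a full batch of 4 examples, so the true count may be slightly lower.

```python
# (step, epoch) pairs copied from the first and last rows of the table.
first_step, first_epoch = 50, 0.4
last_step, last_epoch = 1000, 8.0

# total_train_batch_size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 4

# Optimizer steps needed for one pass over the training data.
steps_per_epoch = round(first_step / first_epoch)

# Upper-bound estimate: the last batch of an epoch may be partial.
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"steps per epoch ~ {steps_per_epoch}")
print(f"training examples ~ {approx_train_examples}")
```

Both the first and last rows give the same steps-per-epoch figure, so the estimate is at least internally consistent with the table.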


### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1