---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e7rate_01beta_cSFTDPO
  results: []
---


# IE_L3_1000steps_1e7rate_01beta_cSFTDPO

This model is a DPO (Direct Preference Optimization) fine-tune of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT), trained with TRL. The training dataset is not recorded in this card.
The final checkpoint (step 1000) achieves the following results on the evaluation set:
- Loss: 0.1803
- Rewards/chosen: -0.5346
- Rewards/rejected: -8.6468
- Rewards/accuracies: 0.7400
- Rewards/margins: 8.1123
- Logps/rejected: -162.0956
- Logps/chosen: -88.1433
- Logits/rejected: -0.8498
- Logits/chosen: -0.7319
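
The reward margin is the chosen reward minus the rejected reward, so the figures above can be cross-checked against each other. The snippet below is a small sanity check that uses only the values listed above; the variable names are illustrative and not part of any training script.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -0.5346
rewards_rejected = -8.6468
reported_margin = 8.1123

# The margin is the mean chosen reward minus the mean rejected reward.
margin = rewards_chosen - rewards_rejected

# The card rounds every metric to four decimals, so allow a small tolerance.
assert abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin})")
```

The recomputed value agrees with the reported margin to within rounding of the four-decimal metrics.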

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
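
To make the settings above concrete, here is a minimal pure-Python sketch of how the total batch size follows from the per-device batch size and gradient accumulation, and of the learning rate a cosine schedule with linear warmup would produce at a given step. It follows the usual shape of such a schedule (linear ramp from 0, then half-cosine decay to 0) and is not the training code itself. `NUM_DEVICES = 1` is an assumption: the card does not state the device count, but it is the value consistent with a total batch size of 4. The function names are hypothetical.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-7
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: device count is not stated in the card.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


total = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(f"total_train_batch_size = {total}")
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the peak rate of 1e-07 is reached at step 100 and the rate decays to zero by step 1000.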

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6292        | 0.4   | 50   | 0.5972          | -0.0178        | -0.2247          | 0.7400             | 0.2070          | -77.8745       | -82.9754     | -0.7952         | -0.7369       |
| 0.2432        | 0.8   | 100  | 0.2531          | -0.1984        | -1.9084          | 0.7400             | 1.7099          | -94.7109       | -84.7823     | -0.7935         | -0.7222       |
| 0.1468        | 1.2   | 150  | 0.1842          | -0.4156        | -4.4900          | 0.7400             | 4.0744          | -120.5273      | -86.9542     | -0.8149         | -0.7193       |
| 0.1745        | 1.6   | 200  | 0.1807          | -0.4305        | -6.5857          | 0.7400             | 6.1551          | -141.4839      | -87.1031     | -0.8342         | -0.7283       |
| 0.2254        | 2.0   | 250  | 0.1805          | -0.4554        | -7.3110          | 0.7400             | 6.8555          | -148.7368      | -87.3519     | -0.8373         | -0.7278       |
| 0.1389        | 2.4   | 300  | 0.1804          | -0.4666        | -7.7073          | 0.7400             | 7.2408          | -152.7006      | -87.4635     | -0.8397         | -0.7280       |
| 0.1215        | 2.8   | 350  | 0.1804          | -0.4933        | -8.0779          | 0.7400             | 7.5846          | -156.4058      | -87.7304     | -0.8446         | -0.7309       |
| 0.1910        | 3.2   | 400  | 0.1804          | -0.5121        | -8.2398          | 0.7400             | 7.7277          | -158.0253      | -87.9188     | -0.8463         | -0.7322       |
| 0.1906        | 3.6   | 450  | 0.1804          | -0.5199        | -8.2886          | 0.7400             | 7.7687          | -158.5128      | -87.9963     | -0.8471         | -0.7317       |
| 0.2084        | 4.0   | 500  | 0.1804          | -0.5104        | -8.4325          | 0.7400             | 7.9221          | -159.9520      | -87.9018     | -0.8488         | -0.7326       |
| 0.1561        | 4.4   | 550  | 0.1803          | -0.5293        | -8.5197          | 0.7400             | 7.9905          | -160.8244      | -88.0903     | -0.8493         | -0.7326       |
| 0.1213        | 4.8   | 600  | 0.1803          | -0.5356        | -8.5680          | 0.7400             | 8.0324          | -161.3075      | -88.1538     | -0.8503         | -0.7332       |
| 0.1907        | 5.2   | 650  | 0.1803          | -0.5333        | -8.6184          | 0.7400             | 8.0851          | -161.8111      | -88.1307     | -0.8505         | -0.7330       |
| 0.2427        | 5.6   | 700  | 0.1803          | -0.5362        | -8.6233          | 0.7400             | 8.0871          | -161.8604      | -88.1602     | -0.8507         | -0.7332       |
| 0.2601        | 6.0   | 750  | 0.1803          | -0.5367        | -8.6352          | 0.7400             | 8.0985          | -161.9794      | -88.1651     | -0.8509         | -0.7332       |
| 0.1213        | 6.4   | 800  | 0.1803          | -0.5353        | -8.6312          | 0.7400             | 8.0960          | -161.9397      | -88.1506     | -0.8507         | -0.7334       |
| 0.2426        | 6.8   | 850  | 0.1803          | -0.5305        | -8.6468          | 0.7400             | 8.1163          | -162.0951      | -88.1023     | -0.8507         | -0.7328       |
| 0.1733        | 7.2   | 900  | 0.1803          | -0.5246        | -8.6359          | 0.7400             | 8.1112          | -161.9858      | -88.0442     | -0.8503         | -0.7323       |
| 0.1388        | 7.6   | 950  | 0.1803          | -0.5346        | -8.6468          | 0.7400             | 8.1123          | -162.0956      | -88.1433     | -0.8498         | -0.7319       |
| 0.1561        | 8.0   | 1000 | 0.1803          | -0.5346        | -8.6468          | 0.7400             | 8.1123          | -162.0956      | -88.1433     | -0.8498         | -0.7319       |
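
Two things can be read off the table with a little arithmetic. The sketch below assumes the standard DPO implicit reward, `reward = beta * (policy_logp - reference_logp)`, and a `beta` of 0.1. The `beta` value is an assumption inferred from `01beta` in the model name and is not stated in the card. If both hold, the reference log-probability implied by each row should stay constant across training, since the reference model is frozen. The second part estimates the training set size from the epoch and step columns. All helper names are hypothetical.

```python
# Assumption: inferred from "01beta" in the model name, not stated in the card.
BETA = 0.1
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameter list

# (step, rewards/chosen, logps/chosen, rewards/rejected, logps/rejected)
# A few rows copied from the table above.
rows = [
    (50, -0.0178, -82.9754, -0.2247, -77.8745),
    (100, -0.1984, -84.7823, -1.9084, -94.7109),
    (500, -0.5104, -87.9018, -8.4325, -159.9520),
    (1000, -0.5346, -88.1433, -8.6468, -162.0956),
]


def implied_reference_logp(reward: float, policy_logp: float, beta: float = BETA) -> float:
    """Invert reward = beta * (policy_logp - ref_logp) to recover ref_logp."""
    return policy_logp - reward / beta


ref_chosen = [implied_reference_logp(rc, lc) for _, rc, lc, _, _ in rows]
ref_rejected = [implied_reference_logp(rr, lr) for _, _, _, rr, lr in rows]

# A frozen reference model should give (nearly) the same value in every row.
spread_chosen = max(ref_chosen) - min(ref_chosen)
spread_rejected = max(ref_rejected) - min(ref_rejected)
print(f"implied reference logp (chosen) spread:   {spread_chosen:.4f}")
print(f"implied reference logp (rejected) spread: {spread_rejected:.4f}")

# The first row reports epoch 0.4 at step 50; epochs are rounded in the table,
# so this is an estimate only.
steps_per_epoch = round(50 / 0.4)
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
print(f"about {steps_per_epoch} steps per epoch, about {approx_train_examples} training pairs")
```

A small spread supports, but does not prove, the assumed `beta`; the dataset size estimate inherits the rounding of the epoch column.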


### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1