mistralit2_1000_STEPS_1e6_05_beta_DPO
This model is a Direct Preference Optimization (DPO) fine-tune of mistralai/Mistral-7B-Instruct-v0.2; the preference dataset used is not documented. It achieves the following results on the evaluation set:
- Loss: 1.7261
- Rewards/chosen: -2.7031
- Rewards/rejected: -5.5561
- Rewards/accuracies: 0.5890
- Rewards/margins: 2.8530
- Logps/rejected: -39.6846
- Logps/chosen: -28.7920
- Logits/rejected: -2.5943
- Logits/chosen: -2.5947
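
The checkpoint can be loaded like any other causal LM with the transformers library. The snippet below is a minimal inference sketch, assuming the model is published under tsavage68/mistralit2_1000_STEPS_1e6_05_beta_DPO (the repository named in the model tree at the end of this card); the dtype and device settings are illustrative choices, not values taken from training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e6_05_beta_DPO"  # repo id from the model tree
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype your hardware supports
    device_map="auto",
)

# Mistral-Instruct checkpoints expect the chat template stored in the tokenizer.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```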
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
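
For reference, a trl DPOTrainer setup reproducing these hyperparameters might look like the sketch below (targeting trl releases contemporary with the framework versions listed further down). It is an illustration only: the preference dataset is a placeholder, since the actual training data is not documented, and beta=0.5 is an assumption inferred from the "05_beta" tag in the model name rather than a value stated on this card. The Adam settings above match the TrainingArguments defaults, so they are not set explicitly.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder preference pairs; the card does not name the dataset actually used.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e6_05_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 8
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                 # trl clones a frozen reference model when None
    args=args,
    beta=0.5,                       # assumption inferred from "05_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Note that newer trl releases move beta and related options onto a DPOConfig object instead of passing them directly to the trainer, so the exact call signature depends on the trl version installed.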
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7251 | 0.1 | 50 | 0.8837 | 0.1755 | -0.1037 | 0.4901 | 0.2792 | -28.7799 | -23.0348 | -2.8359 | -2.8362 |
0.9163 | 0.2 | 100 | 1.7788 | -4.6432 | -6.2118 | 0.5231 | 1.5686 | -40.9959 | -32.6723 | -2.6192 | -2.6196 |
2.5499 | 0.29 | 150 | 1.9611 | -3.8807 | -4.8711 | 0.5033 | 0.9904 | -38.3145 | -31.1472 | -2.8718 | -2.8723 |
1.6289 | 0.39 | 200 | 2.1262 | -4.2615 | -4.3039 | 0.4462 | 0.0423 | -37.1802 | -31.9089 | -2.5439 | -2.5442 |
2.3907 | 0.49 | 250 | 2.1527 | -2.9174 | -2.6939 | 0.4527 | -0.2235 | -33.9602 | -29.2207 | -2.7643 | -2.7646 |
1.4887 | 0.59 | 300 | 2.2144 | -2.7649 | -3.3119 | 0.4725 | 0.5470 | -35.1962 | -28.9157 | -2.7607 | -2.7611 |
1.9594 | 0.68 | 350 | 2.1934 | -0.0315 | 0.0006 | 0.4593 | -0.0322 | -28.5711 | -23.4489 | -2.6191 | -2.6193 |
2.1399 | 0.78 | 400 | 1.9044 | -4.4917 | -5.1288 | 0.4989 | 0.6371 | -38.8300 | -32.3693 | -2.8491 | -2.8494 |
1.1937 | 0.88 | 450 | 1.9658 | -2.8086 | -3.5888 | 0.4989 | 0.7802 | -35.7500 | -29.0030 | -2.8330 | -2.8333 |
1.6222 | 0.98 | 500 | 1.8626 | -2.3058 | -3.5222 | 0.5363 | 1.2164 | -35.6167 | -27.9974 | -2.7302 | -2.7305 |
0.5066 | 1.07 | 550 | 1.8660 | -2.9490 | -5.0994 | 0.5758 | 2.1504 | -38.7712 | -29.2838 | -2.7083 | -2.7087 |
0.4413 | 1.17 | 600 | 1.7645 | -4.3370 | -6.8789 | 0.5868 | 2.5419 | -42.3302 | -32.0597 | -2.6355 | -2.6360 |
0.2726 | 1.27 | 650 | 1.7971 | -1.8488 | -4.1281 | 0.5780 | 2.2793 | -36.8285 | -27.0834 | -2.6083 | -2.6085 |
0.2803 | 1.37 | 700 | 1.7498 | -2.2886 | -4.8524 | 0.5802 | 2.5639 | -38.2772 | -27.9629 | -2.6089 | -2.6092 |
0.199 | 1.46 | 750 | 1.7383 | -2.5467 | -5.2810 | 0.5868 | 2.7343 | -39.1343 | -28.4792 | -2.5998 | -2.6002 |
0.2405 | 1.56 | 800 | 1.7280 | -2.4873 | -5.2804 | 0.5890 | 2.7931 | -39.1332 | -28.3604 | -2.5980 | -2.5984 |
0.2125 | 1.66 | 850 | 1.7269 | -2.6426 | -5.4648 | 0.5846 | 2.8223 | -39.5021 | -28.6710 | -2.5949 | -2.5953 |
0.3193 | 1.76 | 900 | 1.7253 | -2.6905 | -5.5366 | 0.5912 | 2.8461 | -39.6456 | -28.7668 | -2.5945 | -2.5949 |
0.3209 | 1.86 | 950 | 1.7242 | -2.6996 | -5.5548 | 0.5912 | 2.8552 | -39.6820 | -28.7851 | -2.5942 | -2.5946 |
0.278 | 1.95 | 1000 | 1.7261 | -2.7031 | -5.5561 | 0.5890 | 2.8530 | -39.6846 | -28.7920 | -2.5943 | -2.5947 |
Framework versions
- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
Model tree for tsavage68/mistralit2_1000_STEPS_1e6_05_beta_DPO
- Base model: mistralai/Mistral-7B-Instruct-v0.2