# mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):
- Loss: 1.3501
- Rewards/chosen: -4.6533
- Rewards/rejected: -7.2695
- Rewards/accuracies: 0.6044
- Rewards/margins: 2.6162
- Logps/rejected: -52.8039
- Logps/chosen: -38.8969
- Logits/rejected: -2.8818
- Logits/chosen: -2.8827
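These metric names match what TRL's `DPOTrainer` logs during evaluation. Under that reading (an assumption, since the card does not state the training framework), the reward columns are the implicit DPO rewards, i.e. the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]
$$

`Rewards/margins` is the gap between the chosen and rejected rewards (at the final step, $-4.6533 - (-7.2695) = 2.6162$), and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward.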
## Model description
More information needed
## Intended uses & limitations
More information needed
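No usage guidance is provided in the card. As a minimal, untested sketch, the checkpoint should load like any Mistral-Instruct model with Hugging Face Transformers; the repository id below is assumed from the model name:

```python
# Minimal inference sketch (the repo id is assumed from the model title).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")  # device_map needs accelerate

messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

As with the base model, prompts should follow the Mistral instruct chat format, which `apply_chat_template` takes care of.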
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the reproduction sketch after this list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
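The card does not include the training script. The sketch below maps the hyperparameters above onto a TRL 0.7-style `DPOTrainer` call; it is a hedged reconstruction, not the author's code. The preference dataset is a placeholder, and the DPO `beta` of 0.3 is inferred from "03_Beta" in the model name rather than stated in the card.

```python
# Hedged reproduction sketch: dataset, beta, and TRL API version are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # assumes a TRL 0.7-style DPOTrainer API

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not name the preference dataset.
dataset = load_dataset("some/preference-dataset")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 4 * 2 = total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # the card reports eval metrics every 50 steps
    # Default AdamW (betas 0.9/0.999, epsilon 1e-8) matches the listed optimizer.
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,                        # assumption, inferred from "03_Beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```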
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6609 | 0.1 | 50 | 0.7439 | -0.3799 | -0.6639 | 0.5363 | 0.2840 | -30.7855 | -24.6521 | -2.8212 | -2.8215 |
0.7223 | 0.2 | 100 | 1.2179 | -3.4197 | -4.5833 | 0.5670 | 1.1636 | -43.8500 | -34.7847 | -2.4935 | -2.4943 |
1.5151 | 0.29 | 150 | 1.3451 | -4.6461 | -5.3198 | 0.4923 | 0.6737 | -46.3050 | -38.8727 | -2.7810 | -2.7816 |
1.5249 | 0.39 | 200 | 1.5370 | -4.3700 | -4.3686 | 0.4659 | -0.0014 | -43.1345 | -37.9527 | -2.9607 | -2.9612 |
1.3975 | 0.49 | 250 | 1.2806 | -3.4083 | -3.9853 | 0.5319 | 0.5769 | -41.8567 | -34.7470 | -2.9314 | -2.9319 |
1.3304 | 0.59 | 300 | 1.3357 | -2.0104 | -2.3692 | 0.4945 | 0.3588 | -36.4698 | -30.0870 | -2.9631 | -2.9635 |
1.0439 | 0.68 | 350 | 1.2763 | -0.5270 | -0.8889 | 0.5077 | 0.3619 | -31.5354 | -25.1425 | -2.8440 | -2.8443 |
1.4598 | 0.78 | 400 | 1.2025 | -2.3552 | -3.1289 | 0.5560 | 0.7737 | -39.0019 | -31.2365 | -3.1671 | -3.1675 |
0.8046 | 0.88 | 450 | 1.2610 | -2.5219 | -3.3122 | 0.5538 | 0.7903 | -39.6132 | -31.7922 | -2.8903 | -2.8908 |
0.9395 | 0.98 | 500 | 1.1880 | -1.6006 | -2.5141 | 0.5451 | 0.9135 | -36.9527 | -28.7210 | -2.7295 | -2.7300 |
0.239 | 1.07 | 550 | 1.1556 | -2.0692 | -3.6279 | 0.5868 | 1.5587 | -40.6656 | -30.2832 | -2.8301 | -2.8308 |
0.1348 | 1.17 | 600 | 1.3248 | -3.6765 | -5.8923 | 0.5978 | 2.2158 | -48.2133 | -35.6409 | -2.8392 | -2.8400 |
0.328 | 1.27 | 650 | 1.2982 | -3.5842 | -5.5884 | 0.5868 | 2.0042 | -47.2005 | -35.3331 | -2.8786 | -2.8794 |
0.3605 | 1.37 | 700 | 1.2960 | -4.0655 | -6.4030 | 0.6000 | 2.3374 | -49.9156 | -36.9376 | -2.8812 | -2.8820 |
0.1389 | 1.46 | 750 | 1.3185 | -4.2670 | -6.7599 | 0.5956 | 2.4929 | -51.1054 | -37.6093 | -2.8897 | -2.8905 |
0.1871 | 1.56 | 800 | 1.3483 | -4.5542 | -7.1419 | 0.5978 | 2.5877 | -52.3788 | -38.5665 | -2.8779 | -2.8788 |
0.3556 | 1.66 | 850 | 1.3507 | -4.6209 | -7.2288 | 0.6000 | 2.6080 | -52.6684 | -38.7887 | -2.8809 | -2.8817 |
0.4099 | 1.76 | 900 | 1.3517 | -4.6482 | -7.2597 | 0.6022 | 2.6114 | -52.7713 | -38.8799 | -2.8817 | -2.8826 |
0.3996 | 1.86 | 950 | 1.3491 | -4.6540 | -7.2682 | 0.6044 | 2.6142 | -52.7997 | -38.8992 | -2.8818 | -2.8827 |
0.2013 | 1.95 | 1000 | 1.3501 | -4.6533 | -7.2695 | 0.6044 | 2.6162 | -52.8039 | -38.8969 | -2.8818 | -2.8827 |
### Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2