# IE_M2_1000steps_1e8rate_01beta_cSFTDPO

This model is a DPO fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained below the list):
- Loss: 0.6722
- Rewards/chosen: -0.0007
- Rewards/rejected: -0.0438
- Rewards/accuracies: 0.4600
- Rewards/margins: 0.0430
- Logps/rejected: -41.4594
- Logps/chosen: -42.2128
- Logits/rejected: -2.9152
- Logits/chosen: -2.8539
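
The reward columns follow the implicit-reward convention used by DPO trainers (TRL's `DPOTrainer` is assumed here; the card does not say which trainer produced these numbers): each reward is beta times the gap between the policy and reference-model log-probabilities of a completion, and the margin is simply chosen minus rejected. The sketch below uses beta = 0.1, inferred from the `01beta` tag in the model name rather than stated in the card.

```python
# Hedged sketch: how DPO-style trainers (e.g. TRL's DPOTrainer, assumed here)
# typically derive the reward columns reported above. beta = 0.1 is inferred
# from the "01beta" tag in the model name, not confirmed by the card.

def dpo_reward_metrics(policy_logp_chosen: float, ref_logp_chosen: float,
                       policy_logp_rejected: float, ref_logp_rejected: float,
                       beta: float = 0.1) -> dict:
    """Implicit DPO rewards from summed completion log-probabilities."""
    rewards_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    rewards_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    return {
        "rewards/chosen": rewards_chosen,
        "rewards/rejected": rewards_rejected,
        # margin = chosen - rejected; e.g. -0.0007 - (-0.0438) = 0.0431 ~ 0.0430 above
        "rewards/margins": rewards_chosen - rewards_rejected,
        # over a batch, accuracy is the fraction of pairs with a positive margin
        "rewards/accuracies": float(rewards_chosen > rewards_rejected),
    }
```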
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
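
The training script is not included in the card, so the following is a hedged reproduction sketch only: the use of TRL's `DPOTrainer`, the dataset placeholders, and the model paths are assumptions; only the hyperparameter values come from the list above, and beta = 0.1 is inferred from the model name.

```python
# Hedged reproduction sketch (TRL is assumed; it is not listed under Framework versions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_M2_1000steps_1e7rate_SFT"   # the SFT checkpoint being aligned
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# The preference dataset is not named in the card; supply prompt/chosen/rejected pairs.
train_dataset = ...   # e.g. a datasets.Dataset with "prompt", "chosen", "rejected" columns
eval_dataset = ...

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e8rate_01beta_cSFTDPO",
    beta=0.1,                        # assumed from the "01beta" tag in the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 2 * 2 = 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL clones the policy as the frozen reference when None
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,   # newer TRL releases call this argument processing_class
)
trainer.train()
```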
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6934 | 0.4 | 50 | 0.6935 | 0.0012 | 0.0018 | 0.2150 | -0.0007 | -41.0035 | -42.1940 | -2.9159 | -2.8546 |
| 0.6929 | 0.8 | 100 | 0.6915 | -0.0004 | -0.0037 | 0.2700 | 0.0033 | -41.0589 | -42.2096 | -2.9161 | -2.8547 |
| 0.6885 | 1.2 | 150 | 0.6866 | -0.0003 | -0.0137 | 0.3850 | 0.0134 | -41.1590 | -42.2088 | -2.9159 | -2.8546 |
| 0.6814 | 1.6 | 200 | 0.6823 | -0.0012 | -0.0232 | 0.4450 | 0.0220 | -41.2542 | -42.2174 | -2.9154 | -2.8541 |
| 0.6809 | 2.0 | 250 | 0.6785 | 0.0002 | -0.0296 | 0.4550 | 0.0299 | -41.3181 | -42.2031 | -2.9155 | -2.8541 |
| 0.6811 | 2.4 | 300 | 0.6754 | -0.0007 | -0.0371 | 0.4550 | 0.0364 | -41.3932 | -42.2125 | -2.9154 | -2.8541 |
| 0.6735 | 2.8 | 350 | 0.6737 | -0.0013 | -0.0412 | 0.4600 | 0.0399 | -41.4335 | -42.2181 | -2.9153 | -2.8539 |
| 0.6771 | 3.2 | 400 | 0.6722 | -0.0004 | -0.0434 | 0.4600 | 0.0430 | -41.4555 | -42.2092 | -2.9152 | -2.8540 |
| 0.6766 | 3.6 | 450 | 0.6723 | -0.0018 | -0.0446 | 0.4600 | 0.0428 | -41.4676 | -42.2235 | -2.9152 | -2.8539 |
| 0.6789 | 4.0 | 500 | 0.6712 | -0.0002 | -0.0454 | 0.4600 | 0.0451 | -41.4754 | -42.2077 | -2.9152 | -2.8539 |
| 0.6691 | 4.4 | 550 | 0.6713 | -0.0007 | -0.0456 | 0.4600 | 0.0450 | -41.4782 | -42.2121 | -2.9152 | -2.8539 |
| 0.6703 | 4.8 | 600 | 0.6720 | -0.0010 | -0.0444 | 0.4600 | 0.0434 | -41.4658 | -42.2152 | -2.9152 | -2.8538 |
| 0.6752 | 5.2 | 650 | 0.6717 | -0.0017 | -0.0458 | 0.4550 | 0.0441 | -41.4803 | -42.2227 | -2.9152 | -2.8539 |
| 0.6759 | 5.6 | 700 | 0.6712 | 0.0000 | -0.0452 | 0.4600 | 0.0452 | -41.4734 | -42.2054 | -2.9150 | -2.8537 |
| 0.6694 | 6.0 | 750 | 0.6725 | -0.0018 | -0.0443 | 0.4600 | 0.0425 | -41.4648 | -42.2237 | -2.9153 | -2.8539 |
| 0.6593 | 6.4 | 800 | 0.6721 | -0.0006 | -0.0439 | 0.4600 | 0.0433 | -41.4608 | -42.2114 | -2.9153 | -2.8540 |
| 0.6781 | 6.8 | 850 | 0.6722 | -0.0007 | -0.0437 | 0.4600 | 0.0430 | -41.4589 | -42.2125 | -2.9152 | -2.8539 |
| 0.668 | 7.2 | 900 | 0.6722 | -0.0007 | -0.0438 | 0.4600 | 0.0430 | -41.4594 | -42.2128 | -2.9152 | -2.8539 |
| 0.6692 | 7.6 | 950 | 0.6722 | -0.0007 | -0.0438 | 0.4600 | 0.0430 | -41.4594 | -42.2128 | -2.9152 | -2.8539 |
| 0.6724 | 8.0 | 1000 | 0.6722 | -0.0007 | -0.0438 | 0.4600 | 0.0430 | -41.4594 | -42.2128 | -2.9152 | -2.8539 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
## Model tree for tsavage68/IE_M2_1000steps_1e8rate_01beta_cSFTDPO

- Base model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- Fine-tuned from: [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT)
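
Since the model descends from mistralai/Mistral-7B-Instruct-v0.2, inference with the standard transformers chat-template API should work as in the minimal sketch below; it assumes the base model's Mistral-Instruct chat template was kept through fine-tuning, and the generation settings are illustrative rather than taken from the card.

```python
# Minimal inference sketch (chat template assumed to carry over from the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/IE_M2_1000steps_1e8rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

# Replace the prompt with your task input; the card does not describe the intended task.
messages = [{"role": "user", "content": "..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```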