metadata
license: apache-2.0
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: TheBloke/Mistral-7B-v0.1-GPTQ
model-index:
  - name: mistral-rand-300k
    results: []

mistral-rand-300k

This model is a fine-tuned version of TheBloke/Mistral-7B-v0.1-GPTQ. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set:

  • Loss: 0.3537
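
Since the card lists peft as the library and a GPTQ checkpoint as the base model, the fine-tuned weights are most likely a PEFT adapter applied on top of that base. The following is a minimal loading sketch under that assumption, not a documented recipe from the author. The adapter repository id `megha-shroff/mistral-rand-300k` is an assumption inferred from the model name, and loading a GPTQ base model additionally assumes that a GPTQ backend (for example optimum with auto-gptq) and a CUDA device are available.

```python
BASE_MODEL_ID = "TheBloke/Mistral-7B-v0.1-GPTQ"   # base_model from the card metadata
ADAPTER_ID = "megha-shroff/mistral-rand-300k"     # ASSUMPTION: adapter repo id, not stated in the card


def load_finetuned_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the quantized base model and attach the fine-tuned PEFT adapter.

    Heavy libraries are imported inside the function so that this module can
    be imported on a machine without a GPU or the GPTQ dependencies.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # device_map="auto" lets Accelerate place the quantized weights on the available GPU(s).
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
    # Wrap the base model with the adapter weights produced by fine-tuning.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_finetuned_model()` returns a model and tokenizer that can be passed to the usual `generate` workflow. Because the card does not describe the training data or a prompt format, any prompt template is left to the user.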

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • training_steps: 250
  • mixed_precision_training: Native AMP
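
Two of the values above are derived rather than set directly: total_train_batch_size is the per-device batch size multiplied by the gradient accumulation steps, and the cosine scheduler determines the learning rate at each step. The sketch below reproduces that arithmetic in plain Python. It assumes a single device and no warmup, since the card lists neither, and the helper names are hypothetical rather than part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
TRAINING_STEPS = 250


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update.

    ASSUMPTION: num_devices defaults to 1, which is what 16 * 2 = 32 implies.
    """
    return per_device * accumulation_steps * num_devices


def cosine_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Half-cosine decay from base_lr to 0, with optional linear warmup.

    ASSUMPTION: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)  # clamp so steps past the end stay at 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(f"step {step}: lr = {cosine_lr(step, TRAINING_STEPS, LEARNING_RATE):.2e}")
```

The schedule starts at the full learning rate, reaches half of it at the midpoint, and decays to zero at the final step.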

Training results

Training Loss Epoch Step Validation Loss
1.6373 0.01 50 0.8699
0.7891 0.02 100 0.7454
0.7135 0.02 150 0.6984
0.6931 0.03 200 0.6722
0.6592 0.08 250 0.6407
0.634 0.1 300 0.6212
0.61 0.11 350 0.6015
0.5961 0.13 400 0.5892
0.5795 0.14 450 0.5773
0.5765 0.16 500 0.5683
0.5619 0.18 550 0.5612
0.5558 0.19 600 0.5519
0.5572 0.21 650 0.5438
0.5467 0.22 700 0.5348
0.5303 0.24 750 0.5248
0.5307 0.26 800 0.5135
0.508 0.27 850 0.5036
0.5087 0.29 900 0.5009
0.5073 0.3 950 0.4945
0.5078 0.32 1000 0.4916
0.5089 0.34 1050 0.4893
0.4829 0.35 1100 0.4862
0.4872 0.37 1150 0.4832
0.4845 0.39 1200 0.4803
0.4993 0.4 1250 0.4774
0.475 0.42 1300 0.4746
0.4836 0.43 1350 0.4735
0.4748 0.45 1400 0.4708
0.4809 0.47 1450 0.4693
0.4755 0.48 1500 0.4668
0.4679 0.5 1550 0.4644
0.4685 0.51 1600 0.4622
0.4706 0.53 1650 0.4611
0.4673 0.55 1700 0.4605
0.4539 0.56 1750 0.4577
0.4501 0.58 1800 0.4560
0.4638 0.59 1850 0.4542
0.4663 0.61 1900 0.4521
0.4638 0.63 1950 0.4509
0.4562 0.64 2000 0.4501
0.4535 0.66 2050 0.4496
0.4548 0.67 2100 0.4470
0.442 0.69 2150 0.4453
0.4543 0.71 2200 0.4449
0.4435 0.72 2250 0.4428
0.4633 0.74 2300 0.4418
0.4438 0.75 2350 0.4416
0.4443 0.77 2400 0.4392
0.4424 0.79 2450 0.4386
0.4341 0.8 2500 0.4367
0.4329 0.82 2550 0.4353
0.4356 0.83 2600 0.4349
0.4384 0.85 2650 0.4351
0.4327 0.87 2700 0.4321
0.4356 0.88 2750 0.4323
0.4428 0.9 2800 0.4310
0.4358 0.91 2850 0.4304
0.4322 0.93 2900 0.4293
0.4336 0.95 2950 0.4280
0.4296 0.96 3000 0.4269
0.4365 0.98 3050 0.4267
0.4313 0.99 3100 0.4250
0.4256 1.01 3150 0.4251
0.4258 1.03 3200 0.4241
0.4245 1.04 3250 0.4225
0.4161 1.06 3300 0.4223
0.4228 1.07 3350 0.4215
0.4194 1.09 3400 0.4205
0.4331 1.11 3450 0.4193
0.4246 1.12 3500 0.4191
0.4246 1.14 3550 0.4173
0.4229 1.16 3600 0.4175
0.4128 1.17 3650 0.4160
0.4189 1.19 3700 0.4156
0.4154 1.2 3750 0.4148
0.4263 1.22 3800 0.4140
0.4124 1.24 3850 0.4139
0.4201 1.25 3900 0.4132
0.4224 1.27 3950 0.4122
0.4114 1.28 4000 0.4122
0.4169 1.3 4050 0.4112
0.4167 1.32 4100 0.4107
0.4019 1.33 4150 0.4095
0.4142 1.35 4200 0.4087
0.4086 1.36 4250 0.4080
0.406 1.38 4300 0.4075
0.4091 1.4 4350 0.4069
0.4149 1.41 4400 0.4062
0.4078 1.43 4450 0.4054
0.3997 1.44 4500 0.4052
0.3985 1.46 4550 0.4040
0.4035 1.48 4600 0.4035
0.3982 1.49 4650 0.4025
0.4018 1.51 4700 0.4030
0.4078 1.52 4750 0.4021
0.3991 1.54 4800 0.4010
0.4033 1.56 4850 0.4003
0.3964 1.57 4900 0.4005
0.3965 1.59 4950 0.3993
0.407 1.6 5000 0.3994
0.4036 1.62 5050 0.3983
0.4063 1.64 5100 0.3980
0.3857 1.65 5150 0.3973
0.3949 1.67 5200 0.3973
0.3872 1.68 5250 0.3965
0.393 1.7 5300 0.3959
0.3891 1.72 5350 0.3955
0.3903 1.73 5400 0.3950
0.3941 1.75 5450 0.3947
0.3879 1.76 5500 0.3941
0.4016 1.78 5550 0.3937
0.3936 1.8 5600 0.3929
0.4005 1.81 5650 0.3932
0.3939 1.83 5700 0.3923
0.4032 1.85 5750 0.3921
0.3921 1.86 5800 0.3921
0.3903 1.88 5850 0.3905
0.3983 1.89 5900 0.3910
0.3806 1.91 5950 0.3897
0.3964 1.93 6000 0.3906
0.3866 1.94 6050 0.3890
0.3882 1.96 6100 0.3888
0.3835 1.97 6150 0.3885
0.3921 1.99 6200 0.3875
0.388 2.01 6250 0.3878
0.3829 2.02 6300 0.3872
0.3814 2.04 6350 0.3867
0.3818 2.05 6400 0.3862
0.3802 2.07 6450 0.3860
0.3739 2.09 6500 0.3853
0.3771 2.1 6550 0.3852
0.3732 2.12 6600 0.3846
0.385 2.13 6650 0.3849
0.3767 2.15 6700 0.3833
0.3802 2.17 6750 0.3836
0.3844 2.18 6800 0.3828
0.3761 2.2 6850 0.3826
0.3765 2.21 6900 0.3826
0.3787 2.23 6950 0.3825
0.378 2.25 7000 0.3815
0.3792 2.26 7050 0.3815
0.3908 2.28 7100 0.3811
0.3757 2.29 7150 0.3810
0.376 2.31 7200 0.3804
0.3785 2.33 7250 0.3805
0.3744 2.34 7300 0.3797
0.3984 2.36 7350 0.3791
0.3833 2.37 7400 0.3792
0.3808 2.39 7450 0.3785
0.3803 2.41 7500 0.3786
0.3828 2.42 7550 0.3778
0.3697 2.44 7600 0.3780
0.3692 2.45 7650 0.3763
0.3808 2.47 7700 0.3769
0.3764 2.49 7750 0.3763
0.3865 2.5 7800 0.3763
0.375 2.52 7850 0.3760
0.368 2.53 7900 0.3755
0.3632 2.55 7950 0.3757
0.3792 2.57 8000 0.3758
0.374 2.58 8050 0.3748
0.3689 2.6 8100 0.3741
0.3843 2.62 8150 0.3741
0.3669 2.63 8200 0.3739
0.368 2.65 8250 0.3732
0.3726 2.66 8300 0.3734
0.3653 2.68 8350 0.3728
0.3777 2.7 8400 0.3732
0.3625 2.71 8450 0.3724
0.3749 2.73 8500 0.3716
0.3708 2.74 8550 0.3725
0.3618 2.76 8600 0.3711
0.3659 2.78 8650 0.3714
0.3661 2.79 8700 0.3711
0.3771 2.81 8750 0.3714
0.3637 2.82 8800 0.3704
0.3768 2.84 8850 0.3700
0.3722 2.86 8900 0.3701
0.366 2.87 8950 0.3693
0.3716 2.89 9000 0.3690
0.3622 2.9 9050 0.3688
0.3594 2.92 9100 0.3682
0.368 2.94 9150 0.3680
0.3538 2.95 9200 0.3678
0.3578 2.97 9250 0.3676
0.3685 2.98 9300 0.3679
0.3631 3.0 9350 0.3674
0.3645 3.02 9400 0.3665
0.3654 3.03 9450 0.3671
0.3502 3.05 9500 0.3662
0.356 3.06 9550 0.3665
0.3642 3.08 9600 0.3662
0.3688 3.1 9650 0.3659
0.3514 3.11 9700 0.3655
0.3463 3.13 9750 0.3656
0.3517 3.14 9800 0.3651
0.3666 3.16 9850 0.3650
0.3617 3.18 9900 0.3660
0.3452 3.19 9950 0.3649
0.3591 3.21 10000 0.3647
0.3509 3.22 10050 0.3643
0.3618 3.24 10100 0.3641
0.3571 3.26 10150 0.3640
0.3587 3.27 10200 0.3633
0.3664 3.29 10250 0.3637
0.3502 3.3 10300 0.3633
0.373 3.32 10350 0.3626
0.3623 3.34 10400 0.3622
0.3554 3.35 10450 0.3624
0.3511 3.37 10500 0.3622
0.3534 3.39 10550 0.3626
0.3473 3.4 10600 0.3620
0.3563 3.42 10650 0.3618
0.3612 3.43 10700 0.3614
0.3587 3.45 10750 0.3610
0.3521 3.47 10800 0.3609
0.3443 3.48 10850 0.3610
0.3615 3.5 10900 0.3608
0.3589 3.51 10950 0.3609
0.364 3.53 11000 0.3598
0.3498 3.55 11050 0.3600
0.3541 3.56 11100 0.3597
0.3555 3.58 11150 0.3594
0.3491 3.59 11200 0.3596
0.3498 3.61 11250 0.3589
0.3484 3.63 11300 0.3590
0.3483 3.64 11350 0.3586
0.3533 3.66 11400 0.3580
0.3479 3.67 11450 0.3589
0.3539 3.69 11500 0.3580
0.3507 3.71 11550 0.3582
0.3534 3.72 11600 0.3579
0.3559 3.74 11650 0.3575
0.3477 3.75 11700 0.3577
0.3501 3.77 11750 0.3574
0.3491 3.79 11800 0.3569
0.3661 3.8 11850 0.3569
0.3455 3.82 11900 0.3568
0.3522 3.83 11950 0.3564
0.3532 3.85 12000 0.3562
0.3513 3.87 12050 0.3559
0.3527 3.88 12100 0.3561
0.3575 3.9 12150 0.3556
0.3403 3.92 12200 0.3550
0.3495 3.93 12250 0.3554
0.3514 3.95 12300 0.3548
0.3556 3.96 12350 0.3547
0.3549 3.98 12400 0.3545
0.3541 4.0 12450 0.3542
0.3477 4.01 12500 0.3551
0.3449 4.03 12550 0.3542
0.3426 4.04 12600 0.3552
0.3411 4.06 12650 0.3545
0.3476 4.08 12700 0.3540
0.3547 4.09 12750 0.3536
0.3529 4.11 12800 0.3537
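
The validation losses in the table can be easier to interpret as perplexities. The sketch below converts the first and last validation losses, assuming they are mean per-token cross-entropy values in nats, which is the usual convention for causal language model training but is not stated in the card.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_loss)


# Validation losses copied from the first and last rows of the table above.
first_eval_loss = 0.8699   # step 50
final_eval_loss = 0.3537   # step 12800

for label, loss in (("step 50", first_eval_loss), ("step 12800", final_eval_loss)):
    print(f"{label}: loss = {loss:.4f}, perplexity = {perplexity(loss):.3f}")
```

A perplexity close to 1 means the model assigns high probability to the evaluation tokens. Without information about the evaluation data, the value is only comparable across checkpoints of this same run.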

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0