# mistral-rand-300k-newinst
This model is a fine-tuned version of [TheBloke/Mistral-7B-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.3214
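Assuming the reported loss is the mean token-level cross-entropy in nats (the Trainer's default objective for causal-LM fine-tuning), it corresponds to a perplexity of roughly exp(0.3214) ≈ 1.38:

```python
import math

# Assuming the reported loss is mean token-level cross-entropy in nats
# (the usual causal-LM objective), perplexity is exp(loss).
print(math.exp(0.3214))  # ≈ 1.38
```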
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hypothetical `TrainingArguments` reconstruction follows the list:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 15
- mixed_precision_training: Native AMP
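A minimal sketch of the corresponding `TrainingArguments`, assuming the Hugging Face Trainer was used. The `output_dir` and the 50-step evaluation/logging cadence (read off the results table below) are assumptions, not taken from the card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="outputs",            # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,                  # "Adam with betas=(0.9, 0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # "epsilon=1e-08"
    lr_scheduler_type="cosine",
    num_train_epochs=15,
    fp16=True,                       # Native AMP mixed precision
    evaluation_strategy="steps",     # assumed from the 50-step eval table
    eval_steps=50,
    logging_steps=50,
)
```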
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
1.5581 | 0.01 | 50 | 0.7223 |
0.6511 | 0.02 | 100 | 0.6142 |
0.5881 | 0.02 | 150 | 0.5763 |
0.5705 | 0.03 | 200 | 0.5529 |
0.5534 | 0.04 | 250 | 0.5324 |
0.5168 | 0.05 | 300 | 0.5190 |
0.5185 | 0.06 | 350 | 0.5079 |
0.4996 | 0.06 | 400 | 0.4984 |
0.5031 | 0.07 | 450 | 0.4911 |
0.4797 | 0.08 | 500 | 0.4840 |
0.4844 | 0.09 | 550 | 0.4766 |
0.4789 | 0.1 | 600 | 0.4713 |
0.467 | 0.1 | 650 | 0.4653 |
0.461 | 0.11 | 700 | 0.4604 |
0.4562 | 0.12 | 750 | 0.4547 |
0.4521 | 0.13 | 800 | 0.4479 |
0.4298 | 0.14 | 850 | 0.4395 |
0.4405 | 0.14 | 900 | 0.4316 |
0.4265 | 0.15 | 950 | 0.4257 |
0.4335 | 0.16 | 1000 | 0.4212 |
0.4054 | 0.17 | 1050 | 0.4179 |
0.4285 | 0.18 | 1100 | 0.4164 |
0.4235 | 0.18 | 1150 | 0.4136 |
0.4047 | 0.19 | 1200 | 0.4117 |
0.416 | 0.2 | 1250 | 0.4090 |
0.4194 | 0.21 | 1300 | 0.4071 |
0.4164 | 0.22 | 1350 | 0.4060 |
0.4069 | 0.22 | 1400 | 0.4035 |
0.4049 | 0.23 | 1450 | 0.4028 |
0.4 | 0.24 | 1500 | 0.4002 |
0.4085 | 0.25 | 1550 | 0.3973 |
0.4093 | 0.26 | 1600 | 0.3972 |
0.3875 | 0.26 | 1650 | 0.3944 |
0.403 | 0.27 | 1700 | 0.3941 |
0.399 | 0.28 | 1750 | 0.3919 |
0.3938 | 0.29 | 1800 | 0.3914 |
0.3958 | 0.3 | 1850 | 0.3891 |
0.3991 | 0.3 | 1900 | 0.3876 |
0.3956 | 0.31 | 1950 | 0.3863 |
0.4004 | 0.32 | 2000 | 0.3853 |
0.398 | 0.33 | 2050 | 0.3845 |
0.3998 | 0.34 | 2100 | 0.3832 |
0.3741 | 0.34 | 2150 | 0.3819 |
0.382 | 0.35 | 2200 | 0.3808 |
0.3839 | 0.36 | 2250 | 0.3799 |
0.3797 | 0.37 | 2300 | 0.3789 |
0.3783 | 0.38 | 2350 | 0.3784 |
0.3809 | 0.39 | 2400 | 0.3778 |
0.3976 | 0.39 | 2450 | 0.3757 |
0.3877 | 0.4 | 2500 | 0.3753 |
0.3814 | 0.41 | 2550 | 0.3746 |
0.3631 | 0.42 | 2600 | 0.3734 |
0.3803 | 0.43 | 2650 | 0.3726 |
0.3791 | 0.43 | 2700 | 0.3720 |
0.3733 | 0.44 | 2750 | 0.3711 |
0.3726 | 0.45 | 2800 | 0.3705 |
0.3778 | 0.46 | 2850 | 0.3687 |
0.378 | 0.47 | 2900 | 0.3684 |
0.3769 | 0.47 | 2950 | 0.3674 |
0.3712 | 0.48 | 3000 | 0.3670 |
0.3629 | 0.49 | 3050 | 0.3668 |
0.3714 | 0.5 | 3100 | 0.3653 |
0.3743 | 0.51 | 3150 | 0.3639 |
0.3631 | 0.51 | 3200 | 0.3637 |
0.3805 | 0.52 | 3250 | 0.3628 |
0.3577 | 0.53 | 3300 | 0.3626 |
0.373 | 0.54 | 3350 | 0.3628 |
0.3609 | 0.55 | 3400 | 0.3608 |
0.358 | 0.55 | 3450 | 0.3604 |
0.3556 | 0.56 | 3500 | 0.3596 |
0.3442 | 0.57 | 3550 | 0.3603 |
0.3619 | 0.58 | 3600 | 0.3590 |
0.3691 | 0.59 | 3650 | 0.3573 |
0.3614 | 0.59 | 3700 | 0.3577 |
0.3661 | 0.6 | 3750 | 0.3558 |
0.3667 | 0.61 | 3800 | 0.3561 |
0.3653 | 0.62 | 3850 | 0.3554 |
0.3645 | 0.63 | 3900 | 0.3547 |
0.3496 | 0.63 | 3950 | 0.3545 |
0.3689 | 0.64 | 4000 | 0.3539 |
0.3554 | 0.65 | 4050 | 0.3531 |
0.3567 | 0.66 | 4100 | 0.3520 |
0.361 | 0.67 | 4150 | 0.3519 |
0.3522 | 0.67 | 4200 | 0.3514 |
0.347 | 0.68 | 4250 | 0.3507 |
0.3481 | 0.69 | 4300 | 0.3504 |
0.3646 | 0.7 | 4350 | 0.3497 |
0.3524 | 0.71 | 4400 | 0.3501 |
0.3487 | 0.71 | 4450 | 0.3492 |
0.3496 | 0.72 | 4500 | 0.3482 |
0.3691 | 0.73 | 4550 | 0.3481 |
0.36 | 0.74 | 4600 | 0.3484 |
0.3485 | 0.75 | 4650 | 0.3473 |
0.3492 | 0.75 | 4700 | 0.3471 |
0.3505 | 0.76 | 4750 | 0.3458 |
0.3472 | 0.77 | 4800 | 0.3466 |
0.3438 | 0.78 | 4850 | 0.3449 |
0.3516 | 0.79 | 4900 | 0.3447 |
0.3388 | 0.79 | 4950 | 0.3440 |
0.3443 | 0.8 | 5000 | 0.3433 |
0.3465 | 0.81 | 5050 | 0.3439 |
0.3335 | 0.82 | 5100 | 0.3421 |
0.3421 | 0.83 | 5150 | 0.3419 |
0.3424 | 0.83 | 5200 | 0.3418 |
0.338 | 0.84 | 5250 | 0.3411 |
0.3507 | 0.85 | 5300 | 0.3413 |
0.3347 | 0.86 | 5350 | 0.3400 |
0.3449 | 0.87 | 5400 | 0.3402 |
0.3396 | 0.87 | 5450 | 0.3404 |
0.3461 | 0.88 | 5500 | 0.3404 |
0.3519 | 0.89 | 5550 | 0.3394 |
0.3458 | 0.9 | 5600 | 0.3384 |
0.344 | 0.91 | 5650 | 0.3389 |
0.3415 | 0.91 | 5700 | 0.3386 |
0.3444 | 0.92 | 5750 | 0.3381 |
0.3366 | 0.93 | 5800 | 0.3377 |
0.3472 | 0.94 | 5850 | 0.3366 |
0.3335 | 0.95 | 5900 | 0.3363 |
0.3362 | 0.95 | 5950 | 0.3358 |
0.3408 | 0.96 | 6000 | 0.3354 |
0.353 | 0.97 | 6050 | 0.3355 |
0.3333 | 0.98 | 6100 | 0.3352 |
0.3356 | 0.99 | 6150 | 0.3341 |
0.3418 | 0.99 | 6200 | 0.3343 |
0.332 | 1.0 | 6250 | 0.3339 |
0.3359 | 1.01 | 6300 | 0.3341 |
0.3316 | 1.02 | 6350 | 0.3337 |
0.3356 | 1.03 | 6400 | 0.3324 |
0.3322 | 1.03 | 6450 | 0.3328 |
0.3319 | 1.04 | 6500 | 0.3317 |
0.3275 | 1.05 | 6550 | 0.3315 |
0.3245 | 1.06 | 6600 | 0.3316 |
0.3372 | 1.07 | 6650 | 0.3312 |
0.326 | 1.07 | 6700 | 0.3311 |
0.3246 | 1.08 | 6750 | 0.3311 |
0.3333 | 1.09 | 6800 | 0.3298 |
0.3321 | 1.1 | 6850 | 0.3292 |
0.3467 | 1.11 | 6900 | 0.3293 |
0.333 | 1.12 | 6950 | 0.3297 |
0.3328 | 1.12 | 7000 | 0.3296 |
0.3309 | 1.13 | 7050 | 0.3290 |
0.3338 | 1.14 | 7100 | 0.3284 |
0.3267 | 1.15 | 7150 | 0.3281 |
0.3342 | 1.16 | 7200 | 0.3273 |
0.321 | 1.16 | 7250 | 0.3277 |
0.3258 | 1.17 | 7300 | 0.3273 |
0.3263 | 1.18 | 7350 | 0.3277 |
0.3321 | 1.19 | 7400 | 0.3269 |
0.325 | 1.2 | 7450 | 0.3268 |
0.3261 | 1.2 | 7500 | 0.3262 |
0.3337 | 1.21 | 7550 | 0.3257 |
0.3353 | 1.22 | 7600 | 0.3254 |
0.3089 | 1.23 | 7650 | 0.3250 |
0.3388 | 1.24 | 7700 | 0.3250 |
0.3266 | 1.24 | 7750 | 0.3244 |
0.3316 | 1.25 | 7800 | 0.3243 |
0.3192 | 1.26 | 7850 | 0.3245 |
0.3444 | 1.27 | 7900 | 0.3239 |
0.3212 | 1.28 | 7950 | 0.3248 |
0.3237 | 1.28 | 8000 | 0.3237 |
0.3297 | 1.29 | 8050 | 0.3230 |
0.3252 | 1.3 | 8100 | 0.3231 |
0.3211 | 1.31 | 8150 | 0.3228 |
0.3323 | 1.32 | 8200 | 0.3238 |
0.3127 | 1.32 | 8250 | 0.3220 |
0.3163 | 1.33 | 8300 | 0.3223 |
0.322 | 1.34 | 8350 | 0.3211 |
0.3288 | 1.35 | 8400 | 0.3213 |
0.3248 | 1.36 | 8450 | 0.3214 |
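The table above reflects the Trainer's step-level evaluation log. If the run's `trainer_state.json` is available (the Trainer writes it into each saved checkpoint directory; `checkpoint-8450` below is a hypothetical path named after the final step), the curve can be re-plotted:

```python
import json
import matplotlib.pyplot as plt

# Hypothetical path: the Trainer stores its log history in trainer_state.json
# inside each saved checkpoint directory (8450 is the final step above).
with open("checkpoint-8450/trainer_state.json") as f:
    log_history = json.load(f)["log_history"]

eval_entries = [e for e in log_history if "eval_loss" in e]
steps = [e["step"] for e in eval_entries]
losses = [e["eval_loss"] for e in eval_entries]

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("validation loss")
plt.show()
```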
### Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
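Given the PEFT and Transformers versions above, the adapter can presumably be loaded on top of the GPTQ base model roughly as follows. This is a sketch only: it assumes the adapter config points at TheBloke/Mistral-7B-v0.1-GPTQ as its base and that the optional GPTQ dependencies (auto-gptq, optimum) are installed:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Sketch: loads the PEFT adapter and its GPTQ-quantized base model together.
model_id = "megha-shroff/mistral-rand-300k-newinst"
model = AutoPeftModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical prompt; the card does not document the training prompt format.
prompt = "Write one sentence about quantized language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```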
## Model tree for megha-shroff/mistral-rand-300k-newinst

- Base model: mistralai/Mistral-7B-v0.1
- Quantized variant (fine-tuning base): TheBloke/Mistral-7B-v0.1-GPTQ