---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-sft-lora-accum4-lr5e_5
    results: []
---

# zephyr-7b-sft-lora-accum4-lr5e_5

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.5833
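
The adapter-style model name suggests this repo holds LoRA weights rather than a merged checkpoint. A minimal loading sketch under that assumption; the repo id `shkang/zephyr-7b-sft-lora-accum4-lr5e_5` is also an assumption, not confirmed by this card:

```python
# A minimal inference sketch (not from this card): assumes the repo hosts
# PEFT/LoRA adapter weights and that its id is
# shkang/zephyr-7b-sft-lora-accum4-lr5e_5 -- both are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach the fine-tuned LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, "shkang/zephyr-7b-sft-lora-accum4-lr5e_5")

prompt = "Explain LoRA fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```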

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 50.0
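
For reference, a sketch of how these settings map onto `transformers.TrainingArguments`. The actual launch script is not part of this card, and the distributed settings (`distributed_type: multi-GPU`, `num_devices: 2`) come from the launcher (e.g. `accelerate` or `torchrun`) rather than from the arguments themselves:

```python
# A sketch mapping the hyperparameters above onto transformers.TrainingArguments.
# output_dir is illustrative; with 2 GPUs, batch size 4 per device, and
# gradient accumulation 4, the effective total train batch size is 32.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-lora-accum4-lr5e_5",  # illustrative
    learning_rate=5e-05,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    num_train_epochs=50.0,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```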

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.8243        | 0.55  | 13   | 1.6684          |
| 1.5291        | 1.57  | 27   | 1.4003          |
| 1.2355        | 2.55  | 40   | 1.1808          |
| 1.1393        | 3.57  | 54   | 1.0949          |
| 1.0659        | 4.55  | 67   | 1.0457          |
| 1.0196        | 5.57  | 81   | 1.0065          |
| 0.9831        | 6.55  | 94   | 0.9686          |
| 0.9281        | 7.57  | 108  | 0.9255          |
| 0.8678        | 8.55  | 121  | 0.8814          |
| 0.8054        | 9.57  | 135  | 0.8275          |
| 0.7683        | 10.55 | 148  | 0.7861          |
| 0.6906        | 11.57 | 162  | 0.7272          |
| 0.6246        | 12.55 | 175  | 0.6795          |
| 0.5813        | 13.57 | 189  | 0.6364          |
| 0.5253        | 14.55 | 202  | 0.6078          |
| 0.5149        | 15.57 | 216  | 0.5811          |
| 0.4949        | 16.55 | 229  | 0.5605          |
| 0.4644        | 17.57 | 243  | 0.5462          |
| 0.458         | 18.55 | 256  | 0.5346          |
| 0.4294        | 19.57 | 270  | 0.5202          |
| 0.4143        | 20.55 | 283  | 0.5177          |
| 0.4161        | 21.57 | 297  | 0.5108          |
| 0.4128        | 22.55 | 310  | 0.5057          |
| 0.4055        | 23.57 | 324  | 0.5071          |
| 0.3937        | 24.55 | 337  | 0.5058          |
| 0.3967        | 25.57 | 351  | 0.5017          |
| 0.3754        | 26.55 | 364  | 0.4998          |
| 0.3742        | 27.57 | 378  | 0.5019          |
| 0.3756        | 28.55 | 391  | 0.5019          |
| 0.3652        | 29.57 | 405  | 0.5061          |
| 0.3597        | 30.55 | 418  | 0.5076          |
| 0.3609        | 31.57 | 432  | 0.5079          |
| 0.3581        | 32.55 | 445  | 0.5108          |
| 0.3426        | 33.57 | 459  | 0.5117          |
| 0.3481        | 34.55 | 472  | 0.5141          |
| 0.3435        | 35.57 | 486  | 0.5150          |
| 0.3317        | 36.55 | 499  | 0.5245          |
| 0.3387        | 37.57 | 513  | 0.5239          |
| 0.332         | 38.55 | 526  | 0.5319          |
| 0.3334        | 39.57 | 540  | 0.5342          |
| 0.323         | 40.55 | 553  | 0.5388          |
| 0.3144        | 41.57 | 567  | 0.5423          |
| 0.3092        | 42.55 | 580  | 0.5465          |
| 0.3084        | 43.57 | 594  | 0.5481          |
| 0.3091        | 44.55 | 607  | 0.5605          |
| 0.3044        | 45.57 | 621  | 0.5606          |
| 0.303         | 46.55 | 634  | 0.5683          |
| 0.2896        | 47.57 | 648  | 0.5722          |
| 0.2854        | 48.55 | 661  | 0.5778          |
| 0.291         | 49.57 | 675  | 0.5826          |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1