metadata
license: other
base_model: Qwen/Qwen1.5-4B
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: squad_qa_baseline_v5_full_Qwen_Qwen1.5-4B_3e-5_lora
    results: []
library_name: peft

squad_qa_baseline_v5_full_Qwen_Qwen1.5-4B_3e-5_lora

This model is a LoRA fine-tune (PEFT adapter) of Qwen/Qwen1.5-4B on an unknown dataset. At the final logged step (step 3700, epoch 49.58), it achieves the following results on the evaluation set:

  • Loss: 3.8632
  • Accuracy: 0.5660
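Because the metadata lists `library_name: peft` and `base_model: Qwen/Qwen1.5-4B`, the weights are presumably a LoRA adapter applied on top of the base model. The following is a minimal loading sketch under that assumption. The function name `load_lora_model` and the `adapter_id` argument are hypothetical; this card does not state the adapter's repository id or path, so the caller must supply it.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"  # base model named in the card metadata


def load_lora_model(adapter_id: str):
    """Load the base model and attach a LoRA adapter to it.

    `adapter_id` is an assumption: pass the Hub repository id or local
    directory that holds this adapter's weights (not stated in the card).
    """
    # Imported lazily so the module can be inspected without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


# Example call (downloads several GB of weights, so it is left commented out):
# model, tokenizer = load_lora_model("path/or/repo-id-of-this-adapter")
```

Loading the base model first and wrapping it with `PeftModel.from_pretrained` keeps the adapter separate from the base weights, which is the usual pattern for PEFT checkpoints.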

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
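The two `total_*` values follow from the per-device batch sizes, the device count, and the gradient accumulation steps. The short check below reproduces them from the list above. The dataset-size figure at the end is only an estimate: it assumes every optimizer step consumes one full effective batch, and it uses the step 597 / epoch 8.0 pair from the training results table.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 8

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8

# Rough estimate of the training set size (an assumption, not a reported figure):
# the results table logs step 597 at epoch 8.0.
steps_per_epoch = 597 / 8.0
approx_train_examples = steps_per_epoch * total_train_batch_size
print(steps_per_epoch, approx_train_examples)  # 74.625 2388.0
```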

Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy |
|---------------|---------|------|-----------------|----------|
| No log        | 0.9916  | 74   | 2.0550          | 0.5952   |
| 2.3403        | 1.9966  | 149  | 2.0411          | 0.5933   |
| 2.0198        | 2.9883  | 223  | 2.0403          | 0.5932   |
| 2.0198        | 3.9933  | 298  | 2.0647          | 0.5922   |
| 1.9239        | 4.9983  | 373  | 2.0999          | 0.5921   |
| 1.7309        | 5.9899  | 447  | 2.1973          | 0.5879   |
| 1.5254        | 6.9950  | 522  | 2.2753          | 0.5861   |
| 1.5254        | 8.0     | 597  | 2.4079          | 0.5819   |
| 1.2937        | 8.9916  | 671  | 2.5096          | 0.5775   |
| 1.0409        | 9.9966  | 746  | 2.6079          | 0.5739   |
| 0.8766        | 10.9883 | 820  | 2.7579          | 0.5718   |
| 0.8766        | 11.9933 | 895  | 2.8722          | 0.5688   |
| 0.721         | 12.9983 | 970  | 2.9797          | 0.5672   |
| 0.6011        | 13.9899 | 1044 | 3.0708          | 0.5662   |
| 0.5455        | 14.9950 | 1119 | 3.1660          | 0.5648   |
| 0.5455        | 16.0    | 1194 | 3.2479          | 0.5650   |
| 0.5003        | 16.9916 | 1268 | 3.2445          | 0.5655   |
| 0.4683        | 17.9966 | 1343 | 3.2800          | 0.5638   |
| 0.457         | 18.9883 | 1417 | 3.4280          | 0.5640   |
| 0.457         | 19.9933 | 1492 | 3.4113          | 0.5662   |
| 0.4441        | 20.9983 | 1567 | 3.4731          | 0.5637   |
| 0.4327        | 21.9899 | 1641 | 3.5407          | 0.5639   |
| 0.4308        | 22.9950 | 1716 | 3.4811          | 0.5640   |
| 0.4308        | 24.0    | 1791 | 3.5854          | 0.5642   |
| 0.4245        | 24.9916 | 1865 | 3.5206          | 0.5640   |
| 0.416         | 25.9966 | 1940 | 3.6091          | 0.5638   |
| 0.4173        | 26.9883 | 2014 | 3.5707          | 0.5643   |
| 0.4173        | 27.9933 | 2089 | 3.6671          | 0.5648   |
| 0.4117        | 28.9983 | 2164 | 3.6267          | 0.5631   |
| 0.409         | 29.9899 | 2238 | 3.6658          | 0.5604   |
| 0.4085        | 30.9950 | 2313 | 3.6984          | 0.5621   |
| 0.4085        | 32.0    | 2388 | 3.6584          | 0.5660   |
| 0.403         | 32.9916 | 2462 | 3.5848          | 0.5626   |
| 0.404         | 33.9966 | 2537 | 3.6365          | 0.5631   |
| 0.4013        | 34.9883 | 2611 | 3.7047          | 0.5647   |
| 0.4013        | 35.9933 | 2686 | 3.7735          | 0.5643   |
| 0.3987        | 36.9983 | 2761 | 3.6867          | 0.5657   |
| 0.3951        | 37.9899 | 2835 | 3.7349          | 0.5662   |
| 0.3971        | 38.9950 | 2910 | 3.7173          | 0.5643   |
| 0.3971        | 40.0    | 2985 | 3.8004          | 0.5643   |
| 0.3939        | 40.9916 | 3059 | 3.8041          | 0.5636   |
| 0.3912        | 41.9966 | 3134 | 3.8263          | 0.5648   |
| 0.3941        | 42.9883 | 3208 | 3.7954          | 0.5646   |
| 0.3941        | 43.9933 | 3283 | 3.8001          | 0.5637   |
| 0.3878        | 44.9983 | 3358 | 3.8438          | 0.5634   |
| 0.3879        | 45.9899 | 3432 | 3.8626          | 0.5631   |
| 0.3907        | 46.9950 | 3507 | 3.7882          | 0.5645   |
| 0.3907        | 48.0    | 3582 | 3.8001          | 0.5622   |
| 0.3864        | 48.9916 | 3656 | 3.7201          | 0.5609   |
| 0.3871        | 49.5812 | 3700 | 3.8632          | 0.5660   |
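The reported metrics come from the last logged step, which is not necessarily the best checkpoint. As a small illustration, the sketch below picks the rows with the lowest validation loss and the highest accuracy. It uses only the first five rows and the final row of the training results; the remaining rows are omitted for brevity, and none of them has a lower validation loss or a higher accuracy than those included.

```python
# (epoch, step, validation_loss, accuracy) copied from the training results.
rows = [
    (0.9916, 74, 2.0550, 0.5952),
    (1.9966, 149, 2.0411, 0.5933),
    (2.9883, 223, 2.0403, 0.5932),
    (3.9933, 298, 2.0647, 0.5922),
    (4.9983, 373, 2.0999, 0.5921),
    (49.5812, 3700, 3.8632, 0.5660),  # final logged step
]

best_by_loss = min(rows, key=lambda r: r[2])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[3])  # highest accuracy
final = rows[-1]

print("lowest validation loss at step", best_by_loss[1])   # step 223
print("highest accuracy at step", best_by_accuracy[1])     # step 74
print("final step", final[1], "validation loss", final[2])  # 3700, 3.8632
```

Validation loss bottoms out at step 223 (2.0403) and climbs to 3.8632 by step 3700 while training loss keeps falling, a pattern consistent with overfitting. An earlier checkpoint may therefore generalize better than the final one, if such checkpoints were saved.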

Framework versions

  • PEFT 0.5.0
  • Transformers 4.40.2
  • Pytorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.19.1