# squad_qa_title_v5_full_recite_full_passage_Qwen_Qwen1.5-4B_3e-5_lora
This model is a fine-tuned version of Qwen/Qwen1.5-4B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4050
- Accuracy: 0.8665
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
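The effective batch sizes listed above follow directly from the per-device settings. A minimal sanity check (values taken from the hyperparameter list in this card):

```python
# Effective batch sizes for a multi-GPU run with gradient accumulation.
train_batch_size = 1           # per-device train batch size
eval_batch_size = 2            # per-device eval batch size
num_devices = 4                # data-parallel GPUs
gradient_accumulation_steps = 8

# One optimizer step consumes: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation runs without accumulation: per-device batch x devices.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8
```

Both results match the `total_train_batch_size: 32` and `total_eval_batch_size: 8` entries above.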
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
1.8123 | 0.9968 | 158 | 1.7570 | 0.7011 |
1.2776 | 1.9968 | 316 | 1.3745 | 0.7363 |
0.8415 | 3.0 | 475 | 0.8937 | 0.7894 |
0.3351 | 3.9968 | 633 | 0.5359 | 0.8332 |
0.2298 | 5.0 | 792 | 0.3779 | 0.8515 |
0.1531 | 5.9968 | 950 | 0.3157 | 0.8593 |
0.1313 | 7.0 | 1109 | 0.2955 | 0.8622 |
0.126 | 7.9968 | 1267 | 0.2862 | 0.8650 |
0.1219 | 9.0 | 1426 | 0.2900 | 0.8646 |
0.1181 | 9.9968 | 1584 | 0.2740 | 0.8658 |
0.1096 | 11.0 | 1743 | 0.2803 | 0.8675 |
0.1063 | 11.9968 | 1901 | 0.2888 | 0.8655 |
0.1007 | 13.0 | 2060 | 0.2885 | 0.8655 |
0.0969 | 13.9968 | 2218 | 0.2904 | 0.8659 |
0.0898 | 15.0 | 2377 | 0.2931 | 0.8661 |
0.083 | 15.9968 | 2535 | 0.3117 | 0.8655 |
0.0821 | 17.0 | 2694 | 0.3187 | 0.8672 |
0.073 | 17.9968 | 2852 | 0.3261 | 0.8653 |
0.0717 | 19.0 | 3011 | 0.3332 | 0.8653 |
0.0676 | 19.9968 | 3169 | 0.3367 | 0.8658 |
0.0643 | 21.0 | 3328 | 0.3405 | 0.8659 |
0.0617 | 21.9968 | 3486 | 0.3636 | 0.8654 |
0.0601 | 23.0 | 3645 | 0.3590 | 0.8652 |
0.0607 | 23.9968 | 3803 | 0.3677 | 0.8676 |
0.0576 | 25.0 | 3962 | 0.3717 | 0.8654 |
0.0566 | 25.9968 | 4120 | 0.3843 | 0.8655 |
0.0555 | 27.0 | 4279 | 0.3766 | 0.8654 |
0.0549 | 27.9968 | 4437 | 0.3807 | 0.8659 |
0.054 | 29.0 | 4596 | 0.3793 | 0.8661 |
0.0535 | 29.9968 | 4754 | 0.3807 | 0.8660 |
0.0547 | 31.0 | 4913 | 0.3939 | 0.8653 |
0.056 | 31.9968 | 5071 | 0.3888 | 0.8655 |
0.0558 | 33.0 | 5230 | 0.3977 | 0.8656 |
0.0538 | 33.9968 | 5388 | 0.3771 | 0.8662 |
0.0526 | 35.0 | 5547 | 0.3883 | 0.8661 |
0.0524 | 35.9968 | 5705 | 0.4030 | 0.8660 |
0.0509 | 37.0 | 5864 | 0.3947 | 0.8663 |
0.0513 | 37.9968 | 6022 | 0.4077 | 0.8662 |
0.0503 | 39.0 | 6181 | 0.3936 | 0.8662 |
0.0513 | 39.9968 | 6339 | 0.4060 | 0.8659 |
0.052 | 41.0 | 6498 | 0.4026 | 0.8638 |
0.0562 | 41.9968 | 6656 | 0.3967 | 0.8656 |
0.053 | 43.0 | 6815 | 0.3989 | 0.8657 |
0.0508 | 43.9968 | 6973 | 0.3921 | 0.8665 |
0.0505 | 45.0 | 7132 | 0.3983 | 0.8662 |
0.0507 | 45.9968 | 7290 | 0.3915 | 0.8665 |
0.0502 | 47.0 | 7449 | 0.3978 | 0.8668 |
0.0502 | 47.9968 | 7607 | 0.4000 | 0.8665 |
0.0494 | 49.0 | 7766 | 0.4022 | 0.8666 |
0.0505 | 49.8454 | 7900 | 0.4050 | 0.8665 |
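Note that validation loss bottoms out around epoch 10 (0.2740 at step 1584) and then drifts upward while training loss keeps falling, which suggests the later epochs overfit. A minimal sketch for locating the best checkpoint from the table (only a few rows transcribed here for illustration; the full table would be handled the same way):

```python
# Selected (epoch, step, validation_loss, accuracy) rows from the table above.
rows = [
    (0.9968,  158,  1.7570, 0.7011),
    (5.0,     792,  0.3779, 0.8515),
    (9.9968,  1584, 0.2740, 0.8658),
    (15.0,    2377, 0.2931, 0.8661),
    (25.0,    3962, 0.3717, 0.8654),
    (49.8454, 7900, 0.4050, 0.8665),
]

# Pick the checkpoint with the lowest validation loss (index 2 of each row).
best = min(rows, key=lambda r: r[2])
print(best)  # the epoch-10 checkpoint: step 1584, validation loss 0.2740
```

Over these rows the winner is the step-1584 checkpoint, even though the final-epoch checkpoint has the slightly higher accuracy reported in the summary.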
### Framework versions
- PEFT 0.5.0
- Transformers 4.40.2
- Pytorch 2.3.0
- Datasets 2.19.1
- Tokenizers 0.19.1