# lmind_hotpot_train8000_eval7405_v1_qa_3e-4_lora2
This model is a fine-tuned version of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on the [tyzhu/lmind_hotpot_train8000_eval7405_v1_qa](https://huggingface.co/datasets/tyzhu/lmind_hotpot_train8000_eval7405_v1_qa) dataset. It achieves the following results on the evaluation set:

- Loss: 3.8970
- Accuracy: 0.4845
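This is a PEFT (LoRA) adapter rather than a full model checkpoint. Below is a minimal inference sketch, assuming the adapter is hosted under this repo ID, that it targets the causal-LM head of Qwen/Qwen1.5-4B, and that a plain `Question: ... Answer:` prompt format was used (the prompt template is not documented in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-4B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-4B")

# Attach the fine-tuned LoRA weights on top of the base model.
model = PeftModel.from_pretrained(
    base, "tyzhu/lmind_hotpot_train8000_eval7405_v1_qa_3e-4_lora2"
)
model.eval()

# The "Question: ... Answer:" template below is an assumption.
prompt = (
    "Question: Which magazine was started first, "
    "Arthur's Magazine or First for Women?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```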
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
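If the dataset ID above is accessible on the Hub, its splits and columns can be inspected directly. A short sketch, assuming standard `datasets` Hub loading and that the QA pairs live in a `train` split:

```python
from datasets import load_dataset

# Pull the dataset referenced in this card and look at its structure.
ds = load_dataset("tyzhu/lmind_hotpot_train8000_eval7405_v1_qa")
print(ds)              # available splits and column names
print(ds["train"][0])  # one raw example
```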
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
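For reference, the effective train batch size follows from 1 example per device × 4 GPUs × 8 gradient-accumulation steps = 32. A hedged reconstruction of this configuration as `transformers.TrainingArguments` (argument names match Transformers 4.41; `output_dir` and the `adamw_torch` optimizer name are assumptions, since the card only states Adam with the betas/epsilon below):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lmind_hotpot_train8000_eval7405_v1_qa_3e-4_lora2",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 1 x 4 GPUs x 8 = 32 effective
    seed=42,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    num_train_epochs=50.0,
    optim="adamw_torch",  # assumed Trainer default; card says "Adam"
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```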
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 2.2487        | 1.0   | 250   | 2.3118          | 0.5168   |
| 1.9113        | 2.0   | 500   | 2.3749          | 0.5154   |
| 1.4924        | 3.0   | 750   | 2.5290          | 0.5099   |
| 1.1157        | 4.0   | 1000  | 2.7949          | 0.5038   |
| 0.7814        | 5.0   | 1250  | 3.0137          | 0.4985   |
| 0.605         | 6.0   | 1500  | 3.2205          | 0.4966   |
| 0.4968        | 7.0   | 1750  | 3.3760          | 0.4947   |
| 0.4597        | 8.0   | 2000  | 3.4634          | 0.4923   |
| 0.4242        | 9.0   | 2250  | 3.4379          | 0.4956   |
| 0.4223        | 10.0  | 2500  | 3.6001          | 0.4950   |
| 0.4           | 11.0  | 2750  | 3.5718          | 0.4956   |
| 0.4007        | 12.0  | 3000  | 3.5684          | 0.4932   |
| 0.3929        | 13.0  | 3250  | 3.6029          | 0.4931   |
| 0.4003        | 14.0  | 3500  | 3.5841          | 0.4921   |
| 0.3834        | 15.0  | 3750  | 3.6553          | 0.4925   |
| 0.3955        | 16.0  | 4000  | 3.6385          | 0.4920   |
| 0.3843        | 17.0  | 4250  | 3.6584          | 0.4908   |
| 0.3916        | 18.0  | 4500  | 3.6592          | 0.4925   |
| 0.3825        | 19.0  | 4750  | 3.6604          | 0.4913   |
| 0.387         | 20.0  | 5000  | 3.7399          | 0.4904   |
| 0.3832        | 21.0  | 5250  | 3.6564          | 0.4914   |
| 0.384         | 22.0  | 5500  | 3.6862          | 0.4899   |
| 0.3753        | 23.0  | 5750  | 3.6691          | 0.4906   |
| 0.3816        | 24.0  | 6000  | 3.7181          | 0.4909   |
| 0.3711        | 25.0  | 6250  | 3.7159          | 0.4896   |
| 0.3758        | 26.0  | 6500  | 3.7266          | 0.4907   |
| 0.372         | 27.0  | 6750  | 3.7633          | 0.4916   |
| 0.3813        | 28.0  | 7000  | 3.7511          | 0.4893   |
| 0.3699        | 29.0  | 7250  | 3.7291          | 0.4903   |
| 0.3772        | 30.0  | 7500  | 3.7827          | 0.4886   |
| 0.3673        | 31.0  | 7750  | 3.8032          | 0.4892   |
| 0.3708        | 32.0  | 8000  | 3.8303          | 0.4895   |
| 0.3632        | 33.0  | 8250  | 3.8218          | 0.4887   |
| 0.3692        | 34.0  | 8500  | 3.7488          | 0.49     |
| 0.3678        | 35.0  | 8750  | 3.8524          | 0.4869   |
| 0.3762        | 36.0  | 9000  | 3.8221          | 0.4875   |
| 0.3702        | 37.0  | 9250  | 3.8083          | 0.4862   |
| 0.3745        | 38.0  | 9500  | 3.8329          | 0.4860   |
| 0.3611        | 39.0  | 9750  | 3.8969          | 0.4878   |
| 0.3648        | 40.0  | 10000 | 3.8497          | 0.4869   |
| 0.3616        | 41.0  | 10250 | 3.8461          | 0.4865   |
| 0.3659        | 42.0  | 10500 | 3.8722          | 0.4877   |
| 0.3585        | 43.0  | 10750 | 3.8763          | 0.4874   |
| 0.3628        | 44.0  | 11000 | 3.8507          | 0.4877   |
| 0.3616        | 45.0  | 11250 | 3.8788          | 0.4876   |
| 0.367         | 46.0  | 11500 | 3.8688          | 0.4875   |
| 0.3629        | 47.0  | 11750 | 3.9210          | 0.4868   |
| 0.366         | 48.0  | 12000 | 3.9305          | 0.4861   |
| 0.3608        | 49.0  | 12250 | 3.9263          | 0.4875   |
| 0.362         | 50.0  | 12500 | 3.8970          | 0.4845   |
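Validation loss is lowest at epoch 1 (2.3118) and climbs for the remaining 49 epochs while training loss keeps falling, a typical overfitting pattern under a constant learning rate. If the reported loss is the mean natural-log token cross-entropy that `Trainer` logs (an assumption; the card does not say), it converts to perplexity with a one-liner:

```python
import math

# Perplexity = exp(cross-entropy), assuming natural-log token-level loss.
print(math.exp(2.3118))  # ~10.1, epoch 1 (validation minimum)
print(math.exp(3.8970))  # ~49.2, epoch 50 (final checkpoint)
```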
### Framework versions
- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1