DandinPower/breeze_7b_lora_full_text



DandinPower committed on Apr 24, 2024

Commit 04defde · verified · 1 Parent(s): 0f0a9e1

End of training

Files changed (1)

README.md +98 -0

README.md ADDED

	@@ -0,0 +1,98 @@

+---
+language:
+- zh
+license: apache-2.0
+library_name: peft
+tags:
+- trl
+- sft
+- nycu-112-2-deeplearning-hw2
+- generated_from_trainer
+base_model: MediaTek-Research/Breeze-7B-Instruct-v1_0
+datasets:
+- DandinPower/ZH-Reading-Comprehension
+model-index:
+- name: breeze_7b_lora
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# breeze_7b_lora
+This model is a fine-tuned version of [MediaTek-Research/Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) on the DandinPower/ZH-Reading-Comprehension dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3504
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 2
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
+- total_eval_batch_size: 2
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 5.0
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 2.6567        | 0.1845 | 250  | 2.6359          |
+| 2.5304        | 0.3690 | 500  | 2.5482          |
+| 2.4385        | 0.5535 | 750  | 2.4359          |
+| 2.3947        | 0.7380 | 1000 | 2.3351          |
+| 2.2359        | 0.9225 | 1250 | 2.2414          |
+| 1.9919        | 1.1070 | 1500 | 2.1528          |
+| 1.9533        | 1.2915 | 1750 | 2.0739          |
+| 1.8919        | 1.4760 | 2000 | 1.9973          |
+| 1.8247        | 1.6605 | 2250 | 1.9203          |
+| 1.6582        | 1.8450 | 2500 | 1.8425          |
+| 1.4947        | 2.0295 | 2750 | 1.7883          |
+| 1.4298        | 2.2140 | 3000 | 1.7411          |
+| 1.4936        | 2.3985 | 3250 | 1.6912          |
+| 1.3752        | 2.5830 | 3500 | 1.6467          |
+| 1.3758        | 2.7675 | 3750 | 1.5994          |
+| 1.2897        | 2.9520 | 4000 | 1.5617          |
+| 1.0563        | 3.1365 | 4250 | 1.5384          |
+| 1.0315        | 3.3210 | 4500 | 1.5103          |
+| 1.0657        | 3.5055 | 4750 | 1.4766          |
+| 1.0247        | 3.6900 | 5000 | 1.4505          |
+| 1.0058        | 3.8745 | 5250 | 1.4253          |
+| 0.8809        | 4.0590 | 5500 | 1.4120          |
+| 0.8298        | 4.2435 | 5750 | 1.3935          |
+| 0.9152        | 4.4280 | 6000 | 1.3781          |
+| 0.8512        | 4.6125 | 6250 | 1.3650          |
+| 0.9111        | 4.7970 | 6500 | 1.3536          |
+| 0.8168        | 4.9815 | 6750 | 1.3504          |
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.40.0
+- Pytorch 2.2.2+cu121
+- Datasets 2.19.0
+- Tokenizers 0.19.1
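
The hyperparameters and results table in the added README are linked by simple arithmetic, which the sketch below works through. It is not part of the commit. It only recomputes figures from values already listed, and the variable names are illustrative. The perplexity line assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card.

```python
import math

# Values copied from the hyperparameter list and results table in the README diff.
train_batch_size = 1             # per-device batch size
num_devices = 2
gradient_accumulation_steps = 4
num_epochs = 5.0
warmup_steps = 500
final_eval_loss = 1.3504

# Effective batch size per optimizer step:
# per-device batch * number of devices * accumulation steps.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# The first logged row pairs step 250 with epoch 0.1845, which gives an
# estimate of optimizer steps per epoch (epoch values are rounded in the table).
steps_per_epoch = 250 / 0.1845
approx_train_examples = round(steps_per_epoch) * total_train_batch_size
approx_total_steps = round(steps_per_epoch * num_epochs)

# Share of the run spent in linear warmup.
warmup_fraction = warmup_steps / approx_total_steps

# Assumption: loss is mean per-token cross-entropy in nats,
# so exp(loss) is the evaluation perplexity.
eval_perplexity = math.exp(final_eval_loss)

print(f"total_train_batch_size: {total_train_batch_size}")
print(f"steps per epoch (approx.): {steps_per_epoch:.1f}")
print(f"training examples (approx.): {approx_train_examples}")
print(f"total optimizer steps (approx.): {approx_total_steps}")
print(f"warmup fraction (approx.): {warmup_fraction:.3f}")
print(f"eval perplexity (under the stated assumption): {eval_perplexity:.3f}")
```

The first computed value matches the `total_train_batch_size: 8` entry in the card. The example and step counts are estimates only, since they are derived from rounded epoch values and ignore any partial final batch.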