	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: teknium/OpenHermes-2.5-Mistral-7B
	model-index:
	- name: paper_imp_new_format_1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# paper_imp_new_format_1

	This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6201
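
Since the metadata lists `peft` as the library and names the base model, the weights are presumably an adapter that gets attached to `teknium/OpenHermes-2.5-Mistral-7B` at load time. The following is a minimal sketch under that assumption. `ADAPTER_PATH` is a hypothetical placeholder (the card does not give the repository owner or a local path), and reusing the base model's tokenizer is also an assumption.

```python
# Base model ID as given in the card's metadata.
BASE_MODEL_ID = "teknium/OpenHermes-2.5-Mistral-7B"
# Hypothetical placeholder: the card does not state where the adapter lives.
ADAPTER_PATH = "path/to/paper_imp_new_format_1"


def load_finetuned(adapter_path: str = ADAPTER_PATH,
                   base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy
    # dependencies installed; calling the function requires peft and transformers.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the fine-tune did not change the tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights stored at adapter_path.
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_finetuned()` downloads the 7B base model, so it is best run on a machine with enough memory and disk space.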

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 1
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: reduce_lr_on_plateau
	- lr_scheduler_warmup_steps: 200
	- num_epochs: 10
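
The card does not state the number of optimizer steps per epoch, but it can be inferred from the training results table, where step 108 is logged at epoch 4.0. The sketch below works through that arithmetic. `STEPS_PER_EPOCH` is an inferred value rather than a reported hyperparameter, and whether the plateau scheduler actually applies the warmup is not stated in the card.

```python
# Values copied from the hyperparameter list above.
NUM_EPOCHS = 10
WARMUP_STEPS = 200

# Assumption: inferred from the results table (step 108 is logged at epoch 4.0).
STEPS_PER_EPOCH = 108 / 4.0  # 27 optimizer steps per epoch


def epoch_at(step: int) -> float:
    """Fractional epoch for a given optimizer step, rounded as in the table."""
    return round(step / STEPS_PER_EPOCH, 2)


total_steps = int(STEPS_PER_EPOCH * NUM_EPOCHS)
warmup_fraction = WARMUP_STEPS / total_steps

print(epoch_at(4), epoch_at(268), total_steps, round(warmup_fraction, 2))
# prints: 0.15 9.93 270 0.74
```

Under this reading, the 200 configured warmup steps would span roughly 74% of the 270 total steps.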

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 0.99 | 0.15 | 4 | 1.0009 |
	| 1.0287 | 0.3 | 8 | 0.9440 |
	| 0.9432 | 0.44 | 12 | 0.8927 |
	| 0.829 | 0.59 | 16 | 0.8395 |
	| 0.7702 | 0.74 | 20 | 0.7773 |
	| 0.6886 | 0.89 | 24 | 0.7180 |
	| 0.5844 | 1.04 | 28 | 0.6774 |
	| 0.6016 | 1.19 | 32 | 0.6593 |
	| 0.5237 | 1.33 | 36 | 0.6484 |
	| 0.5505 | 1.48 | 40 | 0.6418 |
	| 0.669 | 1.63 | 44 | 0.6350 |
	| 0.4595 | 1.78 | 48 | 0.6320 |
	| 0.5486 | 1.93 | 52 | 0.6290 |
	| 0.539 | 2.07 | 56 | 0.6242 |
	| 0.5089 | 2.22 | 60 | 0.6188 |
	| 0.6012 | 2.37 | 64 | 0.6195 |
	| 0.4286 | 2.52 | 68 | 0.6147 |
	| 0.4335 | 2.67 | 72 | 0.6154 |
	| 0.4584 | 2.81 | 76 | 0.6154 |
	| 0.4597 | 2.96 | 80 | 0.6115 |
	| 0.421 | 3.11 | 84 | 0.6076 |
	| 0.4541 | 3.26 | 88 | 0.6131 |
	| 0.3533 | 3.41 | 92 | 0.6199 |
	| 0.4007 | 3.56 | 96 | 0.6142 |
	| 0.3876 | 3.7 | 100 | 0.6104 |
	| 0.4258 | 3.85 | 104 | 0.6073 |
	| 0.4537 | 4.0 | 108 | 0.6059 |
	| 0.3486 | 4.15 | 112 | 0.6098 |
	| 0.3399 | 4.3 | 116 | 0.6172 |
	| 0.3826 | 4.44 | 120 | 0.6215 |
	| 0.3813 | 4.59 | 124 | 0.6210 |
	| 0.3835 | 4.74 | 128 | 0.6192 |
	| 0.3864 | 4.89 | 132 | 0.6173 |
	| 0.3216 | 5.04 | 136 | 0.6147 |
	| 0.3923 | 5.19 | 140 | 0.6151 |
	| 0.3757 | 5.33 | 144 | 0.6158 |
	| 0.2861 | 5.48 | 148 | 0.6164 |
	| 0.3838 | 5.63 | 152 | 0.6174 |
	| 0.3669 | 5.78 | 156 | 0.6180 |
	| 0.3405 | 5.93 | 160 | 0.6188 |
	| 0.3355 | 6.07 | 164 | 0.6188 |
	| 0.3533 | 6.22 | 168 | 0.6193 |
	| 0.4245 | 6.37 | 172 | 0.6191 |
	| 0.3257 | 6.52 | 176 | 0.6196 |
	| 0.3361 | 6.67 | 180 | 0.6198 |
	| 0.3595 | 6.81 | 184 | 0.6198 |
	| 0.3075 | 6.96 | 188 | 0.6198 |
	| 0.4133 | 7.11 | 192 | 0.6199 |
	| 0.3341 | 7.26 | 196 | 0.6197 |
	| 0.385 | 7.41 | 200 | 0.6197 |
	| 0.3254 | 7.56 | 204 | 0.6198 |
	| 0.385 | 7.7 | 208 | 0.6199 |
	| 0.3332 | 7.85 | 212 | 0.6197 |
	| 0.2648 | 8.0 | 216 | 0.6198 |
	| 0.3602 | 8.15 | 220 | 0.6203 |
	| 0.4129 | 8.3 | 224 | 0.6200 |
	| 0.3716 | 8.44 | 228 | 0.6196 |
	| 0.3152 | 8.59 | 232 | 0.6200 |
	| 0.3463 | 8.74 | 236 | 0.6199 |
	| 0.3357 | 8.89 | 240 | 0.6198 |
	| 0.3029 | 9.04 | 244 | 0.6199 |
	| 0.3679 | 9.19 | 248 | 0.6200 |
	| 0.3911 | 9.33 | 252 | 0.6201 |
	| 0.2844 | 9.48 | 256 | 0.6199 |
	| 0.2584 | 9.63 | 260 | 0.6199 |
	| 0.3966 | 9.78 | 264 | 0.6200 |
	| 0.3423 | 9.93 | 268 | 0.6201 |
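
The table shows validation loss bottoming out partway through training while training loss keeps falling. As a rough illustration, the sketch below copies the Validation Loss column in row order and finds the evaluation with the lowest value. `VAL_LOSSES`, `EVAL_EVERY`, and `best_eval` are illustrative names, not part of the training code, and the card does not say which checkpoint was actually saved.

```python
# Validation losses copied from the table above, in row order.
VAL_LOSSES = [
    1.0009, 0.9440, 0.8927, 0.8395, 0.7773, 0.7180, 0.6774, 0.6593, 0.6484,
    0.6418, 0.6350, 0.6320, 0.6290, 0.6242, 0.6188, 0.6195, 0.6147, 0.6154,
    0.6154, 0.6115, 0.6076, 0.6131, 0.6199, 0.6142, 0.6104, 0.6073, 0.6059,
    0.6098, 0.6172, 0.6215, 0.6210, 0.6192, 0.6173, 0.6147, 0.6151, 0.6158,
    0.6164, 0.6174, 0.6180, 0.6188, 0.6188, 0.6193, 0.6191, 0.6196, 0.6198,
    0.6198, 0.6198, 0.6199, 0.6197, 0.6197, 0.6198, 0.6199, 0.6197, 0.6198,
    0.6203, 0.6200, 0.6196, 0.6200, 0.6199, 0.6198, 0.6199, 0.6200, 0.6201,
    0.6199, 0.6199, 0.6200, 0.6201,
]
EVAL_EVERY = 4  # the Step column advances by 4 between rows


def best_eval(val_losses, eval_every=EVAL_EVERY):
    """Return (step, loss) of the evaluation with the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return (idx + 1) * eval_every, val_losses[idx]


best_step, best_loss = best_eval(VAL_LOSSES)
final_loss = VAL_LOSSES[-1]
print(best_step, best_loss, final_loss, round(final_loss - best_loss, 4))
# prints: 108 0.6059 0.6201 0.0142
```

By this reading, the lowest validation loss (0.6059) occurs at step 108, epoch 4.0, and the final reported loss of 0.6201 is about 0.014 higher.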


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- PyTorch 2.1.2+cu121
	- Datasets 2.14.6
	- Tokenizers 0.15.0
