	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: alignment-handbook/zephyr-7b-sft-full
	model-index:
	- name: zephyr-7b-dpo-lora-pubmedqa-mix2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-lora-pubmedqa-mix2

	This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset (the auto-generated card recorded the dataset name as `None`).
	It achieves the following results on the evaluation set:
	- Loss: 0.0013
	- Rewards/chosen: -1.8126
	- Rewards/rejected: -10.9731
	- Rewards/accuracies: 1.0
	- Rewards/margins: 9.1605
	- Logps/rejected: -1144.0397
	- Logps/chosen: -242.4412
	- Logits/rejected: -1.7638
	- Logits/chosen: -2.8841
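
The reward figures above are related by simple arithmetic: the reported margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of evaluation pairs in which the chosen reward exceeds the rejected one. The sketch below only re-checks the margin from the averaged numbers on this card; the variable names are illustrative and are not taken from the training code.

```python
# Values copied from the evaluation summary on this card.
rewards_chosen = -1.8126
rewards_rejected = -10.9731
reported_margin = 9.1605

# The margin is the difference between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Allow for rounding, since the card reports four decimal places.
margin_matches = abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f}, matches card: {margin_matches}")
```

An accuracy of 1.0 combined with a large margin means every evaluation pair was ranked correctly by a wide gap. That is worth a second look, since it can also indicate that the chosen and rejected responses are easy to tell apart.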

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 2
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- total_eval_batch_size: 2
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
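
The totals in the list above follow from the per-device values, and the learning rate follows a linear warmup and then a cosine decay. The sketch below reproduces both under stated assumptions: `lr_at` is a hypothetical helper that mirrors the usual cosine-with-warmup shape (decaying to zero), and `total_steps=1000` is a placeholder, because the exact number of training steps is not stated on this card.

```python
import math

# Per-device settings copied from the hyperparameter list.
train_batch_size = 1
eval_batch_size = 1
num_devices = 2
gradient_accumulation_steps = 2

# Effective batch sizes implied by those settings.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Placeholder step count, for illustration only.
total_steps = 1000
schedule = [lr_at(s, total_steps) for s in range(total_steps + 1)]
print(total_train_batch_size, total_eval_batch_size, max(schedule))
```

With these settings the peak learning rate of 5e-06 is reached after roughly the first 10% of steps and falls to zero by the end of the single epoch.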

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.2697 | 0.04 | 3000 | 0.3396 | 0.2213 | -0.6386 | 1.0 | 0.8599 | -110.5876 | -39.0518 | -3.0278 | -3.0862 |
	| 0.1599 | 0.07 | 6000 | 0.0750 | -0.5884 | -3.6673 | 1.0 | 3.0789 | -413.4546 | -120.0204 | -2.9055 | -3.0346 |
	| 0.0563 | 0.11 | 9000 | 0.0204 | -0.6260 | -5.6712 | 1.0 | 5.0452 | -613.8441 | -123.7819 | -3.0269 | -3.1136 |
	| 0.0463 | 0.14 | 12000 | 0.0287 | -0.7209 | -7.9224 | 1.0 | 7.2014 | -838.9609 | -133.2740 | -3.0642 | -3.1628 |
	| 0.1206 | 0.18 | 15000 | 0.0030 | -0.9209 | -8.8089 | 1.0 | 7.8880 | -927.6118 | -153.2670 | -3.0802 | -3.1766 |
	| 0.0508 | 0.22 | 18000 | 0.4964 | -0.4026 | -8.0330 | 1.0 | 7.6304 | -850.0245 | -101.4397 | -3.1314 | -3.2075 |
	| 0.0323 | 0.25 | 21000 | 0.0872 | -1.4713 | -10.3437 | 1.0 | 8.8723 | -1081.0913 | -208.3129 | -2.6496 | -3.1189 |
	| 0.4534 | 0.29 | 24000 | 0.0077 | -2.3507 | -12.1827 | 1.0 | 9.8320 | -1264.9957 | -296.2491 | -1.6282 | -2.8665 |
	| 0.0013 | 0.32 | 27000 | 0.0019 | -2.1480 | -10.6645 | 1.0 | 8.5166 | -1113.1797 | -275.9768 | -1.7614 | -2.8604 |
	| 0.1404 | 0.36 | 30000 | 0.0002 | -2.4964 | -12.4101 | 1.0 | 9.9138 | -1287.7384 | -310.8155 | -1.5907 | -2.8352 |
	| 0.0198 | 0.4 | 33000 | 0.0009 | -3.0802 | -13.3347 | 1.0 | 10.2545 | -1380.1964 | -369.1991 | -1.6628 | -2.8372 |
	| 0.0041 | 0.43 | 36000 | 0.0004 | -2.7800 | -12.5815 | 1.0 | 9.8014 | -1304.8732 | -339.1852 | -1.6282 | -2.8242 |
	| 0.0007 | 0.47 | 39000 | 0.0007 | -2.9921 | -13.2089 | 1.0 | 10.2168 | -1367.6129 | -360.3922 | -1.6672 | -2.8403 |
	| 0.0008 | 0.5 | 42000 | 0.0013 | -2.3107 | -11.8754 | 1.0 | 9.5647 | -1234.2609 | -292.2454 | -1.6475 | -2.8400 |
	| 0.0024 | 0.54 | 45000 | 0.0010 | -3.3769 | -13.2333 | 1.0 | 9.8564 | -1370.0538 | -398.8731 | -1.6937 | -2.8403 |
	| 0.0019 | 0.57 | 48000 | 0.0013 | -2.8151 | -12.4427 | 1.0 | 9.6277 | -1290.9999 | -342.6892 | -1.7047 | -2.8503 |
	| 0.2266 | 0.61 | 51000 | 0.0014 | -1.9532 | -11.0212 | 1.0 | 9.0680 | -1148.8468 | -256.4992 | -1.6745 | -2.8650 |
	| 0.0016 | 0.65 | 54000 | 0.0014 | -1.8077 | -10.7512 | 1.0 | 8.9435 | -1121.8423 | -241.9466 | -1.8328 | -2.8946 |
	| 0.0019 | 0.68 | 57000 | 0.0013 | -1.8159 | -10.8808 | 1.0 | 9.0649 | -1134.8024 | -242.7715 | -1.7644 | -2.8860 |
	| 0.0013 | 0.72 | 60000 | 0.0013 | -1.7356 | -10.8007 | 1.0 | 9.0651 | -1126.8002 | -234.7419 | -1.7574 | -2.8871 |
	| 0.0014 | 0.75 | 63000 | 0.0013 | -1.8249 | -10.9773 | 1.0 | 9.1524 | -1144.4586 | -243.6743 | -1.7699 | -2.8867 |
	| 0.0014 | 0.79 | 66000 | 0.0013 | -1.8308 | -10.9698 | 1.0 | 9.1389 | -1143.7017 | -244.2651 | -1.7597 | -2.8841 |
	| 0.0011 | 0.83 | 69000 | 0.0013 | -1.8034 | -10.9390 | 1.0 | 9.1356 | -1140.6276 | -241.5220 | -1.7619 | -2.8858 |
	| 0.0016 | 0.86 | 72000 | 0.0013 | -1.7971 | -10.9097 | 1.0 | 9.1126 | -1137.6914 | -240.8868 | -1.7608 | -2.8852 |
	| 0.0239 | 0.9 | 75000 | 0.0013 | -1.7976 | -10.9400 | 1.0 | 9.1424 | -1140.7238 | -240.9355 | -1.7773 | -2.8872 |
	| 0.0024 | 0.93 | 78000 | 0.0013 | -1.7862 | -10.9196 | 1.0 | 9.1334 | -1138.6901 | -239.8036 | -1.7733 | -2.8861 |
	| 0.0018 | 0.97 | 81000 | 0.0013 | -1.8228 | -10.9802 | 1.0 | 9.1574 | -1144.7491 | -243.4639 | -1.7594 | -2.8860 |


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- Pytorch 2.1.2+cu121
	- Datasets 2.14.6
	- Tokenizers 0.15.2