---
license: apache-2.0
base_model: glimmerz/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full

This model is a fine-tuned version of [glimmerz/zephyr-7b-sft-full](https://huggingface.co/glimmerz/zephyr-7b-sft-full) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7385
- Rewards/chosen: -4.7566
- Rewards/rejected: -8.6166
- Rewards/accuracies: 0.7560
- Rewards/margins: 3.8601
- Logps/rejected: -315.8341
- Logps/chosen: -321.4129
- Logits/rejected: -2.2590
- Logits/chosen: -2.3620
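
The metric names above appear to follow the conventions of TRL's `DPOTrainer`; the card does not name the training library, so treat that as an assumption. Under those conventions, each implicit reward is `beta` times the difference between the policy's and the frozen reference model's log-probability of a completion, the margin is chosen minus rejected (here -4.7566 - (-8.6166) ≈ 3.860), accuracy is the fraction of pairs whose chosen reward exceeds the rejected one, and the loss is the per-pair mean of `-log σ(margin)` (the default sigmoid loss). The sketch below restates those definitions in plain Python; `beta=0.1` and all log-probability values are hypothetical, since the card records neither.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PreferencePair:
    # Summed log-probabilities of each completion under the policy being
    # trained and under the frozen reference (SFT) model.
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: List[PreferencePair], beta: float = 0.1) -> Dict[str, float]:
    """Batch-averaged DPO metrics under assumed TRL-style definitions."""
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        # Sigmoid DPO loss, averaged per pair (not computed from the mean margin).
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
    }


# Hypothetical log-probabilities for two pairs; beta=0.1 is also an assumption.
pairs = [
    PreferencePair(policy_chosen=-300.0, policy_rejected=-320.0,
                   ref_chosen=-290.0, ref_rejected=-280.0),
    PreferencePair(policy_chosen=-250.0, policy_rejected=-260.0,
                   ref_chosen=-255.0, ref_rejected=-250.0),
]
print(dpo_metrics(pairs))
```

Because `-log σ` is convex, the averaged loss is at least `-log σ` of the averaged margin. Under the sigmoid-loss assumption, a mean margin of 3.8601 gives a lower bound of only about 0.021, so the reported loss of 0.7385 is consistent with a subset of misranked pairs carrying strongly negative margins rather than with uniformly small margins.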

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
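
The two `total_*` values follow from the per-device sizes: 8 sequences × 4 devices × 2 accumulation steps = 64 per optimizer step, and 4 × 4 = 16 for evaluation, where no gradient accumulation applies. The linear scheduler with a warmup ratio of 0.1 ramps the learning rate from 0 to 5e-07 over the first 10% of optimizer steps, then decays it linearly back to 0. The sketch below mirrors those rules; it assumes the Hugging Face Trainer convention of `ceil(warmup_ratio * total_steps)` warmup steps. The total number of optimizer steps is not recorded in the card, so the value 2910 used here is a hypothetical estimate (the results table logs step 1000 at epoch 1.03, roughly 970 steps per epoch).

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    # Sequences processed per optimizer step, summed over all devices.
    return per_device * num_devices * grad_accum_steps


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-07,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0.

    Assumes warmup_steps = ceil(warmup_ratio * total_steps), the Hugging Face
    Trainer convention when only a warmup ratio is given.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(total_batch_size(8, 4, 2))  # total_train_batch_size: 64
print(total_batch_size(4, 4))     # total_eval_batch_size: 16

# total_steps is not recorded in the card; 2910 is a rough, hypothetical estimate.
for step in (0, 146, 291, 1600, 2910):
    print(step, linear_warmup_lr(step, total_steps=2910))
```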

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.575 | 0.1 | 100 | 0.5309 | -0.0101 | -0.6034 | 0.7460 | 0.5933 | -235.7018 | -273.9487 | -2.6525 | -2.7458 |
| 0.4759 | 0.21 | 200 | 0.4943 | -0.0642 | -1.0829 | 0.75 | 1.0187 | -240.4966 | -274.4892 | -2.7066 | -2.8006 |
| 0.5022 | 0.31 | 300 | 0.4824 | -0.1526 | -1.2517 | 0.7620 | 1.0991 | -242.1845 | -275.3735 | -2.7362 | -2.8225 |
| 0.5282 | 0.41 | 400 | 0.4878 | -0.6794 | -1.9420 | 0.7840 | 1.2626 | -249.0876 | -280.6413 | -2.7023 | -2.7924 |
| 0.5179 | 0.52 | 500 | 0.4805 | -0.2645 | -1.4485 | 0.7760 | 1.1841 | -244.1532 | -276.4918 | -2.6773 | -2.7631 |
| 0.4705 | 0.62 | 600 | 0.4715 | -0.3016 | -1.5766 | 0.7560 | 1.2750 | -245.4337 | -276.8629 | -2.7009 | -2.7838 |
| 0.5038 | 0.72 | 700 | 0.4790 | -0.3119 | -1.5731 | 0.7680 | 1.2612 | -245.3986 | -276.9666 | -2.5409 | -2.6269 |
| 0.4418 | 0.83 | 800 | 0.4665 | -0.4564 | -2.0177 | 0.7800 | 1.5612 | -249.8442 | -278.4113 | -2.4834 | -2.5636 |
| 0.5155 | 0.93 | 900 | 0.4770 | -0.3715 | -1.7079 | 0.7740 | 1.3364 | -246.7468 | -277.5622 | -2.5118 | -2.5927 |
| 0.3463 | 1.03 | 1000 | 0.4755 | -0.5305 | -1.8263 | 0.7680 | 1.2958 | -247.9306 | -279.1520 | -2.6282 | -2.7083 |
| 0.1266 | 1.14 | 1100 | 0.4924 | -1.0131 | -2.8651 | 0.7740 | 1.8519 | -258.3182 | -283.9783 | -2.5584 | -2.6430 |
| 0.0751 | 1.24 | 1200 | 0.5208 | -1.4508 | -3.6646 | 0.7760 | 2.2138 | -266.3139 | -288.3549 | -2.5574 | -2.6450 |
| 0.0306 | 1.34 | 1300 | 0.5779 | -2.1463 | -4.7450 | 0.7580 | 2.5987 | -277.1172 | -295.3102 | -2.4957 | -2.5865 |
| 0.031 | 1.45 | 1400 | 0.5993 | -2.6730 | -5.3111 | 0.7580 | 2.6381 | -282.7792 | -300.5774 | -2.5157 | -2.6051 |
| 0.0535 | 1.55 | 1500 | 0.5731 | -2.1627 | -4.7943 | 0.75 | 2.6316 | -277.6110 | -295.4747 | -2.5616 | -2.6529 |
| 0.063 | 1.65 | 1600 | 0.5433 | -1.9823 | -4.5765 | 0.7580 | 2.5942 | -275.4325 | -293.6702 | -2.5038 | -2.5985 |
| 0.0423 | 1.76 | 1700 | 0.5821 | -2.6553 | -5.4183 | 0.7540 | 2.7630 | -283.8502 | -300.3999 | -2.4636 | -2.5654 |
| 0.0559 | 1.86 | 1800 | 0.5657 | -2.5801 | -5.2643 | 0.7520 | 2.6842 | -282.3106 | -299.6483 | -2.4843 | -2.5741 |
| 0.0468 | 1.96 | 1900 | 0.5759 | -2.4597 | -5.2907 | 0.7480 | 2.8309 | -282.5742 | -298.4443 | -2.4491 | -2.5392 |
| 0.0576 | 2.07 | 2000 | 0.5614 | -2.5997 | -5.3232 | 0.7620 | 2.7235 | -282.8997 | -299.8446 | -2.4132 | -2.5016 |
| 0.0135 | 2.17 | 2100 | 0.6182 | -3.1988 | -6.3849 | 0.7640 | 3.1861 | -293.5166 | -305.8354 | -2.4052 | -2.5040 |
| 0.0149 | 2.27 | 2200 | 0.7075 | -4.5960 | -8.1955 | 0.7420 | 3.5995 | -311.6229 | -319.8072 | -2.3535 | -2.4494 |
| 0.0095 | 2.37 | 2300 | 0.7117 | -4.2102 | -7.7788 | 0.7540 | 3.5686 | -307.4559 | -315.9493 | -2.2943 | -2.3972 |
| 0.0104 | 2.48 | 2400 | 0.7131 | -4.3371 | -7.9252 | 0.7540 | 3.5881 | -308.9199 | -317.2180 | -2.3097 | -2.4097 |
| 0.008 | 2.58 | 2500 | 0.7328 | -4.4361 | -8.1696 | 0.7520 | 3.7335 | -311.3636 | -318.2084 | -2.2756 | -2.3764 |
| 0.0051 | 2.68 | 2600 | 0.7193 | -4.2884 | -7.9892 | 0.7600 | 3.7009 | -309.5601 | -316.7311 | -2.3138 | -2.4185 |
| 0.0089 | 2.79 | 2700 | 0.7388 | -4.8991 | -8.6552 | 0.7660 | 3.7561 | -316.2196 | -322.8380 | -2.2942 | -2.3960 |
| 0.0082 | 2.89 | 2800 | 0.7342 | -4.7984 | -8.6596 | 0.7640 | 3.8612 | -316.2638 | -321.8309 | -2.2620 | -2.3649 |
| 0.0094 | 2.99 | 2900 | 0.7374 | -4.7573 | -8.6168 | 0.7580 | 3.8595 | -315.8361 | -321.4205 | -2.2595 | -2.3625 |
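
In the table, training loss falls from about 0.5 in the first epoch to roughly 0.01 by the end of the third, while validation loss reaches its minimum of 0.4665 at step 800 and then climbs to about 0.74. Reward margins keep widening (0.5933 to 3.8595), but Rewards/accuracies stays between 0.742 and 0.784 throughout. This pattern is commonly read as overfitting to the training pairs. The sketch below locates the logged evaluation with the lowest validation loss and, separately, the highest reward accuracy; the rows are copied from the table, and whether a checkpoint was saved at a given step depends on save settings the card does not record.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    rewards_accuracy: float


# (Step, Validation Loss, Rewards/accuracies) copied from the table above.
ROWS: List[EvalRow] = [EvalRow(*r) for r in [
    (100, 0.5309, 0.7460), (200, 0.4943, 0.75), (300, 0.4824, 0.7620),
    (400, 0.4878, 0.7840), (500, 0.4805, 0.7760), (600, 0.4715, 0.7560),
    (700, 0.4790, 0.7680), (800, 0.4665, 0.7800), (900, 0.4770, 0.7740),
    (1000, 0.4755, 0.7680), (1100, 0.4924, 0.7740), (1200, 0.5208, 0.7760),
    (1300, 0.5779, 0.7580), (1400, 0.5993, 0.7580), (1500, 0.5731, 0.75),
    (1600, 0.5433, 0.7580), (1700, 0.5821, 0.7540), (1800, 0.5657, 0.7520),
    (1900, 0.5759, 0.7480), (2000, 0.5614, 0.7620), (2100, 0.6182, 0.7640),
    (2200, 0.7075, 0.7420), (2300, 0.7117, 0.7540), (2400, 0.7131, 0.7540),
    (2500, 0.7328, 0.7520), (2600, 0.7193, 0.7600), (2700, 0.7388, 0.7660),
    (2800, 0.7342, 0.7640), (2900, 0.7374, 0.7580),
]]


def best_row(rows: List[EvalRow], field: str = "val_loss", minimize: bool = True) -> EvalRow:
    # Pick the logged evaluation with the lowest (or highest) value of `field`.
    pick = min if minimize else max
    return pick(rows, key=lambda r: getattr(r, field))


best = best_row(ROWS)
print(best.step, best.val_loss)  # 800 0.4665
top_acc = best_row(ROWS, field="rewards_accuracy", minimize=False)
print(top_acc.step, top_acc.rewards_accuracy)  # 400 0.784
```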


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0
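
To check whether a local environment matches the versions listed above, a small script can compare installed distribution versions using the standard-library `importlib.metadata` module (Python 3.8+). Two choices here are assumptions of this sketch: PyTorch is looked up under its PyPI distribution name `torch`, and local build tags such as `+cu121` are ignored when comparing.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.35.2",
    "torch": "2.1.0",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def installed_version(dist: str) -> Optional[str]:
    # Return the installed version string, or None if the package is missing.
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Drop a local build tag such as "+cu121" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


for name, (want, found) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {want}, found {found or 'not installed'}")
```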