shenxq
/

zephyr-7b-dpo-lora-pairrm

alignment-handbook

Generated from Trainer

Model card Files Files and versions Community

zephyr-7b-dpo-lora-pairrm / README.md

shenxq's picture

End of training

02e7794 verified 11 months ago

|

history blame contribute delete

4.4 kB

	---
	license: apache-2.0
	library_name: peft
	tags:
	- alignment-handbook
	- generated_from_trainer
	- trl
	- dpo
	- generated_from_trainer
	datasets:
	- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
	base_model: mistralai/Mistral-7B-Instruct-v0.2
	model-index:
	- name: zephyr-7b-dpo-lora-pairrm
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-lora-pairrm

	This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6764
	- Rewards/chosen: -0.9885
	- Rewards/rejected: -1.0650
	- Rewards/accuracies: 0.5657
	- Rewards/margins: 0.0765
	- Logps/rejected: -320.4450
	- Logps/chosen: -307.4615
	- Logits/rejected: -2.7535
	- Logits/chosen: -2.7599

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \| Rewards/chosen \| Rewards/rejected \| Rewards/accuracies \| Rewards/margins \| Logps/rejected \| Logps/chosen \| Logits/rejected \| Logits/chosen \|
	\|:-------------:\|:-----:\|:----:\|:---------------:\|:--------------:\|:----------------:\|:------------------:\|:---------------:\|:--------------:\|:------------:\|:---------------:\|:-------------:\|
	\| 0.6916 \| 0.08 \| 100 \| 0.6925 \| -0.0162 \| -0.0177 \| 0.5280 \| 0.0015 \| -215.7187 \| -210.2296 \| -2.5058 \| -2.5086 \|
	\| 0.6855 \| 0.16 \| 200 \| 0.6880 \| -0.0651 \| -0.0772 \| 0.5613 \| 0.0121 \| -221.6710 \| -215.1240 \| -2.5152 \| -2.5178 \|
	\| 0.6825 \| 0.24 \| 300 \| 0.6854 \| -0.1874 \| -0.2081 \| 0.5473 \| 0.0207 \| -234.7546 \| -227.3457 \| -2.5175 \| -2.5192 \|
	\| 0.6676 \| 0.32 \| 400 \| 0.6827 \| -0.2909 \| -0.3222 \| 0.5477 \| 0.0313 \| -246.1682 \| -237.7042 \| -2.5347 \| -2.5368 \|
	\| 0.6458 \| 0.4 \| 500 \| 0.6805 \| -0.3693 \| -0.4104 \| 0.5567 \| 0.0410 \| -254.9852 \| -245.5435 \| -2.6328 \| -2.6364 \|
	\| 0.6592 \| 0.48 \| 600 \| 0.6789 \| -0.6010 \| -0.6528 \| 0.5560 \| 0.0518 \| -279.2278 \| -268.7087 \| -2.6805 \| -2.6845 \|
	\| 0.6107 \| 0.56 \| 700 \| 0.6785 \| -0.8159 \| -0.8786 \| 0.5550 \| 0.0627 \| -301.8047 \| -290.1964 \| -2.6914 \| -2.6969 \|
	\| 0.6475 \| 0.64 \| 800 \| 0.6770 \| -0.8845 \| -0.9544 \| 0.5610 \| 0.0699 \| -309.3867 \| -297.0627 \| -2.7237 \| -2.7295 \|
	\| 0.6639 \| 0.72 \| 900 \| 0.6766 \| -0.9705 \| -1.0450 \| 0.5667 \| 0.0746 \| -318.4507 \| -305.6558 \| -2.7464 \| -2.7525 \|
	\| 0.6305 \| 0.8 \| 1000 \| 0.6764 \| -0.9844 \| -1.0603 \| 0.5680 \| 0.0759 \| -319.9799 \| -307.0536 \| -2.7543 \| -2.7606 \|
	\| 0.6754 \| 0.88 \| 1100 \| 0.6763 \| -0.9882 \| -1.0648 \| 0.5687 \| 0.0766 \| -320.4283 \| -307.4264 \| -2.7538 \| -2.7602 \|
	\| 0.6577 \| 0.96 \| 1200 \| 0.6764 \| -0.9885 \| -1.0649 \| 0.5663 \| 0.0764 \| -320.4412 \| -307.4615 \| -2.7538 \| -2.7602 \|


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- Pytorch 2.1.2
	- Datasets 2.14.6
	- Tokenizers 0.15.0