	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise
	model-index:
	- name: qwen2.5-0.5b-expo-DPO-ES-100
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/tkmcvc2n)
	# qwen2.5-0.5b-expo-DPO-ES-100

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset.
	It achieves the following results on the evaluation set (these values match the step 700 row, epoch 1.9839, of the training results table):
	- Loss: 226.3468
	- Logps: -80.2667
	- Logits: -0.6269
	- Objective: 213.3031
	- Dpo Loss: 213.3031
	- Regularize: 213.3031
	- Ranking Simple: 0.5399
	- Ranking Idealized: 0.5212
	- Ranking Idealized Expo: 0.5212
	- Wo Beta: 6.6215

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
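
Until this section is filled in, the following is a minimal sketch of loading the checkpoint for text generation with the Transformers library. The repository id `hZzy/qwen2.5-0.5b-expo-DPO-ES-100` is an assumption built from the model name and the base model's namespace, and the helper name `generate` and its default settings are illustrative only.

```python
# ASSUMPTION: repo id inferred from the model name and the base model's
# namespace; adjust it if the checkpoint is published elsewhere.
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-DPO-ES-100"


def generate(prompt, max_new_tokens=64):
    """Load the checkpoint and return a greedy completion of `prompt`."""
    # Imported lazily so this file can be read or imported without
    # transformers and torch being installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` with any prompt downloads the weights on first use. Greedy decoding is chosen here only to keep the sketch deterministic, not because the card recommends it.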

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 12
	- total_train_batch_size: 144
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
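
The two derived batch sizes follow from the per-device sizes, the device count, and the accumulation steps. The sketch below reproduces that arithmetic and adds a plain-Python version of a linear-warmup-then-cosine schedule matching `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1`. The function names are hypothetical, and `TOTAL_STEPS` is an assumption: it is estimated from the logged rate of roughly 353 steps per epoch (step 50 at epoch 0.1417) over the configured 5 epochs, whereas the logged results end at step 950.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1

# ASSUMPTION: about 353 optimizer steps per epoch times 5 epochs.
TOTAL_STEPS = 1765


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_sizes())  # (144, 12)
for step in (0, 50, 700, 950):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

If the run was planned for a different number of optimizer steps, only `TOTAL_STEPS` needs to change; the batch-size arithmetic does not depend on it.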

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
	| 17.3723 | 0.1417 | 50 | 32.1125 | -90.9520 | -1.4391 | 31.4854 | 31.4854 | 31.4854 | 0.5264 | 0.5212 | 0.5212 | 7.6851 |
	| 60.4454 | 0.2834 | 100 | 70.9968 | -86.6000 | -1.4386 | 70.7719 | 70.7719 | 70.7719 | 0.5305 | 0.5212 | 0.5212 | 7.5289 |
	| 100.2237 | 0.4251 | 150 | 129.8928 | -85.7303 | -1.2892 | 126.8845 | 126.8845 | 126.8845 | 0.5321 | 0.5212 | 0.5212 | 7.4641 |
	| 120.8284 | 0.5668 | 200 | 164.0152 | -75.5542 | -1.3195 | 159.5013 | 159.5013 | 159.5013 | 0.5357 | 0.5212 | 0.5212 | 7.1836 |
	| 134.8217 | 0.7085 | 250 | 195.7212 | -79.3891 | -1.2058 | 190.8510 | 190.8510 | 190.8510 | 0.5285 | 0.5212 | 0.5212 | 7.2711 |
	| 119.0273 | 0.8503 | 300 | 192.5231 | -84.2971 | -0.9945 | 188.0580 | 188.0580 | 188.0580 | 0.5357 | 0.5212 | 0.5212 | 6.9382 |
	| 114.0792 | 0.9920 | 350 | 205.7797 | -82.1125 | -1.0045 | 192.3920 | 192.3920 | 192.3920 | 0.5409 | 0.5212 | 0.5212 | 6.9235 |
	| 72.4145 | 1.1337 | 400 | 212.6613 | -82.8156 | -0.7120 | 204.8122 | 204.8122 | 204.8122 | 0.5409 | 0.5212 | 0.5212 | 7.0485 |
	| 76.9668 | 1.2754 | 450 | 210.2291 | -82.4190 | -0.7807 | 203.0261 | 203.0261 | 203.0261 | 0.5383 | 0.5212 | 0.5212 | 6.9244 |
	| 77.9261 | 1.4171 | 500 | 211.3156 | -81.3728 | -0.7438 | 202.1569 | 202.1569 | 202.1569 | 0.5362 | 0.5212 | 0.5212 | 6.8863 |
	| 70.5755 | 1.5588 | 550 | 212.6468 | -82.3296 | -0.6838 | 200.1410 | 200.1410 | 200.1410 | 0.5430 | 0.5212 | 0.5212 | 6.7241 |
	| 69.6026 | 1.7005 | 600 | 212.0254 | -80.7129 | -0.5569 | 196.9669 | 196.9669 | 196.9669 | 0.5419 | 0.5212 | 0.5212 | 6.6975 |
	| 69.7829 | 1.8422 | 650 | 222.2766 | -79.4968 | -0.7062 | 209.6782 | 209.6782 | 209.6782 | 0.5404 | 0.5212 | 0.5212 | 6.6541 |
	| 62.7864 | 1.9839 | 700 | 226.3468 | -80.2667 | -0.6269 | 213.3031 | 213.3031 | 213.3031 | 0.5399 | 0.5212 | 0.5212 | 6.6215 |
	| 37.3326 | 2.1256 | 750 | 219.7785 | -80.5665 | -0.7007 | 208.8723 | 208.8723 | 208.8723 | 0.5440 | 0.5212 | 0.5212 | 6.7265 |
	| 33.2099 | 2.2674 | 800 | 221.8786 | -81.8901 | -0.5673 | 207.6881 | 207.6881 | 207.6881 | 0.5450 | 0.5212 | 0.5212 | 6.6717 |
	| 33.915 | 2.4091 | 850 | 217.6955 | -81.9134 | -0.5178 | 205.0515 | 205.0515 | 205.0515 | 0.5424 | 0.5212 | 0.5212 | 6.7249 |
	| 35.3572 | 2.5508 | 900 | 224.5402 | -81.5880 | -0.4729 | 214.5052 | 214.5052 | 214.5052 | 0.5435 | 0.5212 | 0.5212 | 6.8278 |
	| 31.032 | 2.6925 | 950 | 225.2907 | -80.3480 | -0.5542 | 216.5803 | 216.5803 | 216.5803 | 0.5419 | 0.5212 | 0.5212 | 6.8429 |
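
To relate the headline evaluation numbers to this log, the sketch below copies three columns from the table (step, validation loss, Ranking Simple) and looks up which step carries the reported loss of 226.3468 and which step has the highest Ranking Simple. The helper names are hypothetical, and treating the highest Ranking Simple as "best" is an assumption for illustration, not a selection rule stated in this card.

```python
# (step, validation_loss, ranking_simple) copied from the results table.
ROWS = [
    (50, 32.1125, 0.5264), (100, 70.9968, 0.5305), (150, 129.8928, 0.5321),
    (200, 164.0152, 0.5357), (250, 195.7212, 0.5285), (300, 192.5231, 0.5357),
    (350, 205.7797, 0.5409), (400, 212.6613, 0.5409), (450, 210.2291, 0.5383),
    (500, 211.3156, 0.5362), (550, 212.6468, 0.5430), (600, 212.0254, 0.5419),
    (650, 222.2766, 0.5404), (700, 226.3468, 0.5399), (750, 219.7785, 0.5440),
    (800, 221.8786, 0.5450), (850, 217.6955, 0.5424), (900, 224.5402, 0.5435),
    (950, 225.2907, 0.5419),
]

# Loss value reported in the summary at the top of the card.
REPORTED_VAL_LOSS = 226.3468


def step_with_loss(rows, loss, tol=1e-6):
    """Return the first step whose validation loss equals `loss`, else None."""
    for step, val_loss, _ in rows:
        if abs(val_loss - loss) < tol:
            return step
    return None


def step_with_best_ranking(rows):
    """Return the step with the highest Ranking Simple value."""
    return max(rows, key=lambda row: row[2])[0]


print(step_with_loss(ROWS, REPORTED_VAL_LOSS))  # 700
print(step_with_best_ranking(ROWS))             # 800
```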


	### Framework versions

	- Transformers 4.42.0
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1