	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise
	model-index:
	- name: qwen2.5-0.5b-expo-DPO-ES-1
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/qhqu3qtk)
	# qwen2.5-0.5b-expo-DPO-ES-1

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset.
	It achieves the following results on the evaluation set (the values match the step 700 row, epoch 1.9839, of the training results table):
	- Loss: 2.3243
	- Logps: -83.2882
	- Logits: -0.6651
	- Objective: 2.2471
	- Dpo Loss: 2.2471
	- Regularize: 2.2471
	- Ranking Simple: 0.5378
	- Ranking Idealized: 0.5295
	- Ranking Idealized Expo: 0.5212
	- Wo Beta: 6.6815
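
For readers who want to try the checkpoint, a minimal loading sketch with the `transformers` library might look like the following. The repository id `hZzy/qwen2.5-0.5b-expo-DPO-ES-1` is an assumption inferred from the base model's namespace and this card's title, and the helper name `generate` and its decoding settings are illustrative only, not something this card specifies.

```python
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-DPO-ES-1"  # assumed repo id, not confirmed by this card


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` with the fine-tuned checkpoint."""
    # Imported lazily so that defining this helper does not itself require
    # transformers/torch to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(...)` with a prompt string downloads the weights on first use. Whether the checkpoint expects a chat template is not stated in this card, so the sketch passes the prompt as plain text.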

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 12
	- total_train_batch_size: 144
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
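
The derived values in this list can be checked with a short sketch: the total train batch size is the product of the per-device batch size, the device count, and the gradient accumulation steps, and the learning rate follows a linear warmup and then a cosine decay. The function `cosine_lr` and the run length `TOTAL_STEPS` are hypothetical names and values chosen for illustration. The schedule shape is assumed to be the common warmup-then-cosine-to-zero form, which may differ in detail from what the trainer actually applied.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4             # per-device batch size
num_devices = 3
gradient_accumulation_steps = 12
learning_rate = 5e-6
warmup_ratio = 0.1

# Samples contributing to a single optimizer step: 4 * 3 * 12 = 144.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, chosen only to illustrate the schedule shape.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate climbs to its 5e-06 peak over the first 10% of steps and then decays smoothly toward zero.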

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
	| 0.7017 | 0.1417 | 50 | 0.8470 | -93.0243 | -1.4582 | 0.8570 | 0.8570 | 0.8570 | 0.5238 | 0.5295 | 0.5212 | 7.8507 |
	| 0.8112 | 0.2834 | 100 | 1.0529 | -86.6835 | -1.4382 | 1.0273 | 1.0273 | 1.0273 | 0.5285 | 0.5295 | 0.5212 | 7.4982 |
	| 1.0895 | 0.4251 | 150 | 1.4497 | -84.4337 | -1.2965 | 1.4010 | 1.4010 | 1.4010 | 0.5321 | 0.5295 | 0.5212 | 7.2692 |
	| 1.2363 | 0.5668 | 200 | 1.7035 | -77.7201 | -1.2956 | 1.6116 | 1.6116 | 1.6116 | 0.5321 | 0.5295 | 0.5212 | 7.2264 |
	| 1.3152 | 0.7085 | 250 | 1.9222 | -92.7241 | -1.2565 | 1.8319 | 1.8319 | 1.8319 | 0.5311 | 0.5295 | 0.5212 | 7.1856 |
	| 1.1899 | 0.8503 | 300 | 2.0298 | -90.9373 | -0.9785 | 1.9588 | 1.9588 | 1.9588 | 0.5367 | 0.5295 | 0.5212 | 6.9336 |
	| 1.1443 | 0.9920 | 350 | 2.1654 | -82.1414 | -1.0214 | 2.0541 | 2.0541 | 2.0541 | 0.5435 | 0.5295 | 0.5212 | 7.0024 |
	| 0.725 | 1.1337 | 400 | 2.2884 | -84.2526 | -0.7535 | 2.2360 | 2.2360 | 2.2360 | 0.5336 | 0.5295 | 0.5212 | 7.1525 |
	| 0.7629 | 1.2754 | 450 | 2.1606 | -80.4165 | -0.8866 | 2.0671 | 2.0671 | 2.0671 | 0.5321 | 0.5295 | 0.5212 | 6.7949 |
	| 0.8044 | 1.4171 | 500 | 2.2094 | -82.3927 | -0.7503 | 2.0981 | 2.0981 | 2.0981 | 0.5347 | 0.5295 | 0.5212 | 6.8050 |
	| 0.7105 | 1.5588 | 550 | 2.1697 | -84.9780 | -0.6734 | 2.0733 | 2.0733 | 2.0733 | 0.5321 | 0.5295 | 0.5212 | 6.8722 |
	| 0.6925 | 1.7005 | 600 | 2.1957 | -81.5342 | -0.7411 | 2.0558 | 2.0558 | 2.0558 | 0.5357 | 0.5295 | 0.5212 | 6.7186 |
	| 0.6883 | 1.8422 | 650 | 2.2080 | -82.7303 | -0.6908 | 2.1330 | 2.1330 | 2.1330 | 0.5383 | 0.5295 | 0.5212 | 6.8081 |
	| 0.6486 | 1.9839 | 700 | 2.3243 | -83.2882 | -0.6651 | 2.2471 | 2.2471 | 2.2471 | 0.5378 | 0.5295 | 0.5212 | 6.6815 |
	| 0.3793 | 2.1256 | 750 | 2.2675 | -84.2296 | -0.7879 | 2.1825 | 2.1825 | 2.1825 | 0.5409 | 0.5295 | 0.5212 | 6.8794 |
	| 0.3314 | 2.2674 | 800 | 2.2106 | -84.3675 | -0.6651 | 2.1041 | 2.1041 | 2.1041 | 0.5414 | 0.5295 | 0.5212 | 6.7463 |
	| 0.3301 | 2.4091 | 850 | 2.2964 | -84.8913 | -0.6177 | 2.2221 | 2.2221 | 2.2221 | 0.5388 | 0.5295 | 0.5212 | 6.8020 |
	| 0.3509 | 2.5508 | 900 | 2.2796 | -84.3833 | -0.6097 | 2.2099 | 2.2099 | 2.2099 | 0.5393 | 0.5295 | 0.5212 | 6.7934 |
	| 0.321 | 2.6925 | 950 | 2.3403 | -83.2967 | -0.7158 | 2.2649 | 2.2649 | 2.2649 | 0.5331 | 0.5295 | 0.5212 | 6.8864 |
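
A few quantities can be read off the table with a short sketch. It copies four columns (step, epoch, validation loss, Ranking Simple) from the rows above, estimates the optimizer steps per epoch, and locates the rows with the lowest validation loss and the highest Ranking Simple. The variable names are illustrative, and the steps-per-epoch figure is only an estimate derived from the logged step and epoch values.

```python
# (step, epoch, validation_loss, ranking_simple), copied from the table above.
rows = [
    (50, 0.1417, 0.8470, 0.5238),
    (100, 0.2834, 1.0529, 0.5285),
    (150, 0.4251, 1.4497, 0.5321),
    (200, 0.5668, 1.7035, 0.5321),
    (250, 0.7085, 1.9222, 0.5311),
    (300, 0.8503, 2.0298, 0.5367),
    (350, 0.9920, 2.1654, 0.5435),
    (400, 1.1337, 2.2884, 0.5336),
    (450, 1.2754, 2.1606, 0.5321),
    (500, 1.4171, 2.2094, 0.5347),
    (550, 1.5588, 2.1697, 0.5321),
    (600, 1.7005, 2.1957, 0.5357),
    (650, 1.8422, 2.2080, 0.5383),
    (700, 1.9839, 2.3243, 0.5378),
    (750, 2.1256, 2.2675, 0.5409),
    (800, 2.2674, 2.2106, 0.5414),
    (850, 2.4091, 2.2964, 0.5388),
    (900, 2.5508, 2.2796, 0.5393),
    (950, 2.6925, 2.3403, 0.5331),
]

# Estimate optimizer steps per epoch from the last logged row.
last_step, last_epoch, _, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Rows with the lowest validation loss and the highest Ranking Simple.
lowest_val_loss = min(rows, key=lambda r: r[2])
best_ranking = max(rows, key=lambda r: r[3])

print(f"steps per epoch (estimate): {steps_per_epoch:.1f}")
print(f"lowest validation loss: {lowest_val_loss[2]} at step {lowest_val_loss[0]}")
print(f"highest Ranking Simple: {best_ranking[3]} at step {best_ranking[0]}")
```

On these rows the validation loss is lowest at the first logged step and generally rises afterwards while the training loss falls, whereas Ranking Simple peaks near the end of the first epoch.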


	### Framework versions

	- Transformers 4.42.0
	- PyTorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1