	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise_weighted
	model-index:
	- name: qwen2.5-0.5b-expo-DPO-noES5-1
	  results: []
	---


	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/ka5w2jn7)
	# qwen2.5-0.5b-expo-DPO-noES5-1

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
	At the final checkpoint (step 700, epoch 1.98), it achieves the following results on the evaluation set:
	- Loss: 1.9399
	- Logps: -81.1684
	- Logits: -0.8509
	- Objective: 1.8787
	- Dpo Loss: 1.8787
	- Ranking Simple: 0.5347
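A minimal loading sketch, assuming the checkpoint is published on the Hub under the repo id `hZzy/qwen2.5-0.5b-expo-DPO-noES5-1` and works with the standard causal-LM classes in Transformers. The helper name `generate` and its defaults are illustrative only, and the prompt format expected by the SFT base model is not documented in this card.

```python
# Assumption: this is the Hub repo id for the checkpoint described in this card.
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-DPO-noES5-1"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: return the model's continuation of `prompt`."""
    # Imported lazily so this file can be read or imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use, so it needs network access and the `transformers` and `torch` packages.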

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 12
	- total_train_batch_size: 144
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 2
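The batch-size totals in the list follow from the per-device values, and the learning rate follows a linear warmup into a cosine decay. The sketch below reproduces that arithmetic in plain Python. `ESTIMATED_TOTAL_STEPS` is an assumption derived from the training results table (700 steps at epoch 1.9839, scaled to 2 epochs), not a logged value, and `lr_at` is a hypothetical helper that mirrors the usual warmup-plus-cosine shape rather than the exact scheduler code used in the run.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 3
gradient_accumulation_steps = 12
learning_rate = 5e-6
warmup_ratio = 0.1
num_epochs = 2

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Assumption: estimated from the results table, not reported by the Trainer.
ESTIMATED_TOTAL_STEPS = round(700 / 1.9839 * num_epochs)


def lr_at(step: int, total_steps: int = ESTIMATED_TOTAL_STEPS) -> float:
    """Hypothetical helper: linear warmup followed by half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("train batch:", total_train_batch_size)
    print("eval batch:", total_eval_batch_size)
    for step in (0, 50, 350, 700):
        print(step, lr_at(step))
```

Under these assumptions the peak learning rate of 5e-06 is reached after roughly the first tenth of the optimizer steps and decays toward zero by the end of the second epoch.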

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Ranking Simple |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:--------------:|
	| 1.1219        | 0.1417 | 50   | 1.1407          | -91.1504 | -1.3894 | 1.1280    | 1.1280   | 0.5274         |
	| 1.3821        | 0.2834 | 100  | 1.5304          | -81.2234 | -1.3774 | 1.4947    | 1.4947   | 0.5290         |
	| 1.4062        | 0.4251 | 150  | 1.8818          | -79.7787 | -1.1641 | 1.8192    | 1.8192   | 0.5430         |
	| 1.2275        | 0.5668 | 200  | 2.0358          | -77.9854 | -1.1289 | 1.9717    | 1.9717   | 0.5347         |
	| 1.1914        | 0.7085 | 250  | 2.0084          | -78.3385 | -1.0883 | 1.9461    | 1.9461   | 0.5347         |
	| 1.0378        | 0.8503 | 300  | 2.0918          | -83.4707 | -0.9324 | 2.0357    | 2.0357   | 0.5352         |
	| 0.8334        | 0.9920 | 350  | 2.1143          | -81.1740 | -0.8755 | 1.9975    | 1.9975   | 0.5388         |
	| 0.4251        | 1.1337 | 400  | 2.0641          | -81.1689 | -0.8003 | 2.0241    | 2.0241   | 0.5435         |
	| 0.3886        | 1.2754 | 450  | 2.0085          | -79.8813 | -0.8999 | 1.9598    | 1.9598   | 0.5388         |
	| 0.4352        | 1.4171 | 500  | 2.0449          | -80.7357 | -0.8634 | 1.9819    | 1.9819   | 0.5367         |
	| 0.3103        | 1.5588 | 550  | 1.9784          | -80.8827 | -0.8672 | 1.9073    | 1.9073   | 0.5373         |
	| 0.2489        | 1.7005 | 600  | 1.9488          | -81.0833 | -0.8421 | 1.8851    | 1.8851   | 0.5367         |
	| 0.3631        | 1.8422 | 650  | 1.9417          | -81.1721 | -0.8529 | 1.8805    | 1.8805   | 0.5347         |
	| 0.3009        | 1.9839 | 700  | 1.9399          | -81.1684 | -0.8509 | 1.8787    | 1.8787   | 0.5347         |
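To compare checkpoints, the table can be scanned programmatically. The sketch below copies three columns from the table (step, validation loss, ranking simple) and picks out the evaluation steps with the lowest validation loss and the highest ranking score. The variable names are illustrative, and the code only reads off what the table reports; it makes no claim about which checkpoint is preferable.

```python
# (step, validation_loss, ranking_simple), copied from the table above.
EVAL_ROWS = [
    (50, 1.1407, 0.5274),
    (100, 1.5304, 0.5290),
    (150, 1.8818, 0.5430),
    (200, 2.0358, 0.5347),
    (250, 2.0084, 0.5347),
    (300, 2.0918, 0.5352),
    (350, 2.1143, 0.5388),
    (400, 2.0641, 0.5435),
    (450, 2.0085, 0.5388),
    (500, 2.0449, 0.5367),
    (550, 1.9784, 0.5373),
    (600, 1.9488, 0.5367),
    (650, 1.9417, 0.5347),
    (700, 1.9399, 0.5347),
]

# Evaluation step with the lowest validation loss.
lowest_loss_row = min(EVAL_ROWS, key=lambda row: row[1])

# Evaluation step with the highest ranking score.
highest_ranking_row = max(EVAL_ROWS, key=lambda row: row[2])

# The last row is the checkpoint summarized at the top of this card.
final_row = EVAL_ROWS[-1]

if __name__ == "__main__":
    print("lowest validation loss:", lowest_loss_row)
    print("highest ranking simple:", highest_ranking_row)
    print("final checkpoint:", final_row)
```

In this table the lowest validation loss occurs at the first evaluation (step 50) and the highest ranking score at step 400, while the reported final metrics come from step 700.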


	### Framework versions

	- Transformers 4.42.0
	- PyTorch 2.3.0+cu121
	- Datasets 3.2.0
	- Tokenizers 0.19.1