	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise_weighted
	model-index:
	- name: qwen2.5-0.5b-expo-L1EXPO-noES-0.1
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/uh50oig6)
	# qwen2.5-0.5b-expo-L1EXPO-noES-0.1

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
	At the final evaluation (step 1050, epoch 2.9759), it achieves the following results on the evaluation set:
	- Loss: 0.1381
	- Logps: -85.9802
	- Logits: -1.2306
	- Objective: 0.1370
	- Dpo Loss: 0.6974
	- Regularize: 0.1370
	- Ranking Simple: 0.5243
	- Ranking Idealized: 0.6025
	- Ranking Idealized Expo: 0.5233
	- Wo Beta: 15.6347
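
For context on the `Dpo Loss` value above, here is a minimal sketch comparing it with the loss a pairwise objective gives when the chosen and rejected responses are scored identically. It assumes `Dpo Loss` is the standard DPO loss, `-log(sigmoid(beta * margin))`, which this card does not confirm. The `dpo_loss` helper and the `beta` default are illustrative and not taken from the training code.

```python
import math


def dpo_loss(margin: float, beta: float = 1.0) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration; not the training code behind this card.
    """
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# With a zero reward margin the loss equals ln 2, regardless of beta.
chance_level = dpo_loss(0.0)

# Final "Dpo Loss" reported in this card.
reported = 0.6974

gap = reported - chance_level
print(f"ln 2 = {chance_level:.4f}, reported = {reported:.4f}, gap = {gap:+.4f}")
```

Under that assumption, a reported value slightly above ln 2 would correspond to an average reward margin close to zero on the evaluation set.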

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 12
	- total_train_batch_size: 144
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
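
The total batch sizes listed above follow from the per-device values. The sketch below reproduces that arithmetic and adds a rough estimate of optimizer steps per epoch, derived from the first logged row of the training results (step 50 at epoch 0.1417). The per-epoch example count is an estimate only, not a figure stated in this card.

```python
# Values copied from the hyperparameter list.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 3
gradient_accumulation_steps = 12

# Effective batch per optimizer step: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

# Rough estimate: step 50 is logged at epoch 0.1417 in the training results.
steps_per_epoch = round(50 / 0.1417)
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size)
print(steps_per_epoch, approx_examples_per_epoch)
```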

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
	| 0.0351 | 0.1417 | 50 | 0.0221 | -91.2329 | -1.3917 | 0.0224 | 0.6927 | 0.0224 | 0.5212 | 0.6025 | 0.5233 | 16.2217 |
	| 0.0877 | 0.2834 | 100 | 0.0433 | -88.6602 | -1.3863 | 0.0447 | 0.6922 | 0.0447 | 0.5238 | 0.6025 | 0.5233 | 16.1682 |
	| 0.1323 | 0.4251 | 150 | 0.0768 | -90.4377 | -1.3054 | 0.0764 | 0.6956 | 0.0764 | 0.5223 | 0.6025 | 0.5233 | 16.0034 |
	| 0.1427 | 0.5668 | 200 | 0.1032 | -88.3433 | -1.3124 | 0.1017 | 0.6959 | 0.1017 | 0.5223 | 0.6025 | 0.5233 | 15.9928 |
	| 0.1451 | 0.7085 | 250 | 0.1178 | -88.0698 | -1.2854 | 0.1185 | 0.6950 | 0.1185 | 0.5274 | 0.6025 | 0.5233 | 15.7878 |
	| 0.1305 | 0.8503 | 300 | 0.1247 | -86.3312 | -1.2863 | 0.1252 | 0.6961 | 0.1252 | 0.5280 | 0.6025 | 0.5233 | 15.7668 |
	| 0.1407 | 0.9920 | 350 | 0.1314 | -86.4501 | -1.2757 | 0.1310 | 0.6976 | 0.1310 | 0.5223 | 0.6025 | 0.5233 | 15.6570 |
	| 0.1245 | 1.1337 | 400 | 0.1399 | -86.2849 | -1.2418 | 0.1390 | 0.6980 | 0.1390 | 0.5259 | 0.6025 | 0.5233 | 15.6147 |
	| 0.1163 | 1.2754 | 450 | 0.1421 | -85.4828 | -1.2307 | 0.1421 | 0.6985 | 0.1421 | 0.5274 | 0.6025 | 0.5233 | 15.6128 |
	| 0.1071 | 1.4171 | 500 | 0.1382 | -87.2673 | -1.2270 | 0.1376 | 0.6980 | 0.1376 | 0.5285 | 0.6025 | 0.5233 | 15.6445 |
	| 0.1045 | 1.5588 | 550 | 0.1428 | -87.0776 | -1.2327 | 0.1426 | 0.6977 | 0.1426 | 0.5254 | 0.6025 | 0.5233 | 15.5807 |
	| 0.0866 | 1.7005 | 600 | 0.1424 | -85.1926 | -1.2196 | 0.1408 | 0.6965 | 0.1408 | 0.5269 | 0.6025 | 0.5233 | 15.6603 |
	| 0.0847 | 1.8422 | 650 | 0.1380 | -86.1129 | -1.2229 | 0.1356 | 0.6974 | 0.1356 | 0.5243 | 0.6025 | 0.5233 | 15.6660 |
	| 0.071 | 1.9839 | 700 | 0.1420 | -85.2496 | -1.2208 | 0.1405 | 0.6980 | 0.1405 | 0.5254 | 0.6025 | 0.5233 | 15.6109 |
	| 0.0546 | 2.1256 | 750 | 0.1423 | -85.4691 | -1.2233 | 0.1407 | 0.6980 | 0.1407 | 0.5259 | 0.6025 | 0.5233 | 15.6480 |
	| 0.0531 | 2.2674 | 800 | 0.1386 | -86.1368 | -1.2206 | 0.1371 | 0.6981 | 0.1371 | 0.5243 | 0.6025 | 0.5233 | 15.6234 |
	| 0.0444 | 2.4091 | 850 | 0.1395 | -86.0362 | -1.2271 | 0.1382 | 0.6980 | 0.1382 | 0.5238 | 0.6025 | 0.5233 | 15.6472 |
	| 0.0438 | 2.5508 | 900 | 0.1387 | -85.8840 | -1.2296 | 0.1374 | 0.6975 | 0.1374 | 0.5238 | 0.6025 | 0.5233 | 15.6345 |
	| 0.0384 | 2.6925 | 950 | 0.1380 | -85.9590 | -1.2285 | 0.1368 | 0.6975 | 0.1368 | 0.5238 | 0.6025 | 0.5233 | 15.6425 |
	| 0.0375 | 2.8342 | 1000 | 0.1380 | -85.9976 | -1.2305 | 0.1369 | 0.6974 | 0.1369 | 0.5243 | 0.6025 | 0.5233 | 15.6355 |
	| 0.0397 | 2.9759 | 1050 | 0.1381 | -85.9802 | -1.2306 | 0.1370 | 0.6974 | 0.1370 | 0.5243 | 0.6025 | 0.5233 | 15.6347 |
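
The table can be scanned for the checkpoint with the highest `Ranking Simple` value. The sketch below hard-codes the (step, value) pairs from the table and assumes higher is better for this metric, which the card does not state. The variable names are illustrative.

```python
# (step, Ranking Simple) pairs copied from the training results table.
ranking_simple = [
    (50, 0.5212), (100, 0.5238), (150, 0.5223), (200, 0.5223),
    (250, 0.5274), (300, 0.5280), (350, 0.5223), (400, 0.5259),
    (450, 0.5274), (500, 0.5285), (550, 0.5254), (600, 0.5269),
    (650, 0.5243), (700, 0.5254), (750, 0.5259), (800, 0.5243),
    (850, 0.5238), (900, 0.5238), (950, 0.5238), (1000, 0.5243),
    (1050, 0.5243),
]

# Assumption: higher Ranking Simple is better.
best_step, best_value = max(ranking_simple, key=lambda row: row[1])
final_step, final_value = ranking_simple[-1]

print(f"best:  step {best_step}, Ranking Simple {best_value:.4f}")
print(f"final: step {final_step}, Ranking Simple {final_value:.4f}")
```

On these numbers the peak falls at step 500 rather than at the final step, although the spread across all evaluations is under 0.01.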


	### Framework versions

	- Transformers 4.42.0
	- PyTorch 2.3.0+cu121
	- Datasets 3.2.0
	- Tokenizers 0.19.1