metadata
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- trl
- expo
- generated_from_trainer
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-ES-0.1
  results: []
qwen2.5-0.5b-expo-L2EXPO-ES-0.1
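This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on an unknown dataset. It achieves the following results on the evaluation set (these values match the last logged evaluation in the training results table, at step 1050, epoch 2.9759):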
- Loss: 0.6428
- Logps: -79.5818
- Logits: -0.6068
- Objective: 0.6266
- Dpo Loss: 0.7211
- Regularize: 0.6266
- Ranking Simple: 0.5316
- Ranking Idealized: 0.6030
- Ranking Idealized Expo: 0.5223
- Wo Beta: 14.3406
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
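The batch-size and scheduler entries above are linked by simple arithmetic. As a minimal sketch (not the trainer's actual code), the snippet below recomputes the total batch sizes from the per-device values and evaluates a linear-warmup, half-cosine-decay schedule of the kind that `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` usually denotes. The function names and the `total_steps` argument are assumptions for illustration; the card does not report the planned total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
PEAK_LR = 5e-06
WARMUP_RATIO = 0.1


def effective_train_batch(per_device, devices, accum):
    """Examples consumed per optimizer step across all devices."""
    return per_device * devices * accum


def effective_eval_batch(per_device, devices):
    """Examples evaluated per forward pass across all devices (no accumulation)."""
    return per_device * devices


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero.

    total_steps is hypothetical: the card does not state it.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_bs = effective_train_batch(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)
eval_bs = effective_eval_batch(PER_DEVICE_EVAL_BATCH, NUM_DEVICES)
print(train_bs, eval_bs)  # 144 12
```

The printed values agree with `total_train_batch_size: 144` and `total_eval_batch_size: 12`. Calling `cosine_lr(step, total_steps)` with a chosen `total_steps` gives the learning rate this sketch would assign at a given step.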
Training results
Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.4017 | 0.1417 | 50 | 0.4165 | -93.1726 | -1.5024 | 0.4149 | 0.6868 | 0.4149 | 0.5259 | 0.6030 | 0.5223 | 16.4267 |
0.3777 | 0.2834 | 100 | 0.4360 | -92.8653 | -1.4775 | 0.4269 | 0.6818 | 0.4269 | 0.5316 | 0.6030 | 0.5223 | 16.2439 |
0.4057 | 0.4251 | 150 | 0.4911 | -84.1774 | -1.2946 | 0.4805 | 0.6897 | 0.4805 | 0.5383 | 0.6030 | 0.5223 | 15.6306 |
0.4475 | 0.5668 | 200 | 0.5660 | -89.7342 | -0.9897 | 0.5515 | 0.7103 | 0.5515 | 0.5316 | 0.6030 | 0.5223 | 15.1280 |
0.455 | 0.7085 | 250 | 0.5978 | -78.1917 | -1.0033 | 0.5822 | 0.7171 | 0.5822 | 0.5311 | 0.6030 | 0.5223 | 14.6763 |
0.4337 | 0.8503 | 300 | 0.5993 | -78.8918 | -0.6761 | 0.5779 | 0.7105 | 0.5779 | 0.5300 | 0.6030 | 0.5223 | 14.9196 |
0.4039 | 0.9920 | 350 | 0.5978 | -75.1520 | -0.7968 | 0.5765 | 0.7078 | 0.5765 | 0.5290 | 0.6030 | 0.5223 | 14.6531 |
0.3729 | 1.1337 | 400 | 0.6180 | -75.1433 | -0.5569 | 0.6000 | 0.7153 | 0.6000 | 0.5228 | 0.6030 | 0.5223 | 14.6471 |
0.3454 | 1.2754 | 450 | 0.6316 | -76.2289 | -0.6214 | 0.6131 | 0.7165 | 0.6131 | 0.5336 | 0.6030 | 0.5223 | 14.5034 |
0.3226 | 1.4171 | 500 | 0.6255 | -77.6040 | -0.5608 | 0.6084 | 0.7204 | 0.6084 | 0.5285 | 0.6030 | 0.5223 | 14.4998 |
0.3133 | 1.5588 | 550 | 0.6282 | -78.6291 | -0.6736 | 0.6138 | 0.7139 | 0.6138 | 0.5336 | 0.6030 | 0.5223 | 14.4069 |
0.2944 | 1.7005 | 600 | 0.6321 | -78.9179 | -0.5620 | 0.6139 | 0.7175 | 0.6139 | 0.5357 | 0.6030 | 0.5223 | 14.6142 |
0.2915 | 1.8422 | 650 | 0.6321 | -77.4437 | -0.7021 | 0.6157 | 0.7138 | 0.6157 | 0.5367 | 0.6030 | 0.5223 | 14.3858 |
0.2675 | 1.9839 | 700 | 0.6386 | -79.3600 | -0.5612 | 0.6233 | 0.7185 | 0.6233 | 0.5290 | 0.6030 | 0.5223 | 14.3171 |
0.2415 | 2.1256 | 750 | 0.6405 | -80.0990 | -0.6174 | 0.6263 | 0.7177 | 0.6263 | 0.5347 | 0.6030 | 0.5223 | 14.4302 |
0.2263 | 2.2674 | 800 | 0.6458 | -79.3784 | -0.5665 | 0.6297 | 0.7206 | 0.6297 | 0.5347 | 0.6030 | 0.5223 | 14.3163 |
0.2148 | 2.4091 | 850 | 0.6436 | -79.0806 | -0.5793 | 0.6276 | 0.7192 | 0.6276 | 0.5362 | 0.6030 | 0.5223 | 14.4263 |
0.1993 | 2.5508 | 900 | 0.6454 | -80.3815 | -0.5621 | 0.6302 | 0.7217 | 0.6302 | 0.5342 | 0.6030 | 0.5223 | 14.4491 |
0.1887 | 2.6925 | 950 | 0.6443 | -79.1446 | -0.6216 | 0.6274 | 0.7204 | 0.6274 | 0.5336 | 0.6030 | 0.5223 | 14.3186 |
0.1764 | 2.8342 | 1000 | 0.6399 | -79.7721 | -0.6087 | 0.6246 | 0.7200 | 0.6246 | 0.5336 | 0.6030 | 0.5223 | 14.4502 |
0.163 | 2.9759 | 1050 | 0.6428 | -79.5818 | -0.6068 | 0.6266 | 0.7211 | 0.6266 | 0.5316 | 0.6030 | 0.5223 | 14.3406 |
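One way to read the table above is to compare training and validation loss at each evaluation step. The sketch below uses a hand-copied subset of rows (steps 50, 350, 700, and 1050) and reports the step with the lowest validation loss together with the gap between the two losses. The variable names are illustrative, and the card does not define how either loss is computed, so the comparison is only indicative.

```python
# (step, training loss, validation loss) copied from the results table.
rows = [
    (50, 0.4017, 0.4165),
    (350, 0.4039, 0.5978),
    (700, 0.2675, 0.6386),
    (1050, 0.163, 0.6428),
]

# Row with the smallest validation loss in this subset.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Validation minus training loss at each listed step.
gaps = [(step, round(val - train, 4)) for step, train, val in rows]

print(f"lowest validation loss: {best_val} at step {best_step}")
for step, gap in gaps:
    print(f"step {step}: validation - training = {gap}")
```

In this subset the validation loss is lowest at step 50 and the gap grows with further training. Note also that the log ends at epoch 2.9759 even though `num_epochs` is 5; the card does not say why.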
Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1