---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-25-1
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_strong_new
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-25-3
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/vdeeps75)
# qwen2.5-0.5b-expo-L2EXPO-25-3
This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-25-1](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-25-1) on the hZzy/train_pairwise_strong_new dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4184
- Objective: 0.4334
- Ranking Simple: 0.4905
- Reward Accuracy: 0.6291
- Logp Accuracy: 0.4905
- Log Diff Policy: 0.6270
- Chosen Logps: -93.7722
- Rejected Logps: -94.3992
- Chosen Rewards: 0.1566
- Rejected Rewards: 0.0880
- Logits: -1.2008
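
The reported values appear internally consistent: the gap between Chosen Logps and Rejected Logps matches Log Diff Policy. The sketch below checks that arithmetic. The interpretation of Log Diff Policy as a chosen-minus-rejected difference is an assumption inferred from the numbers, not a documented definition, and the reward margin is a derived quantity that the card does not report.

```python
# Values copied from the evaluation results listed above.
chosen_logps = -93.7722
rejected_logps = -94.3992
chosen_rewards = 0.1566
rejected_rewards = 0.0880

# Assumption: "Log Diff Policy" is the chosen-minus-rejected log-probability gap.
log_diff_policy = round(chosen_logps - rejected_logps, 4)

# Derived quantity (not reported in the card): chosen-minus-rejected reward gap.
reward_margin = round(chosen_rewards - rejected_rewards, 4)

print(f"log diff policy: {log_diff_policy:.4f}")
print(f"reward margin:   {reward_margin:.4f}")
```

The same subtraction reproduces the Log Diff Policy column for the intermediate checkpoints in the training results table.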
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
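
The total batch sizes follow from the per-device batch size, device count, and gradient accumulation steps. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup. The `lr_at` helper is hypothetical and simplified, so it may differ in detail from the scheduler the Trainer actually used, and `total_steps = 467` is only an estimate extrapolated from the step and epoch columns of the training results (50 steps for roughly 0.3212 epochs).

```python
import math

# Hyperparameters copied from the list above.
per_device_batch = 4
num_devices = 6
grad_accum_steps = 12
peak_lr = 1e-6
warmup_ratio = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = per_device_batch * num_devices * grad_accum_steps
total_eval_batch_size = per_device_batch * num_devices


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 467 optimizer steps over 3 epochs (estimated, not reported).
total_steps = 467

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 47, 250, 467):
    print(step, lr_at(step, total_steps))
```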
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|:-------------:|:------:|:----:|:---------------:|:---------:|:--------------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.3806 | 0.3212 | 50 | 0.4419 | 0.4517 | 0.4864 | 0.6264 | 0.4864 | 0.3852 | -96.1308 | -96.5160 | -0.0793 | -0.1237 | -1.2623 |
| 0.3564 | 0.6424 | 100 | 0.4331 | 0.4486 | 0.4932 | 0.6114 | 0.4932 | 0.5639 | -96.4497 | -97.0136 | -0.1111 | -0.1734 | -1.2708 |
| 0.3184 | 0.9636 | 150 | 0.4229 | 0.4380 | 0.4973 | 0.6236 | 0.4973 | 0.6348 | -93.3135 | -93.9483 | 0.2025 | 0.1331 | -1.2473 |
| 0.2504 | 1.2848 | 200 | 0.4181 | 0.4328 | 0.4918 | 0.6454 | 0.4918 | 0.6747 | -93.7666 | -94.4414 | 0.1572 | 0.0838 | -1.2087 |
| 0.2565 | 1.6060 | 250 | 0.4203 | 0.4386 | 0.4946 | 0.6277 | 0.4946 | 0.6352 | -92.0965 | -92.7317 | 0.3242 | 0.2548 | -1.2579 |
| 0.2468 | 1.9272 | 300 | 0.4177 | 0.4317 | 0.4918 | 0.6250 | 0.4918 | 0.6116 | -93.9391 | -94.5507 | 0.1399 | 0.0729 | -1.2024 |
| 0.1956 | 2.2484 | 350 | 0.4182 | 0.4315 | 0.4918 | 0.6304 | 0.4918 | 0.6462 | -93.9020 | -94.5482 | 0.1436 | 0.0731 | -1.2089 |
| 0.1909 | 2.5696 | 400 | 0.4186 | 0.4326 | 0.4918 | 0.6359 | 0.4918 | 0.6469 | -93.7824 | -94.4293 | 0.1556 | 0.0850 | -1.1996 |
| 0.1873 | 2.8908 | 450 | 0.4185 | 0.4335 | 0.4918 | 0.6264 | 0.4918 | 0.6265 | -93.7758 | -94.4023 | 0.1562 | 0.0877 | -1.2008 |
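
To compare checkpoints, the table can be scanned for the lowest validation loss and the highest reward accuracy. The sketch below does this over values copied from the table above. The choice of these two selection criteria is an assumption for illustration; the card does not state how a checkpoint was chosen.

```python
# (step, validation_loss, reward_accuracy) copied from the table above.
rows = [
    (50, 0.4419, 0.6264),
    (100, 0.4331, 0.6114),
    (150, 0.4229, 0.6236),
    (200, 0.4181, 0.6454),
    (250, 0.4203, 0.6277),
    (300, 0.4177, 0.6250),
    (350, 0.4182, 0.6304),
    (400, 0.4186, 0.6359),
    (450, 0.4185, 0.6264),
]

# Checkpoint with the lowest validation loss.
best_loss_step, best_loss, _ = min(rows, key=lambda row: row[1])

# Checkpoint with the highest reward accuracy.
best_acc_step, _, best_acc = max(rows, key=lambda row: row[2])

print(f"lowest validation loss: {best_loss} at step {best_loss_step}")
print(f"highest reward accuracy: {best_acc} at step {best_acc_step}")
```

Validation loss changes little after step 200, so the two criteria pick different checkpoints within a narrow band.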
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
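
When reproducing the run, it may help to compare the local environment against the versions listed above. The following is a minimal sketch using the standard library; `compare_versions` is a hypothetical helper, and the mapping from the listed framework names to distribution names (for example, Pytorch to `torch`) is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed distribution names.
EXPECTED = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```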