---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-25-1
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_strong_new
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-25-3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/vdeeps75)
# qwen2.5-0.5b-expo-L2EXPO-25-3

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-25-1](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-25-1) on the hZzy/train_pairwise_strong_new dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4184
- Objective: 0.4334
- Ranking Simple: 0.4905
- Reward Accuracy: 0.6291
- Logp Accuracy: 0.4905
- Log Diff Policy: 0.6270
- Chosen Logps: -93.7722
- Rejected Logps: -94.3992
- Chosen Rewards: 0.1566
- Rejected Rewards: 0.0880
- Logits: -1.2008
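
The reported numbers suggest that Log Diff Policy is the gap between the mean chosen and rejected log-probabilities. This relationship is inferred from the values above and is not stated by the trainer; `reward_margin` is a derived quantity introduced here for illustration only. A minimal sketch:

```python
# Final evaluation metrics copied from the list above.
chosen_logps = -93.7722
rejected_logps = -94.3992
chosen_rewards = 0.1566
rejected_rewards = 0.0880

# Assumption: Log Diff Policy = mean chosen logp - mean rejected logp.
log_diff = chosen_logps - rejected_logps

# Hypothetical helper metric (not reported in this card): mean reward gap.
reward_margin = chosen_rewards - rejected_rewards

print(f"{log_diff:.4f}")       # 0.6270, matches the reported Log Diff Policy
print(f"{reward_margin:.4f}")  # 0.0686
```

Both gaps are small and positive, so on average the model only slightly favors the chosen responses.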

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
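
The total train batch size follows from the per-device batch size, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup. `cosine_lr` is a hypothetical helper written for this card, not the trainer's own code, and `total_steps = 467` is an assumption estimated from the results table (450 steps at epoch 2.8908).

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4              # per device
num_devices = 6
gradient_accumulation_steps = 12
learning_rate = 1e-6
warmup_ratio = 0.1

# Samples consumed per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 288

# Assumption: roughly 450 / 2.8908 * 3 optimizer steps over 3 epochs.
total_steps = 467


def cosine_lr(step, total_steps, base_lr=learning_rate, warmup=warmup_ratio):
    """Linear warmup, then cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, at the end of warmup, and at the last step.
for step in (0, 47, total_steps):
    print(step, cosine_lr(step, total_steps))
```

Under these assumptions the learning rate rises from 0 to 1e-06 over the first 47 steps and decays to 0 by the final step.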

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:--------------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.3806        | 0.3212 | 50   | 0.4419          | 0.4517    | 0.4864         | 0.6264          | 0.4864        | 0.3852          | -96.1308     | -96.5160       | -0.0793        | -0.1237          | -1.2623 |
| 0.3564        | 0.6424 | 100  | 0.4331          | 0.4486    | 0.4932         | 0.6114          | 0.4932        | 0.5639          | -96.4497     | -97.0136       | -0.1111        | -0.1734          | -1.2708 |
| 0.3184        | 0.9636 | 150  | 0.4229          | 0.4380    | 0.4973         | 0.6236          | 0.4973        | 0.6348          | -93.3135     | -93.9483       | 0.2025         | 0.1331           | -1.2473 |
| 0.2504        | 1.2848 | 200  | 0.4181          | 0.4328    | 0.4918         | 0.6454          | 0.4918        | 0.6747          | -93.7666     | -94.4414       | 0.1572         | 0.0838           | -1.2087 |
| 0.2565        | 1.6060 | 250  | 0.4203          | 0.4386    | 0.4946         | 0.6277          | 0.4946        | 0.6352          | -92.0965     | -92.7317       | 0.3242         | 0.2548           | -1.2579 |
| 0.2468        | 1.9272 | 300  | 0.4177          | 0.4317    | 0.4918         | 0.6250          | 0.4918        | 0.6116          | -93.9391     | -94.5507       | 0.1399         | 0.0729           | -1.2024 |
| 0.1956        | 2.2484 | 350  | 0.4182          | 0.4315    | 0.4918         | 0.6304          | 0.4918        | 0.6462          | -93.9020     | -94.5482       | 0.1436         | 0.0731           | -1.2089 |
| 0.1909        | 2.5696 | 400  | 0.4186          | 0.4326    | 0.4918         | 0.6359          | 0.4918        | 0.6469          | -93.7824     | -94.4293       | 0.1556         | 0.0850           | -1.1996 |
| 0.1873        | 2.8908 | 450  | 0.4185          | 0.4335    | 0.4918         | 0.6264          | 0.4918        | 0.6265          | -93.7758     | -94.4023       | 0.1562         | 0.0877           | -1.2008 |
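
As a reading aid, the short sketch below picks out the checkpoints with the lowest validation loss and the highest reward accuracy. The values are copied from the table above; the selection criteria are illustrative and are not necessarily how the released checkpoint was chosen.

```python
# (step, validation_loss, reward_accuracy) copied from the table above.
rows = [
    (50, 0.4419, 0.6264),
    (100, 0.4331, 0.6114),
    (150, 0.4229, 0.6236),
    (200, 0.4181, 0.6454),
    (250, 0.4203, 0.6277),
    (300, 0.4177, 0.6250),
    (350, 0.4182, 0.6304),
    (400, 0.4186, 0.6359),
    (450, 0.4185, 0.6264),
]

best_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_acc = max(rows, key=lambda r: r[2])   # highest reward accuracy

print(best_loss[0], best_loss[1])  # 300 0.4177
print(best_acc[0], best_acc[2])    # 200 0.6454
```

Validation loss changes little after step 200, while training loss keeps falling through the third epoch.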


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1