---
license: mit
library_name: "trl"
tags:
- DPO
base_model: Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT
model-index:
- name: Weni/WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
results: []
language: ['pt']
---
# Weni/WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
This model is a fine-tuned version of [Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT](https://huggingface.co/Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT) on the dataset Weni/LLM_Base_2.0.3_DPO with the DPO trainer. It is part of the DPO project for [Weni](https://weni.ai/).
It achieves the following results on the evaluation set:
- eval_loss: 0.6931472420692444
- eval_runtime: 175.1355
- eval_samples_per_second: 2.804
- eval_steps_per_second: 1.405
- eval_rewards/chosen: 0.0
- eval_rewards/rejected: 0.0
- eval_rewards/accuracies: 0.0
- eval_rewards/margins: 0.0
- eval_logps/rejected: -206.18580627441406
- eval_logps/chosen: -64.04271697998047
- eval_logits/rejected: -2.028987169265747
- eval_logits/chosen: -1.6491303443908691
- epoch: 0.0
## Intended uses & limitations
This model has not been trained to avoid specific instructions.
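A minimal inference sketch using the standard transformers API and the Zephyr-style template shown under the training procedure below; the example question and generation settings are illustrative assumptions, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weni/WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Zephyr-style prompt, matching the training template below (example in Portuguese,
# the model's declared language).
prompt = "<|user|>Qual é a capital do Brasil?</s>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```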
## Training procedure
Finetuning was done on the model Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT with the following prompt:
```
Question:
<|user|>{question}</s>
Chosen:
<|assistant|>{correct_ans}</s>
Rejected:
<|assistant|>{rejected_ans}</s>
```
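For illustration, a small helper that renders one preference pair into this template; `format_dpo_example` and its field names are hypothetical, not taken from the training code:

```python
def format_dpo_example(question: str, correct_ans: str, rejected_ans: str) -> dict:
    """Render one preference pair in the Zephyr-style template above."""
    return {
        "prompt": f"<|user|>{question}</s>",
        "chosen": f"<|assistant|>{correct_ans}</s>",
        "rejected": f"<|assistant|>{rejected_ans}</s>",
    }

example = format_dpo_example(
    "Qual é a capital do Brasil?",               # question
    "A capital do Brasil é Brasília.",           # chosen answer
    "A capital do Brasil é o Rio de Janeiro.",   # rejected answer
)
```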
### Training hyperparameters
The following hyperparameters were used during training (a code sketch follows the list):
- learning_rate: 0.0002
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 2
- num_gpus: 1
- total_train_batch_size: 16
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 1
- quantization_type: bitsandbytes
- LoRA: ("\n - bits: 4\n - use_exllama: True\n - device_map: auto\n - use_cache: False\n - lora_r: 8\n - lora_alpha: 16\n - lora_dropout: 0.1\n - bias: none\n - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']\n - task_type: CAUSAL_LM",)
### Training results
### Framework versions
- git+https://github.com/huggingface/transformers@main
- datasets==2.17.1
- peft==0.8.2
- safetensors==0.4.2
- evaluate==0.4.1
- bitsandbytes==0.42
- huggingface_hub==0.20.3
- seqeval==1.2.2
- optimum==1.17.1
- auto-gptq==0.7.0
- gpustat==1.1.1
- deepspeed==0.13.2
- wandb==0.16.3
- git+https://github.com/huggingface/trl.git@main
- git+https://github.com/huggingface/accelerate.git@main
- coloredlogs==15.0.1
- traitlets==5.14.1
- autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl
### Hardware
- Cloud provider: runpod.io