---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
  - en
---

# Suri-I-ORPO

Suri-I-ORPO is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using instructional odds ratio preference optimization (I-ORPO). Please check our paper for more details on the method.
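For orientation, the standard ORPO objective (Hong et al., 2024), on which I-ORPO builds, augments the supervised fine-tuning loss with an odds-ratio penalty over a preferred completion $y_w$ and a dispreferred completion $y_l$; the weight $\lambda$ corresponds to the `beta` hyperparameter listed under Training Procedure. This is the generic ORPO formulation, not the paper's exact I-ORPO variant, which contrasts instructions rather than completions; see the paper for the precise objective.

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\right],
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right)
$$

where $\operatorname{odds}_\theta(y \mid x) = \dfrac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$.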

## πŸ“’ Model Details

### Model Description

### Model Sources

## ⚠️ Getting Started

Use the code in this repository for training and inference.
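To try the adapter directly without the training repository, the snippet below is a minimal inference sketch. It assumes the adapter lives at `chtmp223/suri-i-orpo` (this repo), that `peft` and `transformers` are installed, and that a GPU can hold the 7B base model in bfloat16; the prompt is an illustrative placeholder.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load Mistral-7B-Instruct-v0.2 with the Suri-I-ORPO LoRA adapter applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    "chtmp223/suri-i-orpo",  # this adapter repo
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct expects its chat template; a single user turn carries the instruction.
messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```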

## πŸ’» Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

### Training Procedure

| Configuration | Value |
|---|---|
| Hardware (training and inference) | 4Γ—A100 |
| Tracking | wandb |
| lora_r | 16 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| beta | 0.4 |
| gradient_accumulation_steps | 1 |
| gradient_checkpointing | True |
| learning_rate | 5.0e-5 |
| lr_scheduler_type | cosine |
| max_length | 15024 |
| max_completion_length | 15000 |
| max_prompt_length | 5000 |
| num_train_epochs | 2 |
| optim | adamw_torch |
| per_device_train_batch_size | 1 |
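As a rough guide to how these hyperparameters fit together, here is a minimal training sketch using TRL's stock `ORPOTrainer` with the values from the table. This is not the paper's I-ORPO implementation (use the linked repository for that); it assumes the `chtmp223/suri` dataset exposes the prompt/chosen/rejected columns TRL expects, and exact argument names may vary across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA settings from the table above.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Optimization and length settings from the table above.
args = ORPOConfig(
    output_dir="suri-i-orpo",
    beta=0.4,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    report_to="wandb",
)

train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # `processing_class` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```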

## πŸ€— Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).

## πŸ“œ Citation

TODO

βš™οΈ Framework versions

  • PEFT 0.11.1