---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
  - en
---

# Suri-I-ORPO

Suri-I-ORPO is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using instructional odds ratio preference optimization (I-ORPO). Please check our paper for more details on the method.
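For orientation, the standard ORPO objective (Hong et al., 2024), on which I-ORPO builds, augments the supervised fine-tuning loss with an odds-ratio penalty over a preferred completion $y_w$ and a dispreferred completion $y_l$; the weight $\lambda$ corresponds to the `beta` hyperparameter listed under Training Procedure. This is the generic ORPO formulation, not the paper's exact I-ORPO variant, which contrasts instructions rather than completions; see the paper for the precise objective.

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\right],
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right)
$$

where $\operatorname{odds}_\theta(y \mid x) = \dfrac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$.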

## πŸ“’ Model Details

### Model Description

### Model Sources

## ⚠️ Getting Started

Use the code in this repository for training and inference.
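To try the adapter directly without the training repository, the snippet below is a minimal inference sketch. It assumes the adapter lives at `chtmp223/suri-i-orpo` (this repo), that `peft` and `transformers` are installed, and that a GPU can hold the 7B base model in bfloat16; the prompt is an illustrative placeholder.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load Mistral-7B-Instruct-v0.2 with the Suri-I-ORPO LoRA adapter applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    "chtmp223/suri-i-orpo",  # this adapter repo
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct expects its chat template; a single user turn carries the instruction.
messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```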

## πŸ’» Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

### Training Procedure

| Configuration | Value |
|---|---|
| Hardware (training and inference) | 4Γ—A100 |
| Tracking | wandb |
| lora_r | 16 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| beta | 0.4 |
| gradient_accumulation_steps | 1 |
| gradient_checkpointing | True |
| learning_rate | 5.0e-5 |
| lr_scheduler_type | cosine |
| max_length | 15024 |
| max_completion_length | 15000 |
| max_prompt_length | 5000 |
| num_train_epochs | 2 |
| optim | adamw_torch |
| per_device_train_batch_size | 1 |
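As a rough guide to how these hyperparameters fit together, here is a minimal training sketch using TRL's stock `ORPOTrainer` with the values from the table. This is not the paper's I-ORPO implementation (use the linked repository for that); it assumes the `chtmp223/suri` dataset exposes the prompt/chosen/rejected columns TRL expects, and exact argument names may vary across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA settings from the table above.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Optimization and length settings from the table above.
args = ORPOConfig(
    output_dir="suri-i-orpo",
    beta=0.4,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    report_to="wandb",
)

train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # `processing_class` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```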

## πŸ€— Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).

## πŸ“œ Citation

TODO

βš™οΈ Framework versions

  • PEFT 0.11.1