---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---

# Suri-I-ORPO

Suri-I-ORPO is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 trained with instructional odds ratio preference optimization (I-ORPO). Please check [our paper](TODO) for more details on the method.

## 📒 Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/suri) -- contains code to reconstruct the books3 subset.
- **Paper:** TODO
- **Demo:** [Website](https://chtmp223.github.io/suri)

## ⚠️ Getting Started

Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.

## 💻 Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

### Training Procedure

| **Configurations**                 | **Values**  |
|------------------------------------|-------------|
| Hardware (training and inference)  | 4xA100s     |
| Tracking                           | wandb       |
| lora_r                             | 16          |
| lora_alpha                         | 16          |
| lora_dropout                       | 0.05        |
| beta                               | 0.4         |
| gradient_accumulation_steps        | 1           |
| gradient_checkpointing             | True        |
| learning_rate                      | 5.0e-5      |
| lr_scheduler_type                  | cosine      |
| max_length                         | 15024       |
| max_completion_length              | 15000       |
| max_prompt_length                  | 5000        |
| num_train_epochs                   | 2           |
| optim                              | adamw_torch |
| per_device_train_batch_size        | 1           |

#### 🤗 Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).

## 📜 Citation

```
TODO
```

### ⚙️ Framework versions

- PEFT 0.11.1
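
## 💡 Loading the adapter

The snippet below is a minimal sketch of how the LoRA adapter could be loaded for inference with 🤗 PEFT and Transformers. The Hub path `chtmp223/suri-i-orpo` is an assumed placeholder for illustration; substitute the actual model ID of this card, and refer to the [Suri repository](https://github.com/chtmp223/suri) for the official training and inference code.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Hypothetical Hub path for this adapter; replace with the actual model ID.
adapter_id = "chtmp223/suri-i-orpo"

# Loads the base model recorded in the adapter config
# (mistralai/Mistral-7B-Instruct-v0.2) and attaches the I-ORPO LoRA adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct expects the [INST] ... [/INST] chat format, which the
# tokenizer's chat template applies automatically.
messages = [
    {"role": "user", "content": "Write a short story about a lighthouse keeper."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids, max_new_tokens=512, do_sample=True, temperature=0.7
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```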