---
base_model: wenqiglantz/MistralTrinity-7B-slerp
tags:
- mistral
- instruct
- finetune
- chatml
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---
# MistralTrinity-7B-slerp-dpo
Inspired by @mlabonne's blog post Fine-tune a Mistral-7b model with Direct Preference Optimization, this model was fine-tuned with DPO (Direct Preference Optimization) on the base model MistralTrinity-7B-slerp, a merge of mistralai/Mistral-7B-Instruct-v0.2 and jan-hq/trinity-v1, using the mlabonne/chatml_dpo_pairs dataset.
The code used to train this model is available on Google Colab and GitHub. Training required an A100 GPU for over an hour.
Check out fine-tuning run details on Weights & Biases.
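For reference, the snippet below is a minimal sketch of such a DPO fine-tuning run with TRL's `DPOTrainer`, following the general recipe from the referenced blog post. It assumes the 0.7.x-style `DPOTrainer` arguments, and the LoRA settings and hyperparameters shown are illustrative placeholders, not the exact values used to train this model.

```python
# Sketch of a DPO fine-tuning run on MistralTrinity-7B-slerp with TRL.
# Hyperparameters and LoRA settings are illustrative, not the exact training config.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "wenqiglantz/MistralTrinity-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# mlabonne/chatml_dpo_pairs provides system/question/chosen/rejected columns;
# DPOTrainer expects prompt/chosen/rejected, so rebuild the prompt in ChatML format.
def to_dpo_format(sample):
    prompt = (
        f"<|im_start|>system\n{sample['system']}<|im_end|>\n"
        f"<|im_start|>user\n{sample['question']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return {"prompt": prompt, "chosen": sample["chosen"], "rejected": sample["rejected"]}

dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

# LoRA adapter keeps the 7B model trainable on a single A100 (example values).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="MistralTrinity-7B-slerp-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
    logging_steps=1,
    bf16=True,
    report_to="wandb",   # log the run to Weights & Biases
)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT adapter, TRL derives the reference model internally
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,                # strength of the KL penalty toward the reference model
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```

Passing a `peft_config` with `ref_model=None` lets TRL use the frozen base weights (adapter disabled) as the implicit reference model, which keeps memory use within a single A100.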