---
base_model: wenqiglantz/MistralTrinity-7B-slerp
tags:
- mistral
- instruct
- finetune
- chatml
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

# MistralTrinity-7B-slerp-dpo

Inspired by @mlabonne's blog post [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac), this model was fine-tuned with DPO (Direct Preference Optimization) from the base model `MistralTrinity-7B-slerp`, a merge of `mistralai/Mistral-7B-Instruct-v0.2` and `jan-hq/trinity-v1`, using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset.

The code to train this model is available on [Google Colab](https://colab.research.google.com/github/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb) and [GitHub](https://github.com/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb).

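For a quick orientation before opening the notebook, the outline below sketches what a TRL-based DPO run on this setup looks like. It is an illustrative sketch rather than a copy of the notebook: the LoRA settings and training hyperparameters are placeholder values, the dataset is assumed to already expose `prompt`/`chosen`/`rejected` columns in ChatML format, and the `DPOTrainer` arguments follow the TRL 0.7.x-era API used around the time of the blog post.

```python
# Illustrative DPO fine-tuning sketch (TRL 0.7.x-style API); hyperparameters are
# placeholder values -- see the linked notebook for the exact configuration.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "wenqiglantz/MistralTrinity-7B-slerp"

# Preference pairs; DPOTrainer expects prompt / chosen / rejected columns,
# so map the dataset to that format first if needed (the notebook handles this).
dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype="auto", device_map="auto"
)

# LoRA adapters keep the DPO pass light enough for a single A100
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="MistralTrinity-7B-slerp-dpo",
    per_device_train_batch_size=2,   # placeholder values
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    max_steps=200,
    logging_steps=1,
    report_to="wandb",               # matches the W&B run linked below
)

# With a peft_config, TRL derives the frozen reference model internally (ref_model=None)
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,             # strength of the penalty pulling the policy toward the reference
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
trainer.save_model()
```

Passing `ref_model=None` together with a PEFT config lets TRL reuse the frozen base weights as the DPO reference model, which is what keeps the run within a single-GPU budget.
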
Training took over an hour on a single A100 GPU.

Check out fine-tuning run details on [Weights & Biases](https://wandb.ai/wenqiglantz/huggingface/runs/sxbgd33f).

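To try the model, a minimal generation sketch along these lines should work. It assumes the repo id is `wenqiglantz/MistralTrinity-7B-slerp-dpo` and that the tokenizer provides a ChatML chat template (the DPO pairs are ChatML-formatted); if not, construct the `<|im_start|>`/`<|im_end|>` prompt by hand.

```python
# Minimal inference sketch; the repo id below is assumed from the model name above.
from transformers import AutoTokenizer, pipeline

model_id = "wenqiglantz/MistralTrinity-7B-slerp-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The model was preference-tuned on ChatML-formatted pairs, so build a ChatML prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    device_map="auto",
)
output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(output[0]["generated_text"])
```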