---
base_model: wenqiglantz/MistralTrinity-7B-slerp
tags:
- mistral
- instruct
- finetune
- chatml
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

# MistralTrinity-7B-slerp-dpo

Inspired by @mlabonne's blog post [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac), this model was fine-tuned with DPO (Direct Preference Optimization) on the base model `MistralTrinity-7B-slerp`, itself a merge of `mistralai/Mistral-7B-Instruct-v0.2` and `jan-hq/trinity-v1`, using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset.
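
For reference, the snippet below is a minimal, hedged sketch of the DPO recipe described in that blog post, using TRL's `DPOTrainer` with a LoRA adapter and assuming the older TRL API (where `beta` and `tokenizer` are passed directly to `DPOTrainer`). The hyperparameters (LoRA rank, beta, learning rate, step count) and the `report_to="wandb"` setting are illustrative assumptions, not the exact values from the linked notebook.

```python
# Minimal DPO fine-tuning sketch (illustrative; see the linked Colab/GitHub
# notebook for the actual training script and hyperparameters).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "wenqiglantz/MistralTrinity-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# ChatML preference pairs; assumed to expose the prompt/chosen/rejected columns
# that DPOTrainer expects.
dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

# LoRA keeps the fine-tune lightweight; these values are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="MistralTrinity-7B-slerp-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
    bf16=True,
    logging_steps=1,
    report_to="wandb",
)

# With ref_model=None and a PEFT config, DPOTrainer builds the frozen reference
# model internally from the base weights.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
trainer.save_model("MistralTrinity-7B-slerp-dpo")
```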

The code to train this model is available on [Google Colab](https://colab.research.google.com/github/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb) and [GitHub](https://github.com/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb).

Training required an A100 GPU for over an hour.

Check out fine-tuning run details on [Weights & Biases](https://wandb.ai/wenqiglantz/huggingface/runs/sxbgd33f).
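
As a quick usage reference, the following is a hedged inference sketch with `transformers`, assuming the tokenizer ships a ChatML-style chat template (suggested by the `chatml` tag); if it does not, build the ChatML prompt manually.

```python
# Illustrative inference sketch; the ChatML prompt format is an assumption
# based on the model tags, not a documented guarantee.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenqiglantz/MistralTrinity-7B-slerp-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Direct Preference Optimization in one paragraph."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```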