---
base_model: wenqiglantz/MistralTrinity-7B-slerp
tags:
- mistral
- instruct
- finetune
- chatml
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

# MistralTrinity-7B-slerp-dpo

Inspired by @mlabonne's blog post [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac), this model was fine-tuned with DPO (Direct Preference Optimization) on the base model `MistralTrinity-7B-slerp`, a merge of `mistralai/Mistral-7B-Instruct-v0.2` and `jan-hq/trinity-v1`, using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset.
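
For reference, here is a minimal sketch of the DPO fine-tuning setup with TRL's `DPOTrainer`, following the approach in the blog post above. The hyperparameters, LoRA settings, and dataset handling are illustrative assumptions, not the exact run configuration; the notebook linked below contains the actual training code.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "wenqiglantz/MistralTrinity-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# ChatML-formatted preference pairs; DPOTrainer expects prompt/chosen/rejected
# columns (preprocess as in the notebook if the raw columns differ).
dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

# LoRA keeps the fine-tune within a single GPU's memory budget.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
)

# Illustrative settings, roughly matching the referenced blog post.
training_args = TrainingArguments(
    output_dir="MistralTrinity-7B-slerp-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    logging_steps=1,
    report_to="wandb",
)

# With a peft_config, DPOTrainer treats the frozen base weights as the
# implicit reference model, so ref_model can stay None.
dpo_trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,  # strength of the KL penalty toward the reference model
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```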

The code to train this model is available on [Google Colab](https://colab.research.google.com/github/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb) and [GitHub](https://github.com/wenqiglantz/llmops/blob/main/Fine_tune_MistralTrinity_7B_slerp_with_DPO.ipynb).

Training took over an hour on an A100 GPU.

Check out the fine-tuning run details on [Weights & Biases](https://wandb.ai/wenqiglantz/huggingface/runs/sxbgd33f).
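
Below is a minimal inference sketch with 🤗 Transformers. The prompt formatting assumes the tokenizer ships a ChatML chat template, matching the ChatML-formatted training data; if it does not, format the prompt manually.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenqiglantz/MistralTrinity-7B-slerp-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build the prompt via the tokenizer's chat template (assumed to be ChatML).
messages = [{"role": "user", "content": "What is a large language model?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```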