vandeju committed
Commit b0796d0
1 Parent(s): 6815673

Update README.md

Files changed (1):
  1. README.md +5 -7
README.md CHANGED
@@ -28,9 +28,7 @@ This model is a fine-tuned version of TODO on [ReBatch/ultrafeedback_nl](https:/
 
 ## Model description
 
-This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further finetuned with QLoRA. First with SFT on a chat dataset and then with a DPO on a feedback Chat dataset.
-
-
+This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further fine-tuned with QLoRA. It was first fine-tuned with SFT on a chat dataset and then with DPO on a feedback chat dataset.
 
 ## Intended uses & limitations
 
 This model could still generate wrong, misleading, and potentially even offensive content. Use at your own risk.
@@ -39,11 +37,11 @@ Use with Mistral's chat template (can be found in the tokenizer).
 ## Training procedure
 
 
-This model was trained with QLoRa in bfloat16 with flash attention 2 on oen A100 PCIe; with the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).
+This model was trained with QLoRA in bfloat16 with Flash Attention 2 on one A100 PCIe, using the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).
 
 ## Evaluation results
 
-The model was evaluated using [scandeval](https://scandeval.com/dutch-nlg/). There are improvements in 4/7 benchmarks compared to the Mistral-7B-v0.3-Instruct model it was based on.
+The model was evaluated using [ScandEval](https://scandeval.com/dutch-nlg/). There are improvements in 4 out of 7 benchmarks compared to the Mistral-7B-v0.3-Instruct model on which it is based.
 
 | Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
 |:-----:|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
@@ -77,5 +75,5 @@ The following hyperparameters were used during training:
 
 ## Model Developer
 
-The Mistral-7B-v0.3-Instruct model this model is based on is created by [Mistral AI](https://huggingface.co/mistralai).
-The finetuning was done by [Julien Van den Avenne](https://huggingface.co/vandeju)
+The Mistral-7B-v0.3-Instruct model, on which this model is based, was created by [Mistral AI](https://huggingface.co/mistralai).
+The fine-tuning was done by [Julien Van den Avenne](https://huggingface.co/vandeju).
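The README describes a DPO step on a feedback chat dataset. As a minimal sketch of the objective that DPO optimizes (this is not the alignment handbook's actual script, just the per-pair loss it implements; the inputs are assumed to be summed per-response log-probabilities under the trained policy and the frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the reference model.
    """
    # Implicit rewards: how much more (or less) the policy likes each
    # response than the frozen reference model does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)): small when the chosen response outranks
    # the rejected one by a wide margin.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy prefers the chosen answer more
# strongly than the reference model does.
low = dpo_loss(-10.0, -30.0, -20.0, -20.0)   # policy favors chosen
high = dpo_loss(-30.0, -10.0, -20.0, -20.0)  # policy favors rejected
```

In a full training run (as in the alignment handbook's script), this loss is averaged over batches of preference pairs and backpropagated only through the policy, with QLoRA restricting the trainable parameters to low-rank adapters over a quantized base model.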
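The README directs users to Mistral's chat template, which ships inside the tokenizer. As a rough illustration of what that `[INST]`-style template approximately produces (an approximation only; in practice call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` to use the authoritative template):

```python
def mistral_chat_prompt(messages):
    """Approximate Mistral [INST]-style prompt formatting (sketch).

    The real template lives in the tokenizer; this only illustrates
    the general shape of the formatted conversation.
    """
    prompt = "<s>"
    for message in messages:
        if message["role"] == "user":
            prompt += "[INST] " + message["content"] + " [/INST]"
        elif message["role"] == "assistant":
            # Assistant turns are closed with an end-of-sequence token.
            prompt += " " + message["content"] + "</s>"
    return prompt

print(mistral_chat_prompt([
    {"role": "user", "content": "Wat is de hoofdstad van België?"},
]))
```

Because the exact spacing and special tokens can differ between Mistral releases, the tokenizer's stored template should always be treated as the source of truth.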