Update README.md
README.md (changed)
@@ -19,8 +19,8 @@ metrics:
 
 
 This repo contains the model and tokenizer checkpoints for:
-- model family <b>mistralai/Mistral-7B-Instruct-v0.2</b>
-- optimized with the loss <b>KTO</b>
+- model family [<b>mistralai/Mistral-7B-Instruct-v0.2</b>](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
+- optimized with the loss [<b>KTO</b>](https://twitter.com/winniethexu/status/1732839295365554643)
 - aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
 - via 3 iterations of KTO on one epoch of each training partition.
@@ -42,7 +42,7 @@ You may also use our tokenizer to `apply_chat_template` if doing inference with
 
 
 
-Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more
+Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more details on the methodology.
 
 If you found this work useful, feel free to cite [our work](https://arxiv.org/abs/2402.01306):
 ```
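The second hunk's context line mentions using the repo's tokenizer with `apply_chat_template` for inference. A minimal sketch of what that might look like, assuming the Hugging Face `transformers` library; the repo id below is taken from the base-model line of the diff (the diff itself does not name the aligned checkpoint, so substitute the real one):

```python
def build_prompt(tokenizer, user_message: str) -> str:
    """Render a single-turn chat into the model's expected prompt string.

    Works with any tokenizer exposing `apply_chat_template`
    (e.g. a transformers `PreTrainedTokenizer`).
    """
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,  # append the assistant-turn prefix
    )


if __name__ == "__main__":
    # Requires `pip install transformers` and network access to the Hub.
    from transformers import AutoTokenizer

    # Assumption: base-model id from the diff, not the aligned checkpoint.
    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
    print(build_prompt(tok, "What is KTO?"))
```

The import is kept inside the `__main__` guard so the helper can be reused (or tested with a stub tokenizer) without pulling in `transformers`.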