Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ This repo contains the model and tokenizer checkpoints for:
 - model family [<b>mistralai/Mistral-7B-Instruct-v0.2</b>](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
 - optimized with the loss [<b>KTO</b>](https://twitter.com/winniethexu/status/1732839295365554643)
 - aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
-- via 3 iterations of KTO on one epoch of each training partition, each previous iteration's model serving as the reference for the
+- via 3 iterations of KTO on one epoch of each training partition, each previous iteration's model serving as the reference for the subsequent.

 **[03/06/2024]**: We are #2 on the (verified) [Alpaca Eval 2.0 Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) scoring **33.23**!
@@ -38,9 +38,8 @@ What kind of cake?
 Chocolate cake.
 <|assistant|>
 ```
-Note that a beginning-of-sequence (BOS) token automatically added at tokenization and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
-You may also use our tokenizer
-
+Note that a beginning-of-sequence (BOS) token is automatically added at tokenization time and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
+You may also use our tokenizer's `apply_chat_template` if doing inference with `chatml` set or evaluating generations through non-local clients.

 Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more details on the methodology.
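The updated bullet above describes an iterative schedule: three rounds of KTO, where the policy produced by round *t* is frozen and used as the reference model for round *t+1*. A minimal sketch of that loop follows; `train_kto` is a hypothetical stand-in for one epoch of KTO training (the actual trainer lives in the HALOs repository), and the partition names are illustrative, not the dataset's real split names.

```python
# Sketch of the iterative KTO schedule described in the bullet above.
# `train_kto` is a HYPOTHETICAL placeholder for one epoch of KTO training
# (the real trainer lives in https://github.com/ContextualAI/HALOs);
# here it just returns an identifier for the checkpoint it "produced".

def train_kto(init_ckpt: str, ref_ckpt: str, partition: str) -> str:
    """Run one epoch of KTO on `partition`, starting from `init_ckpt`,
    with `ref_ckpt` held frozen as the reference model (placeholder)."""
    return f"{init_ckpt}->kto[{partition}]"

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
partitions = ["partition_1", "partition_2", "partition_3"]  # illustrative names

policy, reference = BASE, BASE
for part in partitions:
    policy = train_kto(policy, reference, part)
    reference = policy  # previous iteration's model becomes the next reference

print(policy)
```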
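To make the prompt-format notes concrete, here is a minimal sketch using the standard Hugging Face `transformers` tokenizer API. The repo id is a placeholder for this model's repository, and the opening user turn is invented for illustration; `apply_chat_template` and the automatic BOS handling are ordinary `transformers` behavior, not something specific to this card.

```python
# Minimal sketch of building the prompt shown above with `transformers`.
# "YOUR/REPO_ID" is a placeholder for this model's repository id.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("YOUR/REPO_ID")

# The first user turn is illustrative; the last two turns match the example.
messages = [
    {"role": "user", "content": "Can you suggest a cake recipe?"},
    {"role": "assistant", "content": "What kind of cake?"},
    {"role": "user", "content": "Chocolate cake."},
]

# Let the chat template produce the prompt string, ending with the
# <|assistant|> tag so the model generates the next assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenizing the prompt prepends the BOS token automatically and appends
# no EOS token, as the note above says.
inputs = tokenizer(prompt, return_tensors="pt")
```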