Model description
This model is an ORPO fine-tune of mistralai/Mistral-7B-v0.3, trained on a 2.5k-sample subset of the mlabonne/orpo-dpo-mix-40k dataset. Thanks to Maxime Labonne for his guide on Odds Ratio Preference Optimization (ORPO). ORPO combines the traditional supervised fine-tuning and preference alignment stages into a single training process.
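To make the "single process" idea concrete, the sketch below computes an ORPO-style objective for one preference pair: the usual supervised loss on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. It is only an illustration, not the training code used for this model. The function names, the scalar inputs (average per-token log-probabilities), and the weight `lam=0.1` are assumptions made for the example.

```python
import math

def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO objective for a single (chosen, rejected) pair.

    logp_chosen / logp_rejected: average per-token log-probability the model
    assigns to each response (hypothetical inputs for this sketch).
    lam: weight of the odds-ratio term (illustrative value).
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -logp_chosen
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term

# The loss is lower when the model already prefers the chosen response.
good = orpo_loss(math.log(0.8), math.log(0.2))
bad = orpo_loss(math.log(0.2), math.log(0.8))
print(good < bad)
```

Because both terms are computed from the same forward passes, no separate reference model or second alignment stage is needed in this formulation.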
This model follows the ChatML chat template!
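For readers unfamiliar with ChatML, the following sketch shows roughly what a ChatML-formatted prompt looks like. The helper `to_chatml` is hypothetical and written only for illustration; the template shipped with the tokenizer may differ in details such as special tokens or whitespace, so `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts in ChatML style (illustrative)."""
    prompt = ""
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt

example = to_chatml([{"role": "user", "content": "Hello!"}])
print(example)
```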
How to use
import torch
from transformers import AutoTokenizer, pipeline

model_id = "MuntasirHossain/Orpo-Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in half precision and place it on the available device(s).
llm = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(input_text):
    # Wrap the user input in the model's chat template.
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = llm(prompt, max_new_tokens=512)
    # The pipeline returns prompt + completion; strip the prompt.
    return outputs[0]["generated_text"][len(prompt):]

print(generate("Explain quantum tunneling in simple terms."))