Model description

This model is an ORPO fine-tuned version of mistralai/Mistral-7B-v0.3, trained on a 2.5k-sample subset of the mlabonne/orpo-dpo-mix-40k dataset. Thanks to Maxime Labonne for his excellent guide on Odds Ratio Preference Optimization (ORPO). ORPO combines the traditional supervised fine-tuning and preference alignment stages into a single training process.
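
To make "a single training process" concrete, here is a minimal pure-Python sketch of the ORPO objective as formulated in the ORPO paper. The total loss is the usual negative log-likelihood (SFT) term on the chosen response plus a weighted odds-ratio term that rewards the chosen response for having higher odds than the rejected one. Assumptions: the inputs are length-normalized (per-token average) log-probabilities of the chosen and rejected responses, and `lam = 0.1` is only an illustrative weight, not necessarily the value used to train this model. The function names are hypothetical and not part of any library.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(P / (1 - P)) given log P; requires avg_logp < 0."""
    # log(1 - P) is computed as log1p(-exp(log P)) for numerical stability
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_loss(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """-log sigmoid(log odds(chosen) - log odds(rejected))."""
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Numerically stable softplus(-z), which equals -log(sigmoid(z))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Hypothetical helper: SFT loss on the chosen response + lam * odds-ratio loss."""
    sft_loss = -chosen_avg_logp  # negative log-likelihood of the chosen response
    return sft_loss + lam * odds_ratio_loss(chosen_avg_logp, rejected_avg_logp)


# Chosen response is more likely than the rejected one, so the odds-ratio term is below log(2)
print(orpo_loss(-0.5, -2.0))
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log(2); it shrinks as the model favours the chosen response, so no separate reference model or second alignment stage is needed.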

This model uses the ChatML chat template.
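
For reference, ChatML wraps each turn in `<|im_start|>{role}` and `<|im_end|>` markers, and a generation prompt ends with an open assistant turn. The sketch below is a hypothetical helper that renders messages in that layout; it is illustrative only. The template actually stored with the tokenizer (`tokenizer.chat_template`) is authoritative and may differ in details such as system-prompt or BOS-token handling, so use `tokenizer.apply_chat_template` in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper: render a list of {"role", "content"} dicts as ChatML text."""
    parts = []
    for message in messages:
        # Each turn: <|im_start|>role, newline, content, <|im_end|>, newline
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Hi"}]))
```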

How to use

import torch
from transformers import AutoTokenizer, pipeline

model_id = "MuntasirHossain/Orpo-Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in half precision and place it on the available device(s)
llm = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(input_text):
    # Wrap the input in a single-turn conversation and render it with the model's chat template
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = llm(prompt, max_new_tokens=512)
    # The pipeline returns the prompt followed by the completion; keep only the completion
    return outputs[0]["generated_text"][len(prompt):]

print(generate("Explain quantum tunneling in simple terms."))
Model size: 7.25B params
Tensor type: FP16 (Safetensors)

Dataset used to train MuntasirHossain/Orpo-Mistral-7B-v0.3: mlabonne/orpo-dpo-mix-40k