Orpo-Llama-3.2-1B-15k
AdamLucek/Orpo-Llama-3.2-1B-15k is an ORPO fine-tuned version of meta-llama/Llama-3.2-1B, trained on a shuffled 15,000-entry subset of mlabonne/orpo-dpo-mix-40k.
Trained for 7 hours on an L4 GPU with this training script, modified from Maxime Labonne's original guide.
For full model details, refer to the base model page: meta-llama/Llama-3.2-1B.
Evaluations
Compared against AdamLucek/Orpo-Llama-3.2-1B-40k, with both models evaluated using lm-evaluation-harness.
Benchmark | 15k Accuracy | 15k Normalized | 40k Accuracy | 40k Normalized | Notes |
---|---|---|---|---|---|
AGIEval | 22.14% | 21.01% | 23.57% | 23.26% | 0-Shot Average across multiple reasoning tasks |
GPT4ALL | 51.15% | 54.38% | 51.63% | 55.00% | 0-Shot Average across all categories |
TruthfulQA | 42.79% | N/A | 42.14% | N/A | MC2 accuracy |
MMLU | 31.22% | N/A | 31.01% | N/A | 5-Shot Average across all categories |
Winogrande | 61.72% | N/A | 61.12% | N/A | 0-shot evaluation |
ARC Challenge | 32.94% | 36.01% | 33.36% | 37.63% | 0-shot evaluation |
ARC Easy | 64.52% | 60.40% | 65.91% | 60.90% | 0-shot evaluation |
BoolQ | 50.24% | N/A | 52.29% | N/A | 0-shot evaluation |
PIQA | 75.46% | 74.37% | 75.63% | 75.19% | 0-shot evaluation |
HellaSwag | 48.56% | 64.71% | 48.46% | 64.50% | 0-shot evaluation |
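To make the comparison easier to read, the per-benchmark gap between the two models can be computed directly from the raw accuracy columns above. The following is a minimal sketch using only the numbers in the table; the names `scores`, `deltas`, and `wins_15k` are illustrative and not part of any evaluation tooling, and the differences are plain subtraction with no significance testing.

```python
# Raw accuracy (%) copied from the table above, as (15k, 40k) pairs.
# TruthfulQA uses its MC2 accuracy; normalized scores are not included here.
scores = {
    "AGIEval": (22.14, 23.57),
    "GPT4ALL": (51.15, 51.63),
    "TruthfulQA": (42.79, 42.14),
    "MMLU": (31.22, 31.01),
    "Winogrande": (61.72, 61.12),
    "ARC Challenge": (32.94, 33.36),
    "ARC Easy": (64.52, 65.91),
    "BoolQ": (50.24, 52.29),
    "PIQA": (75.46, 75.63),
    "HellaSwag": (48.56, 48.46),
}


def deltas(table):
    """Return 15k minus 40k accuracy, in percentage points, per benchmark."""
    return {name: round(a - b, 2) for name, (a, b) in table.items()}


diffs = deltas(scores)

# Print benchmarks from the largest 40k advantage to the largest 15k advantage.
for name, d in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(f"{name:<14} {d:+.2f}")

# Benchmarks where the 15k model scores higher on raw accuracy.
wins_15k = [name for name, d in diffs.items() if d > 0]
print("15k ahead on:", ", ".join(wins_15k))
```

On raw accuracy, the 15k model is ahead on TruthfulQA, MMLU, Winogrande, and HellaSwag, and behind on the other six benchmarks; the largest gap is BoolQ at about 2 percentage points.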
Using this Model
from transformers import AutoTokenizer
import transformers
import torch
# Load Model and Pipeline
model = "AdamLucek/Orpo-Llama-3.2-1B-15k"
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)
# Generate Message
messages = [{"role": "user", "content": "What is a language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Training Statistics