# ORPO-Tuned Llama-3.2-1B-Instruct
NB: Done purely as a fine-tuning exercise. Not intended for any practical use.
This model is a fine-tuned version of Meta's Llama-3.2-1B-Instruct using ORPO (Odds Ratio Preference Optimization). The model was trained to better align with human preferences using a curated preference dataset, mlabonne/orpo-dpo-mix-40k.
## Model Details
- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Training Method: ORPO (Odds Ratio Preference Optimization) with LoRA
- Training Dataset: mlabonne/orpo-dpo-mix-40k (subset of 100 examples)
- Framework: Hugging Face Transformers, TRL, PEFT
- Training Date: November 2024
- License: Same as base model (Llama 3.2 Community License)
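For reference, a minimal inference sketch using the `transformers` pipeline. The model id `illeto/finetunning-week2` is taken from this repo; whether the LoRA adapter was merged before upload is an assumption, so adjust the loading code if you need to attach the adapter separately with PEFT.

```python
# Minimal inference sketch; assumes the fine-tuned weights are available
# on the Hub under this repo's id (merged adapter assumed).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="illeto/finetunning-week2",
    torch_dtype="auto",
    device_map="auto",
)

# Recent transformers versions accept chat-style message lists directly.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```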
## Training Process
The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration:
### LoRA Parameters
- r=16 (rank)
- lora_alpha=32
- lora_dropout=0.05
- bias="none"
- task_type="CAUSAL_LM"
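The parameters above map directly onto a PEFT `LoraConfig`; a sketch is below. Note that `target_modules` is an assumption (a common choice for Llama-style models) since the original configuration did not list it.

```python
from peft import LoraConfig

# LoRA configuration matching the parameters listed above.
# target_modules is an assumption, not part of the documented config.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```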
### Training Parameters
- Learning rate: 1e-5
- Batch size: 4
- Gradient accumulation steps: 4
- Maximum steps: 100
- Warmup steps: 10
- Gradient checkpointing: Enabled
- FP16 training: Enabled
- Maximum sequence length: 512
- Maximum prompt length: 512
- Optimizer: AdamW
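Put together, these settings correspond roughly to TRL's `ORPOTrainer`. The sketch below is illustrative, not the exact training script: the output directory and LoRA `target_modules` are assumptions, and older TRL versions pass the tokenizer via `tokenizer=` rather than `processing_class=`.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA settings from the section above (target_modules is an assumption).
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Training arguments matching the parameters listed above.
training_args = ORPOConfig(
    output_dir="orpo-llama-1b",  # illustrative name
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=100,
    warmup_steps=10,
    gradient_checkpointing=True,
    fp16=True,
    max_length=512,
    max_prompt_length=512,
    optim="adamw_torch",
)

# 100-example subset of the preference dataset, as described above.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train[:100]")

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```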
## Evaluation Results
The model was evaluated on the HellaSwag benchmark with the following configuration:
- Batch size: 64 (auto-detected)
- Full evaluation set
- Zero-shot setting
- FP16 precision
Results:
| Metric | Value | Standard Error |
|---|---|---|
| Accuracy | 45.20% | ±0.50% |
| Normalized Accuracy | 60.78% | ±0.49% |
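The evaluation harness is not named above; assuming EleutherAI's lm-evaluation-harness (the usual source of HellaSwag acc/acc_norm numbers), the configuration described corresponds roughly to:

```shell
# Hypothetical reproduction command; assumes lm-evaluation-harness is
# installed and the fine-tuned model is on the Hub under this repo's id.
lm_eval --model hf \
  --model_args pretrained=illeto/finetunning-week2,dtype=float16 \
  --tasks hellaswag \
  --num_fewshot 0 \
  --batch_size auto
```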