---
library_name: transformers
datasets:
  - mlabonne/orpo-dpo-mix-40k
base_model:
  - meta-llama/Llama-3.2-1B-Instruct
---

# ORPO-Tuned Llama-3.2-1B-Instruct

NB: This was done purely as a fine-tuning exercise; it is not intended for any practical use.

This model is a version of Meta's Llama-3.2-1B-Instruct fine-tuned with ORPO (Odds Ratio Preference Optimization) to better align with human preferences, using the curated preference dataset mlabonne/orpo-dpo-mix-40k.
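
The snippet below is a minimal loading-and-generation sketch using the Transformers `pipeline` API. The repository id is an assumption and should be replaced with this model's actual Hub id; if only the LoRA adapter was uploaded, the `peft` package must also be installed.

```python
from transformers import pipeline

# Repository id is an assumption -- replace with this model's actual Hub id.
model_id = "illeto/finetunning-week2"

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize what preference alignment does for a chat model."},
]
output = generator(messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```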

## Model Details

- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Training Method: ORPO (Odds Ratio Preference Optimization) with LoRA
- Training Dataset: mlabonne/orpo-dpo-mix-40k (subset of 100 examples)
- Framework: Hugging Face Transformers, TRL, PEFT
- Training Date: November 2024
- License: Same as the base model (Llama 3.2 Community License)

## Training Process

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration:

### LoRA Parameters

- `r=16` (rank)
- `lora_alpha=32`
- `lora_dropout=0.05`
- `bias="none"`
- `task_type="CAUSAL_LM"`
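
These settings map directly onto PEFT's `LoraConfig`; a minimal sketch is shown below. `target_modules` is not listed in this card, so the value shown is only a common choice for Llama-style models, not a record of the original run.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,                    # LoRA rank, as listed above
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Not stated in this card; attention projections are a common choice
    # for Llama-style models (assumption).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```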

### Training Parameters

- Learning rate: 1e-5
- Batch size: 4
- Gradient accumulation steps: 4
- Maximum steps: 100
- Warmup steps: 10
- Gradient checkpointing: enabled
- FP16 training: enabled
- Maximum sequence length: 512
- Maximum prompt length: 512
- Optimizer: AdamW
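
Taken together, the run roughly corresponds to the TRL `ORPOConfig`/`ORPOTrainer` setup sketched below. The dataset split, output directory, and exact argument names (which vary slightly across TRL versions) are assumptions rather than a record of the original script.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# 100-example subset of the preference mix (exact split/ordering is an assumption).
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train[:100]")

# Adapter settings from the "LoRA Parameters" section above.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

training_args = ORPOConfig(
    output_dir="orpo-llama-3.2-1b",   # output directory is an assumption
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=100,
    warmup_steps=10,
    gradient_checkpointing=True,
    fp16=True,
    max_length=512,
    max_prompt_length=512,
    optim="adamw_torch",
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,   # `tokenizer=` in TRL releases before 0.12
    peft_config=peft_config,
)
trainer.train()
```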

## Evaluation Results

The model was evaluated on the HellaSwag benchmark with the following configuration:

- Batch size: 64 (auto-detected)
- Full evaluation set
- Zero-shot setting
- FP16 precision
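
These settings are consistent with EleutherAI's lm-evaluation-harness, although the card does not name the evaluation tool; the sketch below shows how such a run could be reproduced with its Python API, with the model id as a placeholder.

```python
import lm_eval

# Assumes the run used EleutherAI's lm-evaluation-harness (the card does not
# name the tool); the model id is a placeholder for this repository's Hub id.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=illeto/finetunning-week2,dtype=float16",
    tasks=["hellaswag"],
    num_fewshot=0,        # zero-shot setting
    batch_size="auto",    # auto-detected batch size
)
print(results["results"]["hellaswag"])  # reports acc and acc_norm with standard errors
```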

Results:

| Metric              | Value  | Standard Error |
|---------------------|--------|----------------|
| Accuracy            | 45.20% | ±0.50%         |
| Normalized Accuracy | 60.78% | ±0.49%         |