illeto/finetunning-week2

	---
	library_name: transformers
	datasets:
	- mlabonne/orpo-dpo-mix-40k
	base_model:
	- meta-llama/Llama-3.2-1B-Instruct
	---
	# ORPO-Tuned Llama-3.2-1B-Instruct

	NB: Done purely as a fine-tuning exercise. Not intended for any practical use.

	This model is a fine-tuned version of Meta's Llama-3.2-1B-Instruct, trained with ORPO (Odds Ratio Preference Optimization). The goal was to align the model more closely with human preferences, using a small subset of the curated preference dataset mlabonne/orpo-dpo-mix-40k.
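
For readers unfamiliar with ORPO, the sketch below illustrates the shape of its objective for a single preference pair: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. It is a plain-Python illustration, not the TRL implementation used for this model. The function names and the weight `lam=0.1` are assumptions; the card does not state the value used in training.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    avg_logprob is the length-normalized (per-token average) log-probability
    of a response and must be strictly negative.
    """
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def orpo_loss(avg_logprob_chosen: float,
              avg_logprob_rejected: float,
              lam: float = 0.1) -> float:
    """Illustrative ORPO loss for one (chosen, rejected) pair.

    lam is a hypothetical weight for the odds-ratio term.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll_chosen = -avg_logprob_chosen
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(avg_logprob_chosen) - log_odds(avg_logprob_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * odds_ratio_term
```

When both responses are equally likely, the odds-ratio term equals log 2; it shrinks toward zero as the chosen response becomes more likely than the rejected one.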


	## Model Details

	- Base Model: meta-llama/Llama-3.2-1B-Instruct
	- Training Method: ORPO (Odds Ratio Preference Optimization) with LoRA
	- Training Dataset: mlabonne/orpo-dpo-mix-40k (subset of 100 examples)
	- Framework: Hugging Face Transformers, TRL, PEFT
	- Training Date: November 2024
	- License: Same as base model (Llama 3.2 Community License)

	## Training Process

	The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration:

	### LoRA Parameters
	- r=16 (rank)
	- lora_alpha=32
	- lora_dropout=0.05
	- bias="none"
	- task_type="CAUSAL_LM"
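
As a rough illustration of what these values imply, the sketch below computes the LoRA scaling factor (`lora_alpha / r`) and the number of trainable parameters a rank-16 adapter adds to one linear layer. The dictionary mirrors the list above. The helper names are hypothetical, and the 2048 hidden size is an assumption about the base model rather than something stated in this card. The card does not list which modules were adapted, so no total is computed.

```python
# Values copied from the LoRA parameter list in this card.
lora_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
}


def lora_scaling(cfg: dict) -> float:
    """Factor applied to the low-rank update: lora_alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    LoRA adds two matrices, A (r x d_in) and B (d_out x r); with
    bias="none" no bias terms are trained.
    """
    return r * (d_in + d_out)


HIDDEN_SIZE = 2048  # assumed hidden size of the base model

scaling = lora_scaling(lora_config)
per_layer = lora_params_per_linear(HIDDEN_SIZE, HIDDEN_SIZE, lora_config["r"])
print(f"scaling={scaling}, params per {HIDDEN_SIZE}x{HIDDEN_SIZE} layer={per_layer}")
```

Under these assumptions the update is scaled by 2.0 and each adapted square projection adds 65,536 trainable parameters.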

	### Training Parameters
	- Learning rate: 1e-5
	- Batch size: 4
	- Gradient accumulation steps: 4
	- Maximum steps: 100
	- Warmup steps: 10
	- Gradient checkpointing: Enabled
	- FP16 training: Enabled
	- Maximum sequence length: 512
	- Maximum prompt length: 512
	- Optimizer: AdamW
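
The arithmetic below puts these settings in context. It is a back-of-the-envelope sketch that assumes a single training device and that all 100 examples of the subset were used for training; neither is stated in this card.

```python
# Values copied from the training parameter list and the Model Details section.
per_device_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100
warmup_steps = 10
subset_size = 100   # examples drawn from mlabonne/orpo-dpo-mix-40k
num_devices = 1     # assumption: single GPU

# Preference pairs contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Total pairs processed over the whole run, and the implied passes over the subset.
pairs_seen = effective_batch_size * max_steps
passes_over_subset = pairs_seen / subset_size

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

print(effective_batch_size, pairs_seen, passes_over_subset, warmup_fraction)
```

Under these assumptions each optimizer step sees 16 pairs, so 100 steps correspond to roughly 16 passes over the 100-example subset, with the first 10% of steps spent in warmup.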

	## Evaluation Results

	The model was evaluated on the HellaSwag benchmark with the following configuration:
	- Batch size: 64 (auto-detected)
	- Full evaluation set
	- Zero-shot setting
	- FP16 precision

	Results:

	| Metric | Value | Standard Error |
	|--------|-------|----------------|
	| Accuracy | 45.20% | ±0.50% |
	| Normalized Accuracy | 60.78% | ±0.49% |
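
As a sanity check on the reported standard errors, the sketch below recomputes them with the binomial formula sqrt(p(1 - p) / n). It assumes the full evaluation set is the HellaSwag validation split with 10,042 examples; that count is not given in this card. The evaluation harness may use a slightly different estimator, so only approximate agreement is expected.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


N_EXAMPLES = 10_042  # assumed size of the HellaSwag validation split

for name, acc in [("Accuracy", 0.4520), ("Normalized Accuracy", 0.6078)]:
    se = binomial_stderr(acc, N_EXAMPLES)
    print(f"{name}: {100 * acc:.2f}% ± {100 * se:.2f}%")
```

Under that assumption the recomputed values round to the ±0.50% and ±0.49% shown in the table.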