Llama-3.2-3B-Instruct Fine-Tuning on Custom Dataset

Overview

This repository demonstrates the process of fine-tuning the Llama-3.2-3B-Instruct model using the Unsloth library. The model is trained on a custom dataset, FineTome-100k, for 60 steps. Key optimizations include:

  • 4-bit quantization to reduce memory usage
  • LoRA (Low-Rank Adaptation) for efficient fine-tuning
  • Techniques for improving inference speed and generating text with the fine-tuned model (a sketch appears after the Conclusion)

Model Details

  • Model Name: Llama-3.2-3B-Instruct
  • Pretrained Weights: Unsloth’s pretrained version for Llama-3.2-3B
  • Quantization: 4-bit quantization (set via load_in_4bit=True) for reduced memory usage
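
A minimal loading sketch matching the settings above. The checkpoint name is an assumption (Unsloth publishes Llama-3.2-3B-Instruct weights under its own namespace); adjust it to the repository you actually use.

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # matches the training configuration below

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed Unsloth checkpoint name
    max_seq_length=max_seq_length,
    dtype=None,            # auto-detect: FP16 on a T4, BF16 on newer GPUs
    load_in_4bit=True,     # 4-bit quantization to reduce memory usage
)
```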

LoRA Configuration:

  • Rank: 16
  • Target Modules:
    • q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • LoRA Alpha: 16
  • LoRA Dropout: 0

Gradient Checkpointing:

  • Use Gradient Checkpointing: "unsloth" for improved long-context training
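
The two sections above map onto a single Unsloth `get_peft_model` call. A sketch, assuming the `model` returned by the loading step above; values such as `bias="none"` and the random seed are illustrative defaults, not taken from this card.

```python
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,                                  # base model from the loading sketch above
    r=16,                                   # LoRA rank
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",                            # assumed; common Unsloth default
    use_gradient_checkpointing="unsloth",   # Unsloth's long-context checkpointing
    random_state=3407,                      # assumed seed for reproducibility
)
```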

Training

  • Dataset: FineTome-100k (first 500 records selected)
  • Loss Function: Standard causal language-modeling (cross-entropy) loss used for supervised fine-tuning
  • Training Steps: 60 steps with a per-device batch size of 2 and gradient accumulation of 4 (effective batch size 8)
  • Optimizer: AdamW 8-bit
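
A sketch of the dataset preparation, assuming the FineTome-100k copy on the Hugging Face Hub (`mlabonne/FineTome-100k`, repo id assumed), the chat-template helpers shipped with Unsloth, and the `tokenizer` from the loading sketch above. The field names follow the ShareGPT-style layout of that dataset.

```python
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# First 500 records of FineTome-100k.
dataset = load_dataset("mlabonne/FineTome-100k", split="train[:500]")

# Convert ShareGPT-style "conversations" into the standard role/content format.
dataset = standardize_sharegpt(dataset)

# Attach a Llama-3 style chat template to the tokenizer (template name assumed).
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

def formatting_prompts_func(examples):
    # Render each conversation into a single training string.
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
```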

Training Parameters:

  • Max Sequence Length: 2048 tokens
  • Learning Rate: 2e-4
  • Gradient Accumulation Steps: 4
  • Total Steps: 60
  • Epochs: under one full pass over the 500 selected records, since training stops at max_steps = 60 (60 steps × effective batch size 8 = 480 examples)
  • Training Precision: FP16 or BF16, chosen automatically based on GPU support (FP16 on the Tesla T4 used here)
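
These hyperparameters correspond to a trl `SFTTrainer` run. A sketch assuming the `model`, `tokenizer`, and `dataset` from the previous snippets; warmup steps, seed, and logging cadence are illustrative, and keyword placement varies slightly across trl versions.

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,      # effective batch size of 8
        max_steps=60,
        learning_rate=2e-4,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),   # FP16 on a T4
        bf16=is_bfloat16_supported(),       # BF16 where the GPU supports it
        warmup_steps=5,                     # assumed
        logging_steps=1,                    # assumed
        seed=3407,                          # assumed
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()
```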

Performance

  • GPU Used: Tesla T4 (14.7 GB max memory)

Peak Memory Usage:

  • Peak Reserved Memory: 3.855 GB
  • Memory Used for LoRA Training: 1.312 GB
  • Peak Memory Utilization: 26.1% of available GPU memory
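
Figures like these are typically captured with PyTorch's CUDA statistics. A measurement sketch; the before/after snapshot bookkeeping around `trainer.train()` is illustrative.

```python
import torch

gpu = torch.cuda.get_device_properties(0)
max_memory = round(gpu.total_memory / 1024**3, 3)   # total GPU memory, e.g. ~14.7 GB on a T4

# Snapshot taken *before* trainer.train() so the LoRA delta can be computed afterwards.
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024**3, 3)

# ... run trainer.train() here ...

used_memory = round(torch.cuda.max_memory_reserved() / 1024**3, 3)  # peak reserved memory
lora_memory = round(used_memory - start_gpu_memory, 3)              # memory attributable to LoRA training

print(f"Peak reserved memory : {used_memory} GB")
print(f"Memory used for LoRA : {lora_memory} GB")
print(f"Peak utilization     : {round(used_memory / max_memory * 100, 1)} %")
```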

Conclusion

This notebook showcases an efficient approach to fine-tuning large language models, combining LoRA and 4-bit quantization to keep memory usage low and training fast. The Unsloth library accelerates both training and inference, making this setup practical even with limited GPU resources.
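
As a sketch of the fast-inference path mentioned above, assuming the fine-tuned `model` and `tokenizer` from the earlier snippets: `FastLanguageModel.for_inference` switches Unsloth into its optimized generation mode (the prompt and generation settings are illustrative).

```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)   # enable Unsloth's faster generation path

messages = [{"role": "user", "content": "Summarize what LoRA does in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,   # append the assistant header so the model starts replying
    return_tensors="pt",
).to("cuda")

output_ids = model.generate(input_ids=input_ids, max_new_tokens=128, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```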

Notebook

Access the implementation notebook for this model here. This notebook provides detailed steps for fine-tuning and deploying the model.

GGUF Files

  • Model Size: 3.21B params
  • Architecture: llama
  • Available Quantizations: 4-bit, 5-bit, 8-bit
