README for Gemma-2-2B-IT Fine-Tuning with LoRA

This project fine-tunes the Gemma-2-2B-IT model with LoRA (Low-Rank Adaptation) for Question Answering tasks, using the Wikitext-2 dataset. The fine-tuning process is tailored to limited GPU memory: most model parameters are frozen and LoRA is applied only to specific layers.

Project Overview

  • Model: Gemma-2-2B-IT, a causal language model.
  • Dataset: Wikitext-2 for text generation and causal language modeling.
  • Training Strategy: LoRA adaptation for low-resource fine-tuning.
  • Frameworks: Hugging Face transformers, peft, and datasets.

Key Features

  • LoRA Configuration (see the configuration sketch after this list):
    • LoRA is applied to the following projection layers: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj.
    • LoRA hyperparameters:
      • Rank (r): 4
      • LoRA Alpha: 8
      • Dropout: 0.1
  • Training Configuration:
    • Mixed precision (fp16) enabled for faster and more memory-efficient training.
    • Gradient accumulation with 32 steps to manage large model sizes on small GPUs.
    • Batch size of 1 due to GPU memory constraints.
    • Learning rate: 5e-5 with weight decay: 0.01.
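
The configuration above translates roughly into the following setup. This is a minimal sketch, assuming the base model is loaded from the Hub as google/gemma-2-2b-it and that a tokenized Wikitext-2 dataset is prepared separately; names such as output_dir are illustrative.

    # Minimal sketch of the LoRA and training setup described above.
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
    from peft import LoraConfig, get_peft_model, TaskType

    model_name = "google/gemma-2-2b-it"  # assumed Hub id for Gemma-2-2B-IT
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # LoRA on the attention and MLP projection layers listed above;
    # get_peft_model freezes the base weights so only the adapters train.
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=4,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

    # Training arguments matching the values listed above.
    training_args = TrainingArguments(
        output_dir="gemma2-2b-it-lora",   # illustrative output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=32,
        learning_rate=5e-5,
        weight_decay=0.01,
        fp16=True,
    )
    # A transformers.Trainer would then be built from the model,
    # training_args, and the tokenized dataset.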

System Requirements

  • GPU: Required for efficient training. This script was tested with CUDA-enabled GPUs.
  • Python Packages: Install dependencies with:
    pip install -r requirements.txt
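
    The exact contents of requirements.txt are not reproduced here; a minimal set covering the stack above would look roughly like this (unpinned versions are an assumption):

      # requirements.txt (illustrative)
      torch
      transformers
      peft
      datasets
      accelerate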
    

Notes

  • This fine-tuned model leverages LoRA to adapt the large Gemma-2-2B-IT model with a minimal number of trainable parameters, allowing fine-tuning even on hardware with limited memory.
  • The fine-tuned model can be used for tasks such as Question Answering and is optimized for resource-efficient deployment; see the loading sketch below.
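
To use the fine-tuned adapter for inference, the base model can be loaded and the adapter attached on top of it. This is a minimal sketch, assuming the base model id google/gemma-2-2b-it and the adapter repository halyn/gemma2-2b-it-finetuned-paperqa, with an illustrative prompt:

    # Load the base model and attach the LoRA adapter for inference.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2-2b-it", torch_dtype=torch.float16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
    model = PeftModel.from_pretrained(base, "halyn/gemma2-2b-it-finetuned-paperqa")

    inputs = tokenizer("What does LoRA stand for?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))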

Memory Usage

  • The training script prints CUDA memory summaries before and after training to monitor GPU memory consumption; see the sketch below.
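
A minimal sketch of this monitoring, assuming trainer is an already-configured transformers.Trainer:

    # Print CUDA memory summaries immediately before and after training.
    import torch

    if torch.cuda.is_available():
        print(torch.cuda.memory_summary(device=0))

    trainer.train()

    if torch.cuda.is_available():
        print(torch.cuda.memory_summary(device=0))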