# lora_fine_tuned_phi-4_quantized_vision

This repository contains a fine-tuned version of the **Phi-4** language model specifically adapted for **image-to-text generation**. 

The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of images of aircraft with corresponding textual descriptions. This fine-tuning process enables the model to generate more accurate and descriptive captions for aircraft images.

**Key Features:**

* **4-bit Quantization:** The base model is loaded in 4-bit NF4 precision (via `bitsandbytes`), which reduces its size and memory footprint and makes it cheaper to deploy.
* **LoRA:** Fine-tuning uses LoRA, which trains small low-rank adapter matrices while the base model's weights stay frozen, keeping the number of trainable parameters low.
* **Image Captioning:** The model is specifically trained to generate textual descriptions (captions) for images of aircraft.
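
The savings behind the first two features can be estimated with simple arithmetic. The sketch below is a rough illustration only: the layer shape, the LoRA rank, and the round ~14B parameter count are assumptions chosen for the example, not this adapter's actual configuration.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters LoRA trains for one weight matrix: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Illustrative values only (assumptions, not this adapter's configuration).
d_out, d_in, rank = 4096, 4096, 16
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full matrix: {full:,} params, LoRA update: {lora:,} params ({lora / full:.2%})")

# Assumed round parameter count for a ~14B-parameter model.
num_params = 14e9
print(f"16-bit weights: {weight_memory_gib(num_params, 16):.1f} GiB")
print(f"4-bit weights:  {weight_memory_gib(num_params, 4):.1f} GiB")
```

Real memory use will be higher than these figures, since quantization constants, activations, and the generation cache are not counted.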

**Intended Use Cases:**

* **Image Captioning:** Generate descriptive captions for aircraft images.
* **Aircraft Recognition:** Assist in identifying different types of aircraft based on their visual features.
* **Educational Purposes:** Serve as a tool for learning about different aircraft models.

**How to Use:**

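Load the 4-bit base model with Hugging Face Transformers and attach the LoRA adapter with PEFT (this requires `torch`, `transformers`, `peft`, and `bitsandbytes`):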

```python
import torch  # needed for torch.bfloat16 in the quantization config below
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    low_cpu_mem_usage=True
)

# Attach the fine-tuned LoRA adapter (downloaded from the Hugging Face Hub) to the base model
model = PeftModel.from_pretrained(
    base_model,  # Pass the base model instance
    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",  # Load from HF Hub
    device_map={"": 0},
)

# Make sure a pad token is defined before configuring generation:
# if the tokenizer has none, reuse the EOS token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.generation_config.pad_token_id = tokenizer.pad_token_id

# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

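# Generate a caption for an image (replace with your image processing logic)
image_path = "path/to/your/aircraft/image.jpg"
# ... (Add your image loading and preprocessing code here) ...
# The text-generation pipeline only accepts strings, so `processed_image` must be
# a text representation of the image. The value below is a placeholder to replace.
processed_image = "..."

prompt = f"Generate a caption for the following image: {processed_image}"
# max_new_tokens limits only the generated tokens; max_length would also count the prompt
generated_caption = generator(prompt, max_new_tokens=64)[0]["generated_text"]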
print(generated_caption)
```

**Training Data:**

The model was trained on the FGVC Aircraft dataset (https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/).

**Evaluation:**

The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
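
For readers unfamiliar with BLEU, the following is a minimal sketch of a simplified, single-reference, sentence-level variant: clipped n-gram precision combined with a brevity penalty. It is not the evaluation script used for this model (which is not specified here), the example captions are made up, and the function names are hypothetical. For real evaluations, an established implementation such as `sacrebleu` or NLTK is the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU with uniform weights and no smoothing."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Made-up captions, for illustration only.
reference = "a boeing 737 taxiing on the runway"
candidate = "a boeing 737 taxiing on a runway"
print(f"BLEU: {sentence_bleu(reference, candidate):.3f}")
```

Because this version applies no smoothing, any caption that shares no 4-gram with the reference scores zero, which is one reason short captions are often scored with smoothed or corpus-level BLEU instead.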

**Limitations:**

* The model is fine-tuned specifically for aircraft images and may not generalize well to other types of images.
* The generated captions may sometimes be overly generic or lack fine-grained details.

**Future Work:**

* Fine-tune the model on a larger and more diverse dataset of images.
* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
* Experiment with different decoding strategies to generate more detailed and human-like captions.
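
As a starting point for the decoding experiments mentioned above, the sketch below lists three common strategies as keyword arguments for `model.generate` in Hugging Face Transformers. The parameter names are standard generation options; the specific values are illustrative assumptions, not tuned settings for this model.

```python
# Keyword arguments accepted by `model.generate` in Hugging Face Transformers.
# The specific values are illustrative assumptions, not tuned settings.
decoding_strategies = {
    "greedy": {"do_sample": False, "num_beams": 1},
    "beam_search": {"do_sample": False, "num_beams": 4, "early_stopping": True},
    "nucleus_sampling": {"do_sample": True, "top_p": 0.9, "temperature": 0.7},
}

for name, kwargs in decoding_strategies.items():
    print(f"{name}: {kwargs}")
    # With the `generator` pipeline from "How to Use", one could call, for example:
    # generator(prompt, max_new_tokens=64, **kwargs)[0]["generated_text"]
```

Greedy and beam search are deterministic and tend toward safe, generic captions, while sampling trades some reliability for variety, so comparing them on the same images is a reasonable first experiment.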

**Acknowledgements:**

This work is based on the Phi-4 language model developed by Microsoft and uses the Hugging Face Transformers, PEFT, and Datasets libraries.


**Remember to:**

* Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image.
* Add your image loading and preprocessing code in the designated section.