Uploaded finetuned model
- Developed by: Haq Nawaz Malik
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
Documentation: Hnm_Llama3.2_(11B)-Vision_lora_model
Overview
The Hnm_Llama3.2_(11B)-Vision_lora_model is a fine-tuned version of Llama 3.2 (11B) Vision with LoRA-based parameter-efficient fine-tuning (PEFT). It specializes in vision-language tasks, particularly for medical image captioning and understanding.
This model was fine-tuned on a Tesla T4 (Google Colab) using Unsloth, a framework designed for efficient fine-tuning of large models.
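As a rough sketch of how such a run starts, the 4-bit base model can be loaded with Unsloth before attaching LoRA adapters. The repository id matches the base model named in this card; the remaining options are illustrative, not the exact settings used for this fine-tune.

from unsloth import FastVisionModel

# Sketch: load the 4-bit base model for LoRA fine-tuning on a Tesla T4.
# Options other than the repo id are illustrative assumptions.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True,                     # QLoRA-style 4-bit loading
    use_gradient_checkpointing = "unsloth",  # trades compute for lower VRAM
)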
Features
- Fine-tuned on Radiology Images: Trained using the Radiology_mini dataset.
- Supports Image Captioning: Can describe medical images.
- 4-bit Quantization (QLoRA): Memory efficient, runs on consumer GPUs.
- LoRA-based PEFT: Trains only 1% of parameters, significantly reducing computational cost.
- Multi-modal Capabilities: Works with both text and image inputs.
- Supports both vision and language fine-tuning (see the LoRA configuration sketch after this list).
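A minimal sketch of how LoRA adapters are typically attached with Unsloth's FastVisionModel.get_peft_model; the rank and other values below are illustrative, since the exact configuration used for this model is not recorded in this card.

# Sketch: attach LoRA adapters to both the vision and language layers.
# All values are illustrative, not the exact settings used for this model.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers   = True,   # adapt the vision encoder
    finetune_language_layers = True,   # adapt the language model
    r = 16,                            # LoRA rank (controls adapter size)
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
)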
Model Details
- Base Model: unsloth/Llama-3.2-11B-Vision-Instruct
- Fine-tuning Method: LoRA + 4-bit Quantization (QLoRA)
- Dataset: unsloth/Radiology_mini (a formatting sketch follows this list)
- Framework: Unsloth + Hugging Face Transformers
- Training Environment: Google Colab (Tesla T4 GPU)
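For reference, Radiology_mini samples are usually converted into the chat format the vision model expects before training. A minimal sketch, assuming the dataset exposes image and caption fields (the field names are assumptions, not confirmed in this card):

from datasets import load_dataset

dataset = load_dataset("unsloth/Radiology_mini", split="train")
instruction = "Describe this medical image accurately."

# Sketch: wrap each (image, caption) pair as a user/assistant conversation.
def convert_to_conversation(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "image", "image": sample["image"]},
            {"type": "text", "text": instruction},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["caption"]},
        ]},
    ]}

converted_dataset = [convert_to_conversation(sample) for sample in dataset]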
Usage
1. Load the Model
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "Hnm_Llama3.2_(11B)-Vision_lora_model",
    load_in_4bit=True,  # Set to False for full precision
)
2. Image Captioning Example
import torch
from datasets import load_dataset
from transformers import TextStreamer

FastVisionModel.for_inference(model)  # Enable inference mode

# Load a sample image from the dataset
dataset = load_dataset("unsloth/Radiology_mini", split="train")
image = dataset[0]["image"]

instruction = "Describe this medical image accurately."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]

# Build the prompt and tokenize the image together with the text
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Stream the generated caption token by token
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                   use_cache=True, temperature=1.5, min_p=0.1)
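3. Save or Share the LoRA Adapters
After further fine-tuning, the adapters and tokenizer can be saved locally or pushed to the Hugging Face Hub with the standard save_pretrained / push_to_hub methods. The repository id and token below are placeholders.

# Save the LoRA adapters and tokenizer locally
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Optionally push to the Hugging Face Hub (placeholder repo id and token)
model.push_to_hub("your-username/your-lora-model", token="hf_...")
tokenizer.push_to_hub("your-username/your-lora-model", token="hf_...")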
Notes
- This model is optimized for vision-language tasks in the medical field but can be adapted for other applications.
- Uses LoRA adapters, so it can be fine-tuned further with very few GPU resources (a training sketch follows this list).
- Supports Hugging Face Model Hub for deployment and sharing.
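A minimal sketch of continued fine-tuning with Unsloth and TRL's SFTTrainer, following the general pattern of Unsloth's vision fine-tuning examples. The hyperparameters, the UnslothVisionDataCollator usage, and the SFTConfig fields below are illustrative assumptions, not the exact recipe used to produce this model; converted_dataset is the conversation-formatted dataset sketched under Model Details.

from trl import SFTConfig, SFTTrainer
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator

FastVisionModel.for_training(model)  # switch back to training mode

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),  # collates image + text pairs
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,   # illustrative values sized for a Tesla T4
        gradient_accumulation_steps = 4,
        max_steps = 30,
        learning_rate = 2e-4,
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        optim = "adamw_8bit",
        output_dir = "outputs",
        report_to = "none",
        # Settings Unsloth's vision examples use so the trainer skips text-only preprocessing:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_seq_length = 2048,
    ),
)
trainer.train()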
Citation
If you use this model, please cite:
@misc{Hnm_Llama3.2_11B_Vision,
  author = {Haq Nawaz Malik},
  title  = {Fine-tuned Llama 3.2 (11B) Vision Model},
  year   = {2025},
  url    = {https://huggingface.co/Omarrran/Hnm_Llama3_2_Vision_lora_model}
}
Contact
For questions or support, reach out via the model's Hugging Face page (linked in the citation above).