---
base_model: unsloth/Llama-3.2-11B-Vision-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- mllama
license: apache-2.0
language:
- en
model-index:
- name: DocumentCogito
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 50.64
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 29.79
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 16.24
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.84
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.6
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 31.14
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FDocumentCogito
      name: Open LLM Leaderboard
---

# **unsloth/Llama-3.2-11B-Vision-Instruct (Fine-Tuned)**

## **Model Overview**

This model, fine-tuned from the `unsloth/Llama-3.2-11B-Vision-Instruct` base, is optimized for vision-language tasks with enhanced instruction-following capabilities. Fine-tuning ran 2x faster using the [Unsloth](https://github.com/unslothai/unsloth) framework combined with Hugging Face's TRL library, keeping training efficient while maintaining performance.

## **Key Information**

- **Developed by:** Daemontatox
- **Base Model:** `unsloth/Llama-3.2-11B-Vision-Instruct`
- **License:** Apache-2.0
- **Language:** English (`en`)
- **Frameworks Used:** Hugging Face Transformers, Unsloth, and TRL

## **Performance and Use Cases**

This model is suited to applications involving:

- Vision-based text generation and image description
- Instruction following in multimodal contexts
- General-purpose text generation with enhanced reasoning

### **Features**

- **2x Faster Training:** Fine-tuned with the Unsloth framework for accelerated training; a sketch of a typical setup follows this list.
- **Multimodal Capabilities:** Handles combined vision and language inputs.
- **Instruction Optimization:** Tailored for improved comprehension and execution of instructions.
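The card only states that training used Unsloth with TRL, so the snippet below is a minimal sketch of what such a run typically looks like: a 4-bit LoRA fine-tune via Unsloth's `FastVisionModel` and TRL's `SFTTrainer`. The dataset (`my_vision_dataset`), LoRA rank, and trainer settings are illustrative placeholders, not this model's actual recipe.

```python
# A minimal sketch of an Unsloth + TRL vision fine-tune; the dataset and all
# hyperparameters below are illustrative assumptions, not this model's recipe.
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit to reduce memory use during fine-tuning.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# Attach LoRA adapters to both the vision and language towers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)
FastVisionModel.for_training(model)  # enable Unsloth's training mode

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=my_vision_dataset,  # hypothetical chat-format image+text dataset
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=2e-4,
        bf16=is_bf16_supported(),
        output_dir="outputs",
        remove_unused_columns=False,  # keep image columns for the collator
        dataset_text_field="",        # the collator builds the text itself
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```

LoRA keeps the 11B base weights frozen and trains only small adapter matrices, which is what makes a model of this size trainable on a single GPU.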
## **How to Use**

### **Inference Example (Hugging Face Transformers)**

Llama 3.2 Vision checkpoints load through the multimodal `Mllama` classes and an `AutoProcessor` rather than `AutoTokenizer`/`AutoModelForCausalLM`:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "Daemontatox/DocumentCogito"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("sunset.jpg")  # path to any local example image

# Build a chat-format prompt that interleaves the image with the instruction.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the image showing a sunset over mountains."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(image, prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(outputs[0], skip_special_tokens=True))
```

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__DocumentCogito-details), and summarized results [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FDocumentCogito&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc).

| Metric              | Value (%) |
|---------------------|----------:|
| **Average**         |     24.21 |
| IFEval (0-Shot)     |     50.64 |
| BBH (3-Shot)        |     29.79 |
| MATH Lvl 5 (4-Shot) |     16.24 |
| GPQA (0-shot)       |      8.84 |
| MuSR (0-shot)       |      8.60 |
| MMLU-PRO (5-shot)   |     31.14 |
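To re-run these benchmarks locally, the leaderboard's tasks are hosted in EleutherAI's lm-evaluation-harness. The sketch below assumes the harness's Open LLM Leaderboard v2 `leaderboard_*` task names, which may differ across versions; verify them with `lm-eval --tasks list` before running.

```python
# A hedged sketch, assuming the Open LLM Leaderboard v2 tasks in
# lm-evaluation-harness (pip install lm-eval); task names vary by version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Daemontatox/DocumentCogito,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```

Note that local scores can still drift from the table above if the harness version, prompt templates, or dtype differ from the leaderboard's own configuration.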