Qari-OCR-Arabic-0.2.2.1-VL-2B-Instruct Model

Model Overview

This model is a fine-tuned version of unsloth/Qwen2-VL-2B-Instruct on an Arabic OCR dataset. It is optimized to perform Arabic Optical Character Recognition (OCR) for full-page text.

Key Features

Superior Accuracy: Achieves the lowest WER (0.221) and CER (0.059) and the highest BLEU score (0.597) among the systems compared in the Results section
Diacritics Support: Full recognition of Arabic diacritical marks (tashkeel) including fatḥah, kasrah, ḍammah, sukūn, shadda, and tanwin forms - a strength confirmed by evaluation on a primarily diacritical text dataset
Multiple Font Support: Works across a variety of Arabic font styles
Layout Flexibility: Handles different document layouts and formats
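To make the diacritics listed above concrete, here is a minimal Python sketch that counts and strips tashkeel using the Unicode code points U+064B to U+0652 (the three tanwin forms, fatḥah, ḍammah, kasrah, shadda, and sukūn). The helper names are hypothetical and are not part of the model or its evaluation pipeline; something like this can help check whether diacritics survive in OCR output.

```python
# Illustrative helpers (hypothetical names, not part of Qari-OCR).
# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def count_tashkeel(text: str) -> int:
    """Number of diacritical marks in the text."""
    return sum(1 for ch in text if ch in TASHKEEL)


def strip_tashkeel(text: str) -> str:
    """Text with the diacritical marks removed, base letters kept."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


# kaf + fatha, teh + fatha, beh + fatha
sample = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(count_tashkeel(sample))
print(strip_tashkeel(sample))
```

Comparing `count_tashkeel` on a reference page and on the model output gives a quick, rough signal of whether marks were dropped.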

Model Details

Base Model: Qwen2-VL-2B-Instruct (unsloth/Qwen2-VL-2B-Instruct)
Fine-tuning Dataset: Arabic OCR dataset
Objective: Extract full-page Arabic text with high accuracy
Languages: Arabic
Tasks: OCR (Optical Character Recognition)
Dataset size: 50,000 records
Epochs: 1

Evaluation Metrics

Performance is evaluated using three standard metrics:

Word Error Rate (WER): Measures word-level accuracy (lower is better)
Character Error Rate (CER): Measures character-level accuracy (lower is better)
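BLEU Score: Measures n-gram overlap between the recognized text and the reference; originally a machine-translation metric, used here as an overall measure of output quality (higher is better)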
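As a rough illustration of the first two metrics, the sketch below computes WER and CER from a Levenshtein edit distance in plain Python. It is a minimal version under simple assumptions (whitespace tokenization, no text normalization) and is not the evaluation script used for this model; BLEU is omitted because it needs n-gram counting and a brevity penalty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "كتب الولد الدرس"
print(wer(reference, "كتب الولد"))  # one word missing from the output
print(wer(reference, "كتب الولد الدرس الدرس الدرس الدرس الدرس"))  # repeated words
```

Because insertions count as errors, an output that repeats or invents text can score a WER above 1, which is why some rows in a results table may exceed 1.0.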

Results

Model	WER ↓	CER ↓	BLEU ↑
Qari-OCR-0.2.2.1-VL-2B-Instruct	0.221	0.059	0.597
AIN 8B	0.757	0.309	0.103
Qari-OCR-0.1-VL-2B-Instruct	1.294	0.770	0.022
easyOCR	1.004	0.648	0.005
pytesseract	0.990	0.911	<0.001
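To put the table in relative terms, the short sketch below computes the error reduction of Qari-OCR-0.2.2.1 over the next-best system in the table (AIN 8B). The figures are copied from the rows above; the function name is illustrative. The reduction works out to roughly 71% for WER and 81% for CER.

```python
# WER and CER values copied from the results table.
results = {
    "Qari-OCR-0.2.2.1-VL-2B-Instruct": {"wer": 0.221, "cer": 0.059},
    "AIN 8B": {"wer": 0.757, "cer": 0.309},
}


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` lowers the error compared with `baseline`."""
    return (baseline - new) / baseline


qari = results["Qari-OCR-0.2.2.1-VL-2B-Instruct"]
ain = results["AIN 8B"]
wer_reduction = relative_reduction(ain["wer"], qari["wer"])
cer_reduction = relative_reduction(ain["cer"], qari["cer"])
print(f"WER reduction: {wer_reduction:.1%}")
print(f"CER reduction: {cer_reduction:.1%}")
```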

Figures: WER Comparison, CER Comparison, BLEU Score Comparison

Model Details

Training Data

The model was trained using the following specifications:

Font Sizes: 14, 16, 18, 20, 24, 32, 40 pt
Page Layouts:
- A4 (210mm × 297mm)
- Letter (216mm × 279mm)
- Small (105mm × 148mm)
- Square (1080px × 1080px)
- OneLine (210mm × 10mm)
Arabic Fonts Used:
- IBM Plex Sans Arabic
- KFGQPCUthman Taha Naskh
- Scheherazade New
- Amiri
- Madina
- Diwani Letter
- Tajawal
- Cairo
- Lateef
- Almarai
- AlQalam Quran
- Noto Naskh Arabic
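When preparing input images that resemble these layouts, it can help to translate the physical sizes into pixels. The sketch below does the unit conversion for the millimetre-based layouts and the point-based font sizes. The rendering resolution is not stated in this card, so the 300 DPI value is purely an assumption, and the function names are illustrative.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72

# Millimetre-based page layouts from the list above (width, height).
LAYOUTS_MM = {
    "A4": (210, 297),
    "Letter": (216, 279),
    "Small": (105, 148),
    "OneLine": (210, 10),
}


def mm_to_px(mm: float, dpi: int) -> int:
    """Convert millimetres to whole pixels at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


def pt_to_px(pt: float, dpi: int) -> float:
    """Convert a font size in points to pixels at the given resolution."""
    return pt / POINTS_PER_INCH * dpi


DPI = 300  # assumption: the card does not state the rendering resolution
for name, (width_mm, height_mm) in LAYOUTS_MM.items():
    print(name, mm_to_px(width_mm, DPI), "x", mm_to_px(height_mm, DPI), "px")

for size_pt in (14, 40):  # smallest and largest training font sizes
    print(f"{size_pt} pt is about {pt_to_px(size_pt, DPI):.0f} px tall at {DPI} DPI")
```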

Limitations

Based on the training specifications, the model has the following limitations:

Font Size Constraints: May have reduced accuracy with very small (< 14pt) or very large (> 40pt) text
Font Coverage: Performance may degrade on uncommon Arabic fonts not represented in the training data
Diacritics Complexity: While the model supports diacritics (tashkeel), extremely dense or unconventional diacritical mark combinations may reduce accuracy
Layout Sensitivity: May have difficulty with complex multi-column layouts or unconventional page formats
Handwriting Recognition: Limited capability with handwritten text as training focused on digital fonts
Decorative Text: May struggle with highly stylized or decorative Arabic calligraphy
Background Complexity: Optimized for clear backgrounds; performance may degrade with complex or textured backgrounds
Text Degradation: May have challenges with severely degraded, blurry, or low-resolution text
Non-standard Orientations: Primarily designed for horizontally oriented text; may struggle with vertical or diagonal text

Evaluation Method

Evaluation was performed on a diverse dataset of Arabic text images, primarily featuring diacritical marks (tashkeel), measuring:

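Word Error Rate (WER): The word-level edit distance (substitutions, deletions, and insertions) between the output and the reference, divided by the number of reference words; reported as a fraction, and it can exceed 1 when the output contains many inserted words
Character Error Rate (CER): The same ratio computed over characters instead of words
BLEU Score: The n-gram overlap between the output and the reference; borrowed from machine translation, with higher scores indicating better overall text recognition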

How to Use

Try Qari v0.2.2.1 - Google Colab

You can load this model using the transformers and qwen_vl_utils libraries:

!pip install -U transformers qwen_vl_utils "accelerate>=0.26.0" peft
!pip install -U bitsandbytes

from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
import os
from qwen_vl_utils import process_vision_info

# Load the fine-tuned model and its processor from the Hugging Face Hub.
model_name = "NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)
max_tokens = 2000

prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."

# `image` must be a PIL image of the page to transcribe.
# "page.png" is a placeholder path; replace it with your own file.
image = Image.open("page.png")
src = "image.png"  # temporary copy that the message below points to
image.save(src)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{src}"},
            {"type": "text", "text": prompt},
        ],
    }
]
# Render the chat template and collect the image inputs it refers to.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device chosen by device_map="auto".
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
# Drop the prompt tokens so that only the newly generated text is decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
os.remove(src)  # delete the temporary copy
print(output_text)
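When transcribing several pages, the message structure above has to be rebuilt for each image. The helper below is a small sketch of that step; `build_messages` is a hypothetical name, not part of transformers or qwen_vl_utils, and it assumes POSIX-style paths so that the `file://` prefix followed by an absolute path resolves correctly.

```python
from pathlib import Path


def build_messages(image_path: str, prompt: str) -> list:
    """Build a single-turn chat message pairing one page image with the prompt."""
    # Resolve to an absolute path so the URI does not depend on the working directory.
    uri = f"file://{Path(image_path).resolve()}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": uri},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_messages("image.png", "Return the plain text of this page.")
print(messages[0]["content"][0]["image"])
```

The returned list can be passed to `processor.apply_chat_template` and `process_vision_info` in the same way as the hand-written `messages` in the snippet above.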

License

This model follows the licensing terms of the original Qwen2 VL model. Please review the terms before using it commercially.

Citation

If you use this model in your research, please cite:

@misc{QariOCR2025,
  title={Qari-OCR v0.2.2.1: A High-Accuracy Model for Arabic Optical Character Recognition},
  author={NAMAA},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct}},
  note={Accessed: 2025-04-01}
}
