PR-LLaVA-34b

Overview

PR-LLM (PR-LLaVA-34b) is a fine-tuned version of the LLaVA 1.6 34B vision-language model, designed specifically for interpreting panoramic dental radiographs. It combines vision and natural-language capabilities to generate descriptive insights, answer questions, and interpret radiographic findings in a clinically relevant manner.
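Because the base model is LLaVA 1.6, questions about a radiograph are phrased as a single-image chat prompt. The sketch below builds such a prompt using the ChatML-style template of the 34B LLaVA 1.6 checkpoints; the system message is an illustrative assumption, not text from this card, and the `<image>` placeholder is expanded into image tokens by the processor at inference time:

```python
def build_prompt(question: str,
                 system: str = "You are a dental radiology assistant.") -> str:
    """Build a single-image LLaVA 1.6 (34B) chat prompt.

    The <image> placeholder is replaced with the radiograph's image tokens
    by the multimodal processor at inference time. The default system
    message here is a hypothetical example, not part of the model card.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("Which teeth are missing on this panoramic radiograph?")
print(prompt)
```

The generation itself follows the usual Transformers flow for LLaVA-NeXT models (processor plus `generate`), which is omitted here since the hosting repository for the fine-tuned weights is not stated on the card.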

Technical Details

  • Base Model: LLaVA 1.6 34B
  • Training Setup:
    • Batch Size: 4
    • Learning Rate: 2e-5
    • Epochs: 3
    • GPUs: 3 x NVIDIA A100
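From the stated setup, the effective global batch size and optimizer steps can be worked out, assuming the batch size of 4 is per GPU, plain data parallelism across the 3 A100s, and no gradient accumulation (none of which the card states explicitly; the 121-image count comes from the Dataset section):

```python
import math

# Assumptions not stated on the card: batch size 4 is per GPU,
# data-parallel training on 3 GPUs, no gradient accumulation.
per_gpu_batch = 4
num_gpus = 3
dataset_size = 121   # panoramic radiographs, per the Dataset section
epochs = 3

global_batch = per_gpu_batch * num_gpus                   # 12 samples per step
steps_per_epoch = math.ceil(dataset_size / global_batch)  # 11 steps
total_steps = steps_per_epoch * epochs                    # 33 steps overall

print(global_batch, steps_per_epoch, total_steps)
```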

Dataset

A custom dataset of 121 panoramic radiographs was used, each paired with a detailed clinical report written by a maxillofacial radiologist. The dataset covers diverse findings such as missing teeth, caries, periapical disease, and alveolar bone levels.

Link to the dataset: [Dataset]
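For LLaVA-style fine-tuning, image–report pairs are typically serialized into a conversation-format JSON record. The layout below mirrors that common format; whether PR-LLM used exactly this schema is an assumption, and the file name, prompt, and report text are placeholders:

```python
import json

def make_record(image_file: str, report_text: str, idx: int) -> dict:
    """Wrap one radiograph/report pair in LLaVA-style conversation JSON.

    This mirrors the record format commonly used for LLaVA fine-tuning;
    it is an assumed schema, not one documented for this dataset.
    """
    return {
        "id": f"pr-{idx:03d}",
        "image": image_file,
        "conversations": [
            {"from": "human",
             "value": "<image>\nDescribe the findings on this panoramic radiograph."},
            {"from": "gpt", "value": report_text},
        ],
    }

record = make_record("example_pan.png", "Tooth 36 is missing; caries on 46.", 1)
print(json.dumps(record, indent=2))
```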

Evaluation

PR-LLM was benchmarked against open-source models (LLaVA-Med and other LLaVA variants) and closed-source models (GPT-4o, Gemini 1.5 Pro). It achieved superior accuracy in FDI tooth numbering and produced clear, detailed answers, supporting its suitability for clinical use.
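FDI tooth-numbering accuracy can be scored by extracting two-digit FDI codes (quadrant 1–4, position 1–8 for permanent teeth) from a model answer and comparing them against the radiologist's reference. The scorer below is a hedged sketch of one such metric (set-level F1), not the benchmark code used in this evaluation:

```python
import re

# Two-digit FDI codes for the permanent dentition: quadrant 1-4, tooth 1-8.
FDI_PATTERN = re.compile(r"\b([1-4][1-8])\b")

def fdi_set(text: str) -> set:
    """Extract the set of permanent-tooth FDI numbers mentioned in text."""
    return set(FDI_PATTERN.findall(text))

def fdi_f1(prediction: str, reference: str) -> float:
    """Set-level F1 between predicted and reference FDI mentions."""
    pred, ref = fdi_set(prediction), fdi_set(reference)
    if not pred and not ref:
        return 1.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

score = fdi_f1("Missing teeth: 18, 28 and 38.", "Teeth 18, 28, 38, 48 are missing.")
print(round(score, 3))
```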

Model Files

  • Format: Safetensors
  • Model size: 34.8B params
  • Tensor type: FP16