PR-LLaVA-34b
Overview
PR-LLM is a fine-tuned version of the LLaVA 1.6 34B vision-language model, specifically designed for the interpretation of panoramic dental radiographs. The model combines vision and natural language capabilities to generate descriptive insights, answer questions, and interpret radiographic findings in a clinically relevant manner.
Technical Details
- Base Model: LLaVA 1.6 34B
- Training Setup:
  - Batch Size: 4
  - Learning Rate: 2e-5
  - Epochs: 3
  - GPUs Used: 3 x NVIDIA A100
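The hyperparameters above can be summarized as a small configuration sketch. This is illustrative only: the dictionary keys are hypothetical, and it assumes the listed batch size is per GPU, so the effective batch size is the per-device size times the number of GPUs.

```python
# Hypothetical summary of the training setup listed above (names are illustrative).
config = {
    "base_model": "llava-v1.6-34b",
    "per_device_batch_size": 4,   # assumed to be per GPU
    "learning_rate": 2e-5,
    "num_epochs": 3,
    "num_gpus": 3,                # 3 x NVIDIA A100
}

# Under that assumption, the effective batch size per optimizer step is:
effective_batch_size = config["per_device_batch_size"] * config["num_gpus"]
print(effective_batch_size)
```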
Dataset
The model was trained on a custom dataset of 121 panoramic radiographs, each paired with a detailed clinical report written by a maxillofacial radiologist. The dataset covers diverse findings such as missing teeth, caries, periapical disease, and alveolar bone levels.
Link to the dataset: [Dataset]
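LLaVA-style models are typically fine-tuned on JSON records that pair an image with a multi-turn conversation. The sketch below shows how one radiograph/report pair might be converted into that format; the prompt text and the `make_record` helper are assumptions for illustration, not the exact preprocessing used here.

```python
import json

def make_record(image_path, report_text, idx):
    """Build one LLaVA-style training record (hypothetical helper).

    The conversation pairs a user prompt containing the <image> token
    with the radiologist's report as the target response.
    """
    return {
        "id": f"pano_{idx:03d}",
        "image": image_path,
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe the radiographic findings."},
            {"from": "gpt", "value": report_text},
        ],
    }

record = make_record("images/pano_001.png", "No periapical pathology detected.", 1)
print(json.dumps(record, indent=2))
```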
Evaluation
PR-LLM was benchmarked against open-source models (LLaVA-Med and other LLaVA variants) and closed-source models (GPT-4o, Gemini 1.5 Pro). It demonstrated superior accuracy in FDI tooth numbering and generated clear, detailed answers, supporting its suitability for clinical use.
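For context on the FDI metric: the FDI (ISO 3950) system identifies each permanent tooth with a two-digit code, where the first digit is the quadrant (1 = upper right through 4 = lower right) and the second is the tooth position (1 = central incisor through 8 = third molar). A minimal sketch of decoding such a code, useful when scoring a model's tooth-numbering output (the `fdi_name` helper is hypothetical, not part of the released model):

```python
# FDI (ISO 3950) two-digit notation for permanent teeth.
QUADRANTS = {1: "upper right", 2: "upper left", 3: "lower left", 4: "lower right"}
POSITIONS = {
    1: "central incisor", 2: "lateral incisor", 3: "canine",
    4: "first premolar", 5: "second premolar",
    6: "first molar", 7: "second molar", 8: "third molar",
}

def fdi_name(code: int) -> str:
    """Decode a two-digit FDI code into a human-readable tooth name."""
    quadrant, position = code // 10, code % 10
    if quadrant not in QUADRANTS or position not in POSITIONS:
        raise ValueError(f"invalid FDI code: {code}")
    return f"{QUADRANTS[quadrant]} {POSITIONS[position]}"

print(fdi_name(36))  # lower left first molar
```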