PR-LLaVA-34b
Overview
PR-LLM is a fine-tuned version of the LLaVA 1.6 34B vision-language model, specifically designed for the interpretation of panoramic dental radiographs. The model combines vision and natural language capabilities to generate descriptive insights, answer questions, and interpret radiographic findings in a clinically relevant manner.
Technical Details
- Base Model: LLaVA 1.6 34B
- Training Setup:
  - Batch Size: 4
  - Learning Rate: 2e-5
  - Epochs: 3
  - GPUs Used: 3 x NVIDIA A100
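The hyperparameters above can be summarized as a small configuration sketch. This is illustrative only: the dictionary keys are hypothetical, and it assumes the listed batch size is per GPU, so the effective batch size is the per-device size times the number of GPUs.

```python
# Hypothetical summary of the training setup listed above (names are illustrative).
config = {
    "base_model": "llava-v1.6-34b",
    "per_device_batch_size": 4,   # assumed to be per GPU
    "learning_rate": 2e-5,
    "num_epochs": 3,
    "num_gpus": 3,                # 3 x NVIDIA A100
}

# Under that assumption, the effective batch size per optimizer step is:
effective_batch_size = config["per_device_batch_size"] * config["num_gpus"]
print(effective_batch_size)
```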
Dataset
The model was trained on a custom dataset of 121 panoramic radiographs, each paired with a detailed clinical report written by a maxillofacial radiologist. The dataset covers diverse findings such as missing teeth, caries, periapical disease, and alveolar bone levels.
Link to the dataset: [Dataset]
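LLaVA-style models are typically fine-tuned on JSON records that pair an image with a multi-turn conversation. The sketch below shows how one radiograph/report pair might be converted into that format; the prompt text and the `make_record` helper are assumptions for illustration, not the exact preprocessing used here.

```python
import json

def make_record(image_path, report_text, idx):
    """Build one LLaVA-style training record (hypothetical helper).

    The conversation pairs a user prompt containing the <image> token
    with the radiologist's report as the target response.
    """
    return {
        "id": f"pano_{idx:03d}",
        "image": image_path,
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe the radiographic findings."},
            {"from": "gpt", "value": report_text},
        ],
    }

record = make_record("images/pano_001.png", "No periapical pathology detected.", 1)
print(json.dumps(record, indent=2))
```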
Evaluation
PR-LLM was benchmarked against open-source models (LLaVA-Med and other LLaVA variants) and closed-source models (GPT-4o, Gemini 1.5 Pro). It demonstrated superior accuracy in FDI tooth numbering and generated clear, detailed answers, supporting its suitability for clinical use.
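For context on the FDI metric: the FDI (ISO 3950) system identifies each permanent tooth with a two-digit code, where the first digit is the quadrant (1 = upper right through 4 = lower right) and the second is the tooth position (1 = central incisor through 8 = third molar). A minimal sketch of decoding such a code, useful when scoring a model's tooth-numbering output (the `fdi_name` helper is hypothetical, not part of the released model):

```python
# FDI (ISO 3950) two-digit notation for permanent teeth.
QUADRANTS = {1: "upper right", 2: "upper left", 3: "lower left", 4: "lower right"}
POSITIONS = {
    1: "central incisor", 2: "lateral incisor", 3: "canine",
    4: "first premolar", 5: "second premolar",
    6: "first molar", 7: "second molar", 8: "third molar",
}

def fdi_name(code: int) -> str:
    """Decode a two-digit FDI code into a human-readable tooth name."""
    quadrant, position = code // 10, code % 10
    if quadrant not in QUADRANTS or position not in POSITIONS:
        raise ValueError(f"invalid FDI code: {code}")
    return f"{QUADRANTS[quadrant]} {POSITIONS[position]}"

print(fdi_name(36))  # lower left first molar
```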