# REVERSE-LLaVA-MORE-8B

## Model Summary
REVERSE-LLaVA-MORE-8B is an open-source vision-language model (VLM) that performs both next-token prediction and self-verification / self-correction during generation. It is built upon LLaVA-MORE (LLaVA with LLaMA-3.1) and fine-tuned on the REVERSE Visual Instruct 1.3M dataset. The model is equipped with a retrospective resampling mechanism that detects and corrects hallucinations on the fly (a toy illustration of this loop is sketched below). Training was conducted in early March 2025.
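The verify-and-resample idea can be pictured roughly as follows. The snippet below is only a self-contained toy: the names `score_token`, `resample`, and `TAU` are hypothetical stand-ins, and the real mechanism lives inside the model's own generation loop (see the paper and GitHub repository for the actual implementation).

```python
# Toy illustration of threshold-based verify-and-resample (hypothetical names).
import random

TAU = 0.003  # verification threshold; a lower tau tolerates more uncertain tokens

def score_token(token: str) -> float:
    """Stand-in for the model's confidence that `token` is grounded in the image."""
    return 0.0001 if token == "unicorn" else 0.9

def resample(candidates: list[str]) -> str:
    """Stand-in for drawing an alternative token from the model."""
    return random.choice(candidates)

def generate(draft: list[str], alternatives: list[str], max_retries: int = 3) -> list[str]:
    output = []
    for token in draft:
        # Retrospective check: if the token looks hallucinated, resample it.
        retries = 0
        while score_token(token) < TAU and retries < max_retries:
            token = resample(alternatives)
            retries += 1
        output.append(token)
    return output

print(generate(["a", "photo", "of", "a", "unicorn"], ["dog", "cat", "horse"]))
```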
## Performance

REVERSE-LLaVA-MORE-8B delivers strong reductions in hallucination across multiple captioning and open-ended VQA benchmarks:
| Benchmark | Metric | Best Baseline | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
|---|---|---|---|---|
| CHAIR-MSCOCO | CHAIR (↓) | DoLA (13.8) | 12.2 | 8.4 |
| CHAIR-MSCOCO | CHAIRs (↓) | DoLA (51.8) | 42.4 | 25.2 |
| AMBER-G | Hallucination (↓) | Woodpecker (7.4) | 6.5 | 5.1 |
| AMBER-G | Coverage (↑) | DoLA (53.1) | 54.8 | 38.9 |
| MMHal-Bench | Score (↑) | DoLA (2.54) | 2.28 | 2.93 |
| MMHal-Bench | Hallucination Rate (↓) | DoLA (0.51) | 0.54 | 0.40 |
| HaloQuest | Avg. Accuracy (↑) | DoLA (22.8) | 26.7 | 36.7 |
| HaloQuest | False Premise Acc. (↑) | DoLA (15.5) | 30.0 | 39.5 |
| HaloQuest | Visually Challenging Acc. (↑) | DoLA (45.1) | 31.3 | 30.9 |
| HaloQuest | Insufficient Context Acc. (↑) | DoLA (7.4) | 11.7 | 38.1 |
On discriminative tasks, REVERSE-LLaVA-MORE performs competitively with its base VLM:
| Benchmark | Metric | LLaVA-MORE-8B | REVERSE (τ=0.5) |
|---|---|---|---|
| AMBER-D | F1 Score (↑) | 71.6 | 69.3 |
| POPE | F1 Score (↑) | 85.1 | 84.4 |
| MME-Hall | Score (↑) | 678.3 | 657.6 |
## Usage

Please refer to the Installation Guide in the GitHub repository to get started; a rough inference sketch is shown below.
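The following is a rough usage sketch, assuming the REVERSE codebase keeps LLaVA's standard `eval_model` entry point and argument set; the prompt and image path are placeholders, and the exact interface may differ, so follow the installation guide for the authoritative instructions.

```python
# Hedged inference sketch: assumes the REVERSE fork exposes LLaVA's eval_model API.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "tsunghanwu/reverse_llava_more"

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe this image in detail.",   # placeholder prompt
    "conv_mode": None,
    "image_file": "view.jpg",                    # placeholder local image path
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```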
## Additional Resources
- Project Page: https://reverse-vlm.github.io/
- Dataset: REVERSE Visual Instruct 1.3M
- Ask Questions: GitHub Issues
## Intended Use

**Primary Use Cases:**
- Reducing hallucination in image captioning and open-ended VQA
- Evaluating hallucination-aware generation strategies
- Research on grounded and trustworthy multimodal reasoning
**Target Users:**
Researchers, developers, and students working on VLMs, hallucination mitigation, and vision-language alignment.
**Base model:** meta-llama/Llama-3.1-8B