REVERSE-LLaVA-MORE-8B

Model Summary

REVERSE-LLaVA-MORE-8B is an open-source vision-language model (VLM) that performs self-verification and self-correction alongside next-token prediction during generation. It is built on LLaVA-MORE (LLaVA with LLaMA-3.1) and fine-tuned on the REVERSE Visual Instruct 1.3M dataset. The model is equipped with a retrospective resampling mechanism that detects and corrects hallucinations on the fly. Training was conducted in early March 2025.
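
The decoding-time mechanism can be pictured as a generate–verify–rewind loop: the model scores its own output as it decodes, and when a span looks unreliable it backtracks and resamples rather than committing to it. The Python sketch below is a hypothetical toy rendering of that idea only, not the REVERSE implementation; `decode_step`, the per-token score, and the backtracking policy are illustrative stand-ins, and τ is assumed here to act as the flagging threshold reported in the Performance tables.

```python
# Toy illustration of retrospective resampling (NOT the official REVERSE code).
# decode_step, the per-token score, and the rewind policy are hypothetical.
import random

def decode_step(context):
    """Stand-in for one decoding step: returns (token, hallucination_score)."""
    token = random.choice(["a", "red", "bus", "parked", "on", "a", "street", "."])
    score = random.random() * 0.01  # pretend self-verification ("unreliable") score
    return token, score

def generate(prompt, tau=0.003, max_tokens=30, max_retries=5, backtrack=3):
    tokens, retries = [], 0
    while len(tokens) < max_tokens:
        token, score = decode_step(prompt + " " + " ".join(tokens))
        if score > tau and retries < max_retries:
            # Self-verification flags a likely hallucination: rewind a short
            # span and resample rather than committing to the continuation.
            tokens = tokens[:-backtrack] if len(tokens) > backtrack else []
            retries += 1
            continue
        tokens.append(token)
        if token == ".":
            break
    return " ".join(tokens)

print(generate("Describe the image:"))
```

If, as the numbers below suggest, a smaller τ corresponds to stricter checking, the expected trade-off is visible in the tables: less hallucination but lower coverage at τ=0.0003.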

Performance

REVERSE-LLaVA-MORE-8B substantially reduces hallucination across multiple captioning and open-ended VQA benchmarks:

| Benchmark | Metric | Best Baseline | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
|---|---|---|---|---|
| CHAIR-MSCOCO | CHAIR (↓) | DoLA (13.8) | 12.2 | 8.4 |
| CHAIR-MSCOCO | CHAIRs (↓) | DoLA (51.8) | 42.4 | 25.2 |
| AMBER-G | Hallucination (↓) | Woodpecker (7.4) | 6.5 | 5.1 |
| AMBER-G | Coverage (↑) | DoLA (53.1) | 54.8 | 38.9 |
| MMHal-Bench | Score (↑) | DoLA (2.54) | 2.28 | 2.93 |
| MMHal-Bench | Hallucination Rate (↓) | DoLA (0.51) | 0.54 | 0.40 |
| HaloQuest | Avg. Accuracy (↑) | DoLA (22.8) | 26.7 | 36.7 |
| HaloQuest | False Premise Acc. (↑) | DoLA (15.5) | 30.0 | 39.5 |
| HaloQuest | Visually Challenging Acc. (↑) | DoLA (45.1) | 31.3 | 30.9 |
| HaloQuest | Insufficient Context Acc. (↑) | DoLA (7.4) | 11.7 | 38.1 |
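
As a reference for the caption-level metrics above, the toy snippet below shows the usual way CHAIR-style scores are computed: instance-level CHAIR is the fraction of mentioned objects absent from the ground truth, and sentence-level CHAIRs is the fraction of captions containing at least one such object. The object lists here are hypothetical examples, not data from the benchmark.

```python
# Toy CHAIR computation with made-up object lists (illustration only).
def chair(captions_objects, gt_objects):
    halluc, mentioned, bad_caps = 0, 0, 0
    for objs, gt in zip(captions_objects, gt_objects):
        hallucinated = [o for o in objs if o not in gt]  # objects not in ground truth
        halluc += len(hallucinated)
        mentioned += len(objs)
        bad_caps += bool(hallucinated)
    return halluc / mentioned, bad_caps / len(captions_objects)

chair_i, chair_s = chair(
    captions_objects=[["dog", "frisbee", "car"], ["person", "surfboard"]],
    gt_objects=[{"dog", "frisbee"}, {"person", "surfboard", "wave"}],
)
print(chair_i, chair_s)  # 0.2, 0.5
```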

On discriminative tasks, REVERSE-LLaVA-MORE-8B performs competitively with its base VLM, LLaVA-MORE-8B:

| Benchmark | Metric | LLaVA-MORE-8B | REVERSE (τ=0.5) |
|---|---|---|---|
| AMBER-D | F1 Score (↑) | 71.6 | 69.3 |
| POPE | F1 Score (↑) | 85.1 | 84.4 |
| MME-Hall | Score (↑) | 678.3 | 657.6 |

Usage

Please refer to the installation guide on GitHub to get started:
πŸ‘‰ Installation Guide

Intended Use

Primary Use Cases:

  • Reducing hallucination in image captioning and open-ended VQA
  • Evaluating hallucination-aware generation strategies
  • Research on grounded and trustworthy multimodal reasoning

Target Users:
Researchers, developers, and students working on VLMs, hallucination mitigation, and vision-language alignment.
