REVERSE-LLaVA-MORE-8B

Model Summary

REVERSE-LLaVA-MORE-8B is an open-source vision-language model (VLM) that performs self-verification and self-correction alongside next-token prediction during generation. It is built on LLaVA-MORE (LLaVA with LLaMA-3.1) and fine-tuned on the REVERSE Visual Instruct 1.3M dataset. The model is equipped with a retrospective resampling mechanism that detects and corrects hallucinations on the fly. Training was conducted in early March 2025.
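
The decoding-time mechanism can be pictured as a generate–verify–rewind loop: the model scores its own output as it decodes, and when a span looks unreliable it backtracks and resamples rather than committing to it. The Python sketch below is a hypothetical toy rendering of that idea only, not the REVERSE implementation; `decode_step`, the per-token score, and the backtracking policy are illustrative stand-ins, and τ is assumed here to act as the flagging threshold reported in the Performance tables.

```python
# Toy illustration of retrospective resampling (NOT the official REVERSE code).
# decode_step, the per-token score, and the rewind policy are hypothetical.
import random

def decode_step(context):
    """Stand-in for one decoding step: returns (token, hallucination_score)."""
    token = random.choice(["a", "red", "bus", "parked", "on", "a", "street", "."])
    score = random.random() * 0.01  # pretend self-verification ("unreliable") score
    return token, score

def generate(prompt, tau=0.003, max_tokens=30, max_retries=5, backtrack=3):
    tokens, retries = [], 0
    while len(tokens) < max_tokens:
        token, score = decode_step(prompt + " " + " ".join(tokens))
        if score > tau and retries < max_retries:
            # Self-verification flags a likely hallucination: rewind a short
            # span and resample rather than committing to the continuation.
            tokens = tokens[:-backtrack] if len(tokens) > backtrack else []
            retries += 1
            continue
        tokens.append(token)
        if token == ".":
            break
    return " ".join(tokens)

print(generate("Describe the image:"))
```

If, as the numbers below suggest, a smaller τ corresponds to stricter checking, the expected trade-off is visible in the tables: less hallucination but lower coverage at τ=0.0003.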

Performance

REVERSE-LLaVA-MORE-8B substantially reduces hallucination across multiple captioning and open-ended VQA benchmarks:

| Benchmark | Metric | Best Baseline | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
|---|---|---|---|---|
| CHAIR-MSCOCO | CHAIR (↓) | DoLA (13.8) | 12.2 | 8.4 |
| CHAIR-MSCOCO | CHAIRs (↓) | DoLA (51.8) | 42.4 | 25.2 |
| AMBER-G | Hallucination (↓) | Woodpecker (7.4) | 6.5 | 5.1 |
| AMBER-G | Coverage (↑) | DoLA (53.1) | 54.8 | 38.9 |
| MMHal-Bench | Score (↑) | DoLA (2.54) | 2.28 | 2.93 |
| MMHal-Bench | Hallucination Rate (↓) | DoLA (0.51) | 0.54 | 0.40 |
| HaloQuest | Avg. Accuracy (↑) | DoLA (22.8) | 26.7 | 36.7 |
| HaloQuest | False Premise Acc. (↑) | DoLA (15.5) | 30.0 | 39.5 |
| HaloQuest | Visually Challenging Acc. (↑) | DoLA (45.1) | 31.3 | 30.9 |
| HaloQuest | Insufficient Context Acc. (↑) | DoLA (7.4) | 11.7 | 38.1 |
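
As a reference for the caption-level metrics above, the toy snippet below shows the usual way CHAIR-style scores are computed: instance-level CHAIR is the fraction of mentioned objects absent from the ground truth, and sentence-level CHAIRs is the fraction of captions containing at least one such object. The object lists here are hypothetical examples, not data from the benchmark.

```python
# Toy CHAIR computation with made-up object lists (illustration only).
def chair(captions_objects, gt_objects):
    halluc, mentioned, bad_caps = 0, 0, 0
    for objs, gt in zip(captions_objects, gt_objects):
        hallucinated = [o for o in objs if o not in gt]  # objects not in ground truth
        halluc += len(hallucinated)
        mentioned += len(objs)
        bad_caps += bool(hallucinated)
    return halluc / mentioned, bad_caps / len(captions_objects)

chair_i, chair_s = chair(
    captions_objects=[["dog", "frisbee", "car"], ["person", "surfboard"]],
    gt_objects=[{"dog", "frisbee"}, {"person", "surfboard", "wave"}],
)
print(chair_i, chair_s)  # 0.2, 0.5
```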

On discriminative tasks, REVERSE-LLaVA-MORE-8B performs competitively with its base VLM, LLaVA-MORE-8B:

| Benchmark | Metric | LLaVA-MORE-8B | REVERSE (τ=0.5) |
|---|---|---|---|
| AMBER-D | F1 Score (↑) | 71.6 | 69.3 |
| POPE | F1 Score (↑) | 85.1 | 84.4 |
| MME-Hall | Score (↑) | 678.3 | 657.6 |

Usage

Please refer to the installation guide on GitHub to get started:
πŸ‘‰ Installation Guide

Intended Use

Primary Use Cases:

  • Reducing hallucination in image captioning and open-ended VQA
  • Evaluating hallucination-aware generation strategies
  • Research on grounded and trustworthy multimodal reasoning

Target Users:
Researchers, developers, and students working on VLMs, hallucination mitigation, and vision-language alignment.
