VL-Rethinker-7B

🚀 News: We release our meticulously curated collection of RL training queries for multimodal reasoning: ViRL39K.

VL-Rethinker-7B achieves SoTA results on various multimodal reasoning benchmarks.

It is trained with the GRPO-SSR and Forced Rethinking techniques on the curated ViRL39K query set.
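For readers unfamiliar with GRPO, the sketch below illustrates only the generic group-relative advantage that GRPO-style methods compute: each sampled response's reward is normalized against the other responses for the same query. It is not taken from the VL-Rethinker code base. The function name `group_relative_advantages` and the `eps` value are illustrative assumptions, and the SSR and Forced Rethinking components are not reproduced here; see the paper and code repo for the authors' method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same query.

    Hypothetical helper for illustration only: advantage_i = (r_i - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields positive and negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response gets the same reward yields all-zero advantages,
# so that query contributes no learning signal under this formula.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

The second call shows why groups with identical rewards are uninformative under this normalization: every advantage collapses to zero.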

For details of our approach and performance comparison, please see our paper.

For details of training and evaluation, please see our code repo.

Explore further via the following links:

| 🚀Project Page | 📖Paper | 🔗GitHub | 🤗Data (ViRL39K) |

Citation

If you find this model useful, please cite our paper:

@article{vl-rethinker,
      title={VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning},
      author={Wang, Haozhe and Qu, Chao and Huang, Zuming and Chu, Wei and Lin, Fangzhen and Chen, Wenhu},
      journal={arXiv preprint arXiv:2504.08837},
      year={2025}
}