HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

[📂 GitHub] [📜 Paper]

This is the InternVL2_5-HiMTok-8B model, fine-tuned on the training sets of the RefCOCO series.

If you find this project useful in your research, please consider citing:

@article{wang2025himtok,
  title={HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model},
  author={Wang, Tao and Cheng, Changxu and Wang, Lingfeng and Chen, Senda and Zhao, Wuyue},
  journal={arXiv preprint arXiv:2503.13026},
  year={2025}
}

Model size: 8.73B params (Safetensors)
Tensor type: BF16 · BOOL
