---
datasets:
- FoundationVision/groma_instruct
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---

This repository contains the model from the paper "Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models".