FoundationVision/groma-7b-finetune

+---
+datasets:
+- FoundationVision/groma_instruct
+language:
+- en
+pipeline_tag: image-text-to-text
+library_name: transformers
+---
+This repository contains the model introduced in the paper [Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models](https://huggingface.co/papers/2404.13013).