
	---
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks.
	It comes in two variants, built on Gemma-2B and Llama-3-8B, and can execute complex UI tasks.
	This is the Gemma-2B version of Ferret-UI, based on [this paper](https://arxiv.org/pdf/2404.05719) by Apple.


	## How to Use 🤗📱

	First, download `builder.py`, `conversation.py`, `inference.py`, `model_UI.py`, and `mm_utils.py` locally:

	```bash
	wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
	wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
	wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
	wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
	wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
	```
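
If `wget` is not available, the same files can be fetched with Python's standard library. This is a minimal sketch: `helper_urls` and `download_helpers` are hypothetical names introduced here, and the URL pattern is copied from the `wget` commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO = "jadechoghari/Ferret-UI-Gemma2b"
FILES = ["conversation.py", "builder.py", "inference.py", "model_UI.py", "mm_utils.py"]


def helper_urls(repo=REPO, files=FILES):
    """Map each helper file name to its raw URL (same pattern as the wget commands)."""
    return {name: f"https://huggingface.co/{repo}/raw/main/{name}" for name in files}


def download_helpers(target_dir="."):
    """Hypothetical helper: save every helper file into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, url in helper_urls().items():
        urlretrieve(url, str(target / name))  # network access required
```

Calling `download_helpers()` from the directory where you plan to run inference should leave the five files next to your script, so that `from inference import inference_and_run` can find them.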

	### Usage:
	```python
	from inference import inference_and_run
	image_path = "appstore_reminders.png"
	prompt = "Describe the image in detail"

	# Call the function without a box
	inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")

	# Output processed text
	print("Inference Text:", inference_text)
	```

	```python
	# Task with a bounding box
	from inference import inference_and_run

	image_path = "appstore_reminders.png"
	prompt = "What's inside the selected region?"
	box = [189, 906, 404, 970]  # region the prompt refers to

	inference_text = inference_and_run(
	    image_path=image_path,
	    prompt=prompt,
	    conv_mode="ferret_gemma_instruct",
	    model_path="jadechoghari/Ferret-UI-Gemma2b",
	    box=box,
	)
	# Optionally pass process_image=True to also get the processed image back:
	# processed_image, inference_text = inference_and_run(..., process_image=True)

	print("Inference Text:", inference_text)
	```
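
Before passing a `box`, it can help to sanity-check it. The sketch below assumes the box is `[x1, y1, x2, y2]` in pixel coordinates of the input image, which matches the format named in the grounding prompts but is not confirmed by this README. `check_box` is a hypothetical helper, and the image size in the example is made up for illustration.

```python
def check_box(box, image_width, image_height):
    """Hypothetical helper: validate an [x1, y1, x2, y2] box (assumed pixel coordinates)."""
    if len(box) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(box)}")
    x1, y1, x2, y2 = box
    # The top-left corner must come before the bottom-right corner,
    # and both must lie inside the image.
    if not (0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height):
        raise ValueError(f"box {box} does not fit a {image_width}x{image_height} image")
    return [int(v) for v in box]


# Illustrative image size only; read the real size from your screenshot.
box = check_box([189, 906, 404, 970], image_width=1000, image_height=2000)
print(box)
```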

	```python
	# GROUNDING PROMPTS
	GROUNDING_TEMPLATES = [
	    '\nProvide the bounding boxes of the mentioned objects.',
	    '\nInclude the coordinates for each mentioned object.',
	    '\nLocate the objects with their coordinates.',
	    '\nAnswer in [x1, y1, x2, y2] format.',
	    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
	    '\nDraw boxes around the mentioned objects.',
	    '\nUse boxes to show where each thing is.',
	    '\nTell me where the objects are with coordinates.',
	    '\nList where each object is with boxes.',
	    '\nShow me the regions with boxes.',
	]
	```
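
Each template starts with a newline, which suggests it is meant to be appended to a prompt when the answer should include coordinates. The sketch below works under that assumption. `add_grounding` and `extract_boxes` are hypothetical helpers, only two templates are repeated so the snippet runs on its own, and the reply string is a made-up example rather than real model output.

```python
import random
import re

# Two of the templates from the list above, repeated for self-containment.
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nAnswer in [x1, y1, x2, y2] format.',
]

# Matches four comma-separated integers in square brackets, e.g. [189, 906, 404, 970].
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def add_grounding(prompt, template=None):
    """Hypothetical helper: append a grounding template (random if none is given)."""
    if template is None:
        template = random.choice(GROUNDING_TEMPLATES)
    return prompt.rstrip() + template


def extract_boxes(text):
    """Hypothetical helper: pull [x1, y1, x2, y2] boxes out of a reply (assumed format)."""
    return [tuple(int(v) for v in match) for match in BOX_PATTERN.findall(text)]


grounded_prompt = add_grounding("Where is the search button?", GROUNDING_TEMPLATES[1])
print(repr(grounded_prompt))

# Made-up reply, used only to exercise the parser.
print(extract_boxes("The search button [189, 906, 404, 970] is near the bottom."))
```

The grounded prompt can then be passed as `prompt` to `inference_and_run` in the same way as in the examples above.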