jadechoghari committed
Commit 5c28fa4 · 1 Parent(s): cf10eaf

Update README.md

Files changed (1): README.md +44 -26
README.md CHANGED
@@ -11,42 +11,60 @@ This is the **Llama-3-8B** version of ferret-ui. It follows from [this paper](ht
 
  ## How to Use 🤗📱
 
- You will need first to download `builder.py`, `conversation.py`, and `inference.py` locally.
 
  ```bash
- wget https://huggingface.co/jadechoghari/Ferret-UI-Llama8b/raw/main/conversation.py
- wget https://huggingface.co/jadechoghari/Ferret-UI-Llama8b/raw/main/builder.py
- wget https://huggingface.co/jadechoghari/Ferret-UI-Llama8b/raw/main/inference.py
  ```
 
  ### Usage:
  ```python
- from inference import infer_ui_task
- # Pass an image and the online model path
- image_path = 'image.jpg'
- model_path = 'jadechoghari/Ferret-UI-Llama8b'
- ```
 
- ### Task not requiring bounding box
- Choose a task from ['widget_listing', 'find_text', 'find_icons', 'find_widget', 'conversation_interaction']
- ```python
- task = 'conversation_interaction'
- result = infer_ui_task(image_path, "How do I navigate to the Games tab?", model_path, task)
- print("Result:", result)
  ```
- ### Task requiring bounding box
- Choose a task from ['widgetcaptions', 'taperception', 'ocr', 'icon_recognition', 'widget_classification', 'example_0']
  ```python
- task = 'widgetcaptions'
- region = (50, 50, 200, 200)
- result = infer_ui_task(image_path, "Describe the contents of the box.", model_path, task, region=region)
- print("Result:", result)
  ```
 
- ### Task with no image processing
- Choose a task from ['screen2words', 'detailed_description', 'conversation_perception', 'gpt4']
  ```python
- task = 'detailed_description'
- result = infer_ui_task(image_path, "Please describe the screen in detail.", model_path, task)
- print("Result:", result)
  ```
 
 
  ## How to Use 🤗📱
 
+
+ You will first need to download `builder.py`, `conversation.py`, `inference.py`, `model_UI.py`, and `mm_utils.py` locally.
 
  ```bash
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
+ wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
  ```
 
  ### Usage:
  ```python
+ from inference import inference_and_run
+ image_path = "appstore_reminders.png"
+ prompt = "Describe the image in detail"
 
+ # Call the function without a box
+ processed_image, inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")
+
+ # Output the processed text
+ print("Inference Text:", inference_text)
  ```
+
  ```python
+ # Task with bounding boxes
+ image_path = "appstore_reminders.png"
+ prompt = "What's inside the selected region?"
+ box = [189, 906, 404, 970]
+
+ processed_image, inference_text = inference_and_run(
+     image_path=image_path,
+     prompt=prompt,
+     conv_mode="ferret_gemma_instruct",
+     model_path="jadechoghari/Ferret-UI-Gemma2b",
+     box=box
+ )
+
+ # Output the inference text and optionally save the processed image
+ print("Inference Text:", inference_text)
  ```
 
  ```python
+ # GROUNDING PROMPTS
+ GROUNDING_TEMPLATES = [
+     '\nProvide the bounding boxes of the mentioned objects.',
+     '\nInclude the coordinates for each mentioned object.',
+     '\nLocate the objects with their coordinates.',
+     '\nAnswer in [x1, y1, x2, y2] format.',
+     '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
+     '\nDraw boxes around the mentioned objects.',
+     '\nUse boxes to show where each thing is.',
+     '\nTell me where the objects are with coordinates.',
+     '\nList where each object is with boxes.',
+     '\nShow me the regions with boxes.'
+ ]
  ```
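The new README lists grounding templates but does not show how they are used. A minimal sketch of the likely pattern, appending one template to a base prompt before running inference; the helper name `build_grounded_prompt` is hypothetical, and the actual model call is shown only as a comment since it requires the downloaded scripts and model weights:

```python
# Subset of the GROUNDING_TEMPLATES from the README (illustrative).
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nAnswer in [x1, y1, x2, y2] format.',
]

def build_grounded_prompt(prompt: str, template_idx: int = 0) -> str:
    """Concatenate a base prompt with one grounding template (hypothetical helper)."""
    return prompt + GROUNDING_TEMPLATES[template_idx]

grounded = build_grounded_prompt("Find the Games tab.", template_idx=1)
print(grounded)

# The grounded prompt would then be passed to the model, e.g.:
# processed_image, inference_text = inference_and_run(
#     "appstore_reminders.png", grounded,
#     conv_mode="ferret_gemma_instruct",
#     model_path="jadechoghari/Ferret-UI-Gemma2b")
```

This keeps the task instruction and the output-format instruction separate, so the same base prompt can be reused with any of the ten templates.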