BoyuNLP committed on
Commit 91dd55e · verified · 1 Parent(s): d67a49b

Update README.md

  1. README.md +14 -2
README.md CHANGED
@@ -54,9 +54,9 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
 
  ## Main Results
 
- ### ScreenSpot (GUI Visual Grounding)
+ ### GUI Visual Grounding: ScreenSpot (Standard Setting)
 
- | ScreenSpot (Standard) | Arch | SFT data | Mobile-Text | Mobile-Icon | Desktop-Text | Desktop-Icon | Web-Text | Web-Icon | Avg |
+ | Model | Arch | SFT data | Mobile-Text | Mobile-Icon | Desktop-Text | Desktop-Icon | Web-Text | Web-Icon | Avg |
  | ---------------------------- | ---------------- | ---------------- | ----------- | ----------- | ------------ | ------------ | -------- | -------- | -------- |
  | GPT-4 | | | 22.6 | 24.5 | 20.2 | 11.8 | 9.2 | 8.8 | 16.2 |
  | GPT-4o | | | 20.2 | 24.9 | 21.1 | 23.6 | 12.2 | 7.8 | 18.3 |
@@ -79,6 +79,18 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
  | **UGround-V1-2B (Qwen2-VL)** | Qwen2-VL | UGround-V1 | 89.4 | 72.0 | 88.7 | 65.7 | 81.3 | 68.9 | 77.7 |
  | **UGround-V1-7B (Qwen2-VL)** | Qwen2-VL | UGround-V1 | 93.0 | **79.9** | **93.8** | **76.4** | **90.9** | **84.0** | **86.3** |
 
+ ### GUI Visual Grounding: ScreenSpot (Agent Setting)
+
+ | Planner | Grounding Model | Arch | SFT data | Mobile-Text | Mobile-Icon | Desktop-Text | Desktop-Icon | Web-Text | Web-Icon | Avg |
+ | ------- | ------------------------ | ---------------- | ---------------- | ----------- | ----------- | ------------ | ------------ | -------- | -------- | -------- |
+ | GPT-4o | Qwen-VL | Qwen-VL | | 21.3 | 21.4 | 18.6 | 10.7 | 9.1 | 5.8 | 14.5 |
+ | GPT-4o | SeeClick | Qwen-VL | SeeClick | 81.0 | 59.8 | 69.6 | 33.6 | 43.9 | 26.2 | 52.4 |
+ | GPT-4o | Qwen-GUI | Qwen-VL | GUICourse | 67.8 | 24.5 | 53.1 | 16.4 | 50.4 | 18.5 | 38.5 |
+ | GPT-4o | UGround-V1 | LLaVA-UGround-V1 | UGround-V1 | 93.4 | 76.9 | 92.8 | 67.9 | 88.7 | 68.9 | 81.4 |
+ | GPT-4o | OS-Atlas-Base-4B | InternVL | OS-Atlas | **94.1** | 73.8 | 77.8 | 47.1 | 86.5 | 65.3 | 74.1 |
+ | GPT-4o | OS-Atlas-Base-7B | Qwen2-VL | OS-Atlas | 93.8 | **79.9** | 90.2 | 66.4 | **92.6** | **79.1** | 83.7 |
+ | GPT-4o | UGround-V1-2B (Qwen2-VL) | Qwen2-VL | UGround-V1 | **94.1** | 77.7 | 92.8 | 63.6 | 90.0 | 70.9 | 81.5 |
+ | GPT-4o | UGround-V1-7B (Qwen2-VL) | Qwen2-VL | UGround-V1 | **94.1** | **79.9** | **93.3** | **73.6** | 89.6 | 73.3 | **84.0** |
 
 
 
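
Review note: the README does not say how the `Avg` column is computed. For the Agent Setting rows added in this commit, it appears consistent with an unweighted mean of the six split scores. The sketch below checks that assumption. The `ROWS` list and the `mean_matches` helper are illustrative names, not part of the repository. The tolerance is also an assumption: it allows for the reported value being rounded to one decimal place.

```python
# Rows copied from the Agent Setting table in the diff above:
# (grounding model, six split scores in column order, reported Avg).
ROWS = [
    ("Qwen-VL", [21.3, 21.4, 18.6, 10.7, 9.1, 5.8], 14.5),
    ("SeeClick", [81.0, 59.8, 69.6, 33.6, 43.9, 26.2], 52.4),
    ("Qwen-GUI", [67.8, 24.5, 53.1, 16.4, 50.4, 18.5], 38.5),
    ("UGround-V1", [93.4, 76.9, 92.8, 67.9, 88.7, 68.9], 81.4),
    ("OS-Atlas-Base-4B", [94.1, 73.8, 77.8, 47.1, 86.5, 65.3], 74.1),
    ("OS-Atlas-Base-7B", [93.8, 79.9, 90.2, 66.4, 92.6, 79.1], 83.7),
    ("UGround-V1-2B (Qwen2-VL)", [94.1, 77.7, 92.8, 63.6, 90.0, 70.9], 81.5),
    ("UGround-V1-7B (Qwen2-VL)", [94.1, 79.9, 93.3, 73.6, 89.6, 73.3], 84.0),
]


def mean_matches(scores, reported, tol=0.06):
    """Return True if the unweighted mean agrees with the reported Avg.

    tol is an assumed tolerance: 0.05 for one-decimal rounding of the
    reported value, plus a little slack for floating-point error.
    """
    return abs(sum(scores) / len(scores) - reported) <= tol


if __name__ == "__main__":
    for name, scores, reported in ROWS:
        mean = sum(scores) / len(scores)
        status = "ok" if mean_matches(scores, reported) else "MISMATCH"
        print(f"{name:26s} mean={mean:6.2f} reported={reported:5.1f} {status}")
```

A mismatch would suggest either a transcription error in a row or that `Avg` is weighted (for example, by the number of samples per split) rather than a plain mean.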