Image-Text-to-Text
Safetensors
llava_llama
Commit a4ae4e7 (verified), committed by BoyuNLP · 1 parent: c61a302

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -14,9 +14,17 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
 - **Point of Contact:** [Boyu Gou](mailto:[email protected])
 
 - [x] Model Weights
-  - [x] Qwen2-VL-based V1
+  - [x] Initial V1 (the one used in the paper)
+  - [x] Qwen2-VL-based V1
+    - [x] 2B
+    - [x] 7B
+    - [ ] 72B
+  - [ ] V1.1
+    - [ ] 2B
+    - [ ] 7B
+    - [ ] 72B
 - [ ] Code
-  - [ ] Inference Code of UGround
+  - [x] Inference Code of UGround
 - [x] Offline Experiments
   - [x] Screenspot (along with referring expressions generated by GPT-4/4o)
   - [x] Multimodal-Mind2Web