xiangan committed on
Commit
e6656a0
1 Parent(s): 14fd13e

Update README.md

Files changed (1): README.md +6 -0
README.md CHANGED
@@ -9,10 +9,16 @@ tags:
 - LLaVA
 ---
 
+
+
+
 [[Paper]](https://arxiv.org/abs/2407.17331) [[GitHub]](https://github.com/deepglint/unicom)
 ## Model
 We used the same Vision Transformer architecture [ViT-L/14@336px as CLIP](https://huggingface.co/openai/clip-vit-large-patch14-336).
 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6478679d7b370854241b2ad8/8n_jBobanaLNAQjM5eZeg.png)
+
+
 ## Data
 Our model was trained on publicly available image-caption data from the [LAION400M](https://arxiv.org/abs/2111.02114) and [COYO700M](https://github.com/kakaobrain/coyo-dataset) datasets.
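
As a quick illustration of the architecture described in the Model section, here is a minimal sketch of loading a ViT-L/14@336px vision tower with Hugging Face `transformers`. The checkpoint id below is the CLIP baseline linked in the README, used only as a stand-in, since this diff does not name the model's own repo id.

```python
# Minimal sketch: inspect the ViT-L/14@336px vision tower referenced above.
# Assumption: "openai/clip-vit-large-patch14-336" (the CLIP baseline linked in
# the README) stands in for this model's own checkpoint, which uses the same
# Vision Transformer architecture.
from transformers import CLIPImageProcessor, CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

cfg = vision_tower.config
# For ViT-L/14 at 336px resolution: image_size=336, patch_size=14,
# hidden_size=1024, num_hidden_layers=24.
print(cfg.image_size, cfg.patch_size, cfg.hidden_size, cfg.num_hidden_layers)
```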