Commit ebf1867 by weifeng-chen
Parent(s): d4a825f

add zero and achieve better result

- README.md +5 -6
- pytorch_model.bin +1 -1
README.md
CHANGED
@@ -15,8 +15,7 @@ tags:
 
 # Model Details
 
-This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/)
-
+This model is a Chinese CLIP model trained on [Noah-Wukong Dataset(100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero(23M)](https://zero.so.com/). We use ViT-L-14 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was first trained 10 epochs on wukong and then train another 12 epochs on wukong and zero.
 # Taiyi (太乙)
 Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text model trained on Chinese dataset and benefit the Chinese community.
 
@@ -65,15 +64,15 @@ with torch.no_grad():
 
 | model | dataset | Top1 | Top5 |
 | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN |
+| Taiyi-CLIP-Roberta-326M-Chinese | ImageNet1k-CN | 53.05% | 79.55% |
 
 ### Zero-Shot Text-to-Image Retrieval
 
 | model | dataset | Top1 | Top5 | Top10 |
 | ---- | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test |
-| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test |
-| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k |
+| Taiyi-CLIP-Roberta-326M-Chinese | Flickr30k-CNA-test | 54.36% | 80.56% | 87.90% |
+| Taiyi-CLIP-Roberta-326M-Chinese | COCO-CN-test | 51.47% | 81.00% | 90.40% |
+| Taiyi-CLIP-Roberta-326M-Chinese | wukong50k | 61.18% | 90.46% | 95.74% |
 
 
 # Citation
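The Top1/Top5/Top10 columns in the retrieval table read like recall@K scores, although the commit does not say how they were computed. The sketch below is one plausible way to get such numbers, assuming a text-by-image similarity matrix and exactly one ground-truth image per text query. The function name `recall_at_k` and the toy matrix are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose target index is among the k best-scoring candidates.

    similarity[i][j] is the score between text query i and image j (assumed layout).
    targets[i] is the index of the single correct image for query i (assumption).
    """
    if len(similarity) != len(targets):
        raise ValueError("one target index is required per query row")
    if not targets:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, target in zip(similarity, targets):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Toy data for illustration only; these are not the model's scores.
toy_similarity = [
    [0.9, 0.1, 0.0],  # query 0: correct image 0 ranks first
    [0.2, 0.3, 0.5],  # query 1: correct image 1 ranks second
    [0.1, 0.8, 0.4],  # query 2: correct image 2 ranks second
]
toy_targets = [0, 1, 2]
toy_top1 = recall_at_k(toy_similarity, toy_targets, k=1)
toy_top2 = recall_at_k(toy_similarity, toy_targets, k=2)
```

With the toy matrix, only one of three queries is correct at rank 1, while all three are correct within the top 2. The same function would cover the zero-shot classification table if the rows were images and the columns were class prompts, again as an assumption about the evaluation setup.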
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:13f50fdd2fa0e809a95d602b4d74552d2c27e3ebc08f40108a7d1cae20a7107b
 size 1305368941
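The pointer above records only the SHA-256 digest and byte size of the new weights. As a minimal sketch, a downloaded `pytorch_model.bin` could be checked against those two fields as follows. The helper names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical and are not part of Git LFS or of this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def file_matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 oids")
    # Compare sizes first; it is cheap and avoids hashing an obviously wrong file.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a file of over a gigabyte is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text as it appears on the new side of the diff.
NEW_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:13f50fdd2fa0e809a95d602b4d74552d2c27e3ebc08f40108a7d1cae20a7107b\n"
    "size 1305368941\n"
)
```

Calling `file_matches_pointer("pytorch_model.bin", NEW_POINTER)` on a local copy would then report whether the download corresponds to this commit; the local path is an assumption about where the file was saved.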