---
license: mit
pipeline_tag: text-classification
inference: false
---
# Official ICC model [ACL 2024 Findings]
The official checkpoint of the ICC model, introduced in [ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation](https://arxiv.org/abs/2403.01306).
[Project Page](https://moranyanuka.github.io/icc/)
## Usage
The ICC model quantifies the concreteness of image captions; its intended use is identifying the best captions in a noisy multimodal dataset. This can be done by running the model over the captions and filtering out samples with low scores.
It works best in conjunction with CLIP-based filtering.
### Running the model
<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to("cuda")

captions = ["a great method of quantifying concreteness", "a man with a white shirt"]
text_ids = tokenizer(captions, padding=True, return_tensors="pt", truncation=True).to("cuda")

# The model outputs one regression logit per caption: its ICC score.
with torch.inference_mode():
    icc_scores = model(**text_ids)["logits"]
# tensor([[0.0339], [1.0068]])
```
</details>
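As a rough sketch of the curation step described above, the snippet below scores a noisy dataset and keeps only the samples above a score threshold. The `dataset` list, the `score_captions` helper, and the threshold value are illustrative assumptions rather than part of the official release; a suitable cutoff should be tuned per dataset, ideally alongside CLIP-based filtering.

```python
# Hypothetical curation pass, reusing `tokenizer` and `model` from above.
ICC_THRESHOLD = 0.5  # assumption: tune this cutoff on your own data

def score_captions(captions, batch_size=64):
    """Return one ICC score per caption, computed in batches."""
    scores = []
    for i in range(0, len(captions), batch_size):
        batch = captions[i : i + batch_size]
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors="pt").to("cuda")
        with torch.inference_mode():
            scores.extend(model(**inputs)["logits"].squeeze(-1).tolist())
    return scores

# `dataset` is a hypothetical list of {"image": ..., "caption": ...} records.
scores = score_captions([sample["caption"] for sample in dataset])
curated = [sample for sample, score in zip(dataset, scores) if score > ICC_THRESHOLD]
```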
## Citation

```bibtex
@misc{yanuka2024icc,
      title={ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation},
      author={Moran Yanuka and Morris Alper and Hadar Averbuch-Elor and Raja Giryes},
      year={2024},
      eprint={2403.01306},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```