---
license: mit
pipeline_tag: text-classification
inference: false
---

# Official ICC model [ACL 2024 Findings]

The official checkpoint of the ICC model, introduced in [ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation](https://arxiv.org/abs/2403.01306).

[Project Page](https://moranyanuka.github.io/icc/)

## Usage

The ICC model quantifies the concreteness of image captions; its intended use is selecting the best captions in a noisy multimodal dataset. This can be done by simply running the model over the captions and filtering out samples with low scores. It works best in conjunction with CLIP-based filtering.

### Running the model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to("cuda")

captions = ["a great method of quantifying concreteness", "a man with a white shirt"]
text_ids = tokenizer(captions, padding=True, return_tensors="pt", truncation=True).to("cuda")

# One logit per caption; higher scores indicate more concrete captions
with torch.inference_mode():
    icc_scores = model(**text_ids)["logits"]
# tensor([[0.0339], [1.0068]])
```
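To illustrate the curation step described above, the sketch below thresholds ICC scores to keep only concrete captions. The `filter_captions` helper and the threshold value of 0.5 are illustrative assumptions, not part of the released code; in practice the cutoff should be tuned on your dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def filter_captions(captions, threshold=0.5, batch_size=256, device="cuda"):
    """Keep captions whose ICC score exceeds `threshold`.

    Note: this helper and the default threshold are illustrative,
    not an official recommendation from the model authors.
    """
    tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
    model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to(device)
    kept = []
    for i in range(0, len(captions), batch_size):
        batch = captions[i : i + batch_size]
        text_ids = tokenizer(batch, padding=True, truncation=True, return_tensors="pt").to(device)
        with torch.inference_mode():
            # One logit per caption; higher means more concrete
            scores = model(**text_ids)["logits"].squeeze(-1)
        kept.extend(c for c, s in zip(batch, scores.tolist()) if s > threshold)
    return kept

# With the example scores above (0.0339 and 1.0068), only the concrete caption survives:
# filter_captions(["a great method of quantifying concreteness", "a man with a white shirt"])
# -> ["a man with a white shirt"]
```

In a full curation pipeline, this score-based filter would typically be combined with a CLIP image-text similarity filter, as noted above.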
bibtex:
```
@misc{yanuka2024icc,
      title={ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation},
      author={Moran Yanuka and Morris Alper and Hadar Averbuch-Elor and Raja Giryes},
      year={2024},
      eprint={2403.01306},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```