	---
	license: mit
	---

	# Model Card for CLIP_COCO
	## Model Description
	- Homepage: https://imirandam.github.io/BiVLC_project_page/
	- Repository: https://github.com/IMirandaM/BiVLC
	- Paper: https://arxiv.org/abs/2406.09952
	- Point of Contact: [Imanol Miranda](mailto:[email protected])
	### Model Summary
	CLIP_COCO is a baseline model presented in the [BiVLC](https://github.com/IMirandaM/BiVLC) paper. It was fine-tuned with the OpenCLIP framework, starting from the CLIP ViT-B-32 model with the `openai` pre-trained weights. The purpose of this fine-tuning is to provide a baseline for comparison with the [CLIP_TROHN-Text](https://huggingface.co/imirandam/CLIP_TROHN-Text) and [CLIP_TROHN-Img](https://huggingface.co/imirandam/CLIP_TROHN-Img) models. Hyperparameters:

	* Learning rate: 1e-6.
	* Scheduler: Cosine scheduler with 50 warmup steps.
	* Optimizer: AdamW optimizer with beta1 = 0.9, beta2 = 0.98, eps = 1e-6 and weight decay = 0.1.
	* Loss function: InfoNCE Loss.
	* Batch size: 400, so each training step contrasts 400 images with 400 captions.
	* Epochs: All models are fine-tuned for 10 epochs, with validation accuracy as the model selection criterion, i.e. we select the checkpoint with the highest accuracy on the corresponding validation set.
	* Data: The model is fine-tuned on the COCO 2017 train split.
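
As a rough illustration of the scheduler and loss listed above, the following standard-library Python sketch implements a linear-warmup cosine learning-rate schedule and a symmetric InfoNCE loss. It is not the OpenCLIP training code: the function names, the `total_steps` argument, and the `logit_scale` value of 100 are assumptions made for this example, and the embeddings are assumed to be L2-normalized.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-6, warmup_steps=50):
    """Linear warmup followed by cosine decay.

    total_steps is a hypothetical argument: the card gives epochs and batch
    size, not the total number of optimizer steps.
    """
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup steps.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule that has elapsed.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def _cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(image_embs, text_embs, logit_scale=100.0):
    """Symmetric InfoNCE over N matched (image, caption) embedding pairs.

    logit_scale=100.0 is an illustrative assumption, not a value from the card.
    """
    # N x N matrix: entry [i][j] scores image i against caption j.
    logits = [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
        for img in image_embs
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_cross_entropy_rows(logits) + _cross_entropy_rows(transposed))
```

With a batch size of 400, `logits` would be a 400 x 400 matrix in which each image has one matching caption on the diagonal and 399 in-batch negatives, and likewise for each caption.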

	### Evaluation Data
	The model is evaluated on [BiVLC](https://huggingface.co/datasets/imirandam/BiVLC).
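
For readers unfamiliar with this style of benchmark, here is a minimal sketch of how one instance could be scored. It assumes that each instance pairs a positive image and caption with a hard negative image and caption, giving two image-to-text and two text-to-image retrieval examples, and that a direction counts as correct only when both of its examples are correct. The function names, the matrix layout, and this aggregation rule are assumptions for illustration. Refer to the paper and repository for the official evaluation.

```python
def score_instance(sim):
    """Score one instance from a 2 x 2 similarity matrix.

    sim[i][j] is the similarity between image i and caption j, where index 0
    is the positive item and index 1 is the hard negative (assumed layout).
    """
    pos_image_to_text = sim[0][0] > sim[0][1]  # positive image prefers its caption
    neg_image_to_text = sim[1][1] > sim[1][0]  # negative image prefers its caption
    pos_text_to_image = sim[0][0] > sim[1][0]  # positive caption prefers its image
    neg_text_to_image = sim[1][1] > sim[0][1]  # negative caption prefers its image
    i2t = pos_image_to_text and neg_image_to_text
    t2i = pos_text_to_image and neg_text_to_image
    return {"i2t": i2t, "t2i": t2i, "group": i2t and t2i}


def accuracy(sims):
    """Average the per-instance scores over a list of 2 x 2 matrices."""
    scores = [score_instance(s) for s in sims]
    return {k: sum(s[k] for s in scores) / len(scores) for k in ("i2t", "t2i", "group")}
```

The similarity values would come from the model's image and text embeddings, for example as cosine similarities.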

	### Licensing Information
	This work is licensed under the MIT License.

	## Citation Information
	If you find this model useful, please consider citing our paper:
	```
	@misc{miranda2024bivlc,
	title={BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval},
	author={Imanol Miranda and Ander Salaberria and Eneko Agirre and Gorka Azkune},
	year={2024},
	eprint={2406.09952},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}
	```