MLM-Filter-13b Model Card

Model details

Model type: MLM-Filter-13b is an open-source MLLM trained to assess the data quality of image-text paired data. It can generate 4 quality metrics for image-text data: Image Text Matching, Object Detail Fulfillment, Caption Text Quality, and Semantic Understanding.

Model date: MLM-Filter-13b was trained in December 2023.

Paper or resources for more information: https://mlm-filter.github.io/

@article{wang2024finetuned,
  title={Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters},
  author={Wang, Weizhi and Mrini, Khalil and Yang, Linjie and Kumar, Sateesh and Tian, Yu and Yan, Xifeng and Wang, Heng},
  journal={arXiv preprint arXiv:2403.02677},
  year={2024}
}

License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Where to send questions or comments about the model: https://github.com/Victorwz/MLM_Filter/issues

Intended use

Primary intended uses: MLM-Filter can be used as a drop-in replacement for CLIPScore in these tasks:

  1. Score image-text pairs in large-scale pre-training datasets, then filter high-quality subsets based on the scores (for training MLLMs or VLMs, consider jointly using the Image-Text Matching score and the Object Detail Fulfillment score);

  2. Evaluate the image-text alignment for image2text or text2image generation models;

  3. Any other application that needs to measure image-text alignment.
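As an illustration of use case 1, the sketch below filters a scored dataset by jointly using the Image-Text Matching (ITM) and Object Detail Fulfillment (ODF) scores. It is a minimal sketch under stated assumptions, not the selection rule used in the paper or the MLM_Filter repository: it assumes the scores have already been produced by MLM-Filter, keeps the top fraction of pairs on each metric separately, and returns only pairs that pass both cuts. The names PairScores, top_fraction_threshold, and filter_pairs are hypothetical, and the example scores are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class PairScores:
    # Hypothetical record: one image-text pair and two of its MLM-Filter scores.
    pair_id: str
    image_text_matching: float
    object_detail_fulfillment: float


def top_fraction_threshold(values, keep_fraction):
    """Return the smallest score that still falls in the top `keep_fraction` of `values`."""
    if not values or not 0 < keep_fraction <= 1:
        raise ValueError("need non-empty values and 0 < keep_fraction <= 1")
    ranked = sorted(values, reverse=True)
    keep_count = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[keep_count - 1]


def filter_pairs(pairs, keep_fraction):
    """Keep pairs that rank in the top fraction on BOTH the ITM and ODF scores."""
    itm_cut = top_fraction_threshold([p.image_text_matching for p in pairs], keep_fraction)
    odf_cut = top_fraction_threshold([p.object_detail_fulfillment for p in pairs], keep_fraction)
    return [
        p.pair_id
        for p in pairs
        if p.image_text_matching >= itm_cut and p.object_detail_fulfillment >= odf_cut
    ]


if __name__ == "__main__":
    # Illustrative scores only; real scores would come from running MLM-Filter.
    pairs = [
        PairScores("a", 92, 88),
        PairScores("b", 85, 40),
        PairScores("c", 30, 75),
        PairScores("d", 78, 81),
    ]
    print(filter_pairs(pairs, keep_fraction=0.5))  # prints ['a']
```

Because the two cuts are intersected, the surviving subset can be smaller than keep_fraction of the data; pair "b" passes the ITM cut but fails the ODF cut. If a fixed subset size is required, one alternative is to rank by a combined score instead.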

Training dataset

  • 46k instructions sampled from the LLaVA-1.5 665k instruction data.
  • 4k instructions on image-text data quality assessment tasks, covering the 4 metrics.
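The arithmetic below makes the composition of this training mixture explicit. It uses the nominal counts from the list above ("46k", "4k", "665k"), so it is an approximation; the exact sample counts may differ slightly.

```python
# Nominal counts from the training-data list; exact counts may differ slightly.
LLAVA_SOURCE_POOL = 665_000   # LLaVA-1.5 instruction data
LLAVA_SAMPLED = 46_000        # instructions sampled from that pool
QUALITY_ASSESSMENT = 4_000    # data quality assessment instructions

total = LLAVA_SAMPLED + QUALITY_ASSESSMENT
quality_share = QUALITY_ASSESSMENT / total
llava_sampling_rate = LLAVA_SAMPLED / LLAVA_SOURCE_POOL

print(f"total instructions: {total}")                                     # 50000
print(f"quality-assessment share of mix: {quality_share:.1%}")            # 8.0%
print(f"fraction of LLaVA-1.5 data sampled: {llava_sampling_rate:.1%}")   # 6.9%
```

So the quality-assessment instructions make up about 8% of a roughly 50k-instruction mixture, and the LLaVA-1.5 portion is about 6.9% of the original 665k pool.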

Usage Sample

Please follow the instructions in https://github.com/Victorwz/MLM_Filter.

