Model Card for LLaVA-1.6-Mistral-7B-Offensive-Meme-Singapore
This model is described in the paper Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models. It classifies memes as offensive or not offensive, specifically within the Singaporean context.
Model Details
This model is a fine-tuned Vision-Language Model (VLM) designed to detect offensive memes in the Singaporean context. It leverages the strengths of VLMs to handle the nuanced, culturally specific nature of meme interpretation, addressing the limitations of traditional content moderation systems. The model was fine-tuned on a dataset of 112K memes labeled by GPT-4V. The fine-tuning process involved a pipeline incorporating OCR, translation, and a 7-billion-parameter VLM (llava-hf/llava-v1.6-mistral-7b-hf). The resulting model achieves a self-reported AUROC of 0.735 and accuracy of 0.726 on a held-out test set.
- Developed by: Cao Yuxuan, Wu Jiayang, Alistair Cheong Liang Chuen, Bryan Shan Guanrong, Theodore Lee Chong Jen, and Sherman Chann Zhi Shen
- Model type: Fine-tuned Vision-Language Model (VLM)
- Language(s) (NLP): English (non-English meme text is handled via the pipeline's translation step)
- License: MIT
- Finetuned from model: llava-hf/llava-v1.6-mistral-7b-hf
- Repository: https://github.com/aliencaocao/vlm-for-memes-aisg
- Paper: Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models (https://arxiv.org/abs/2502.18101)
Uses
Direct Use
The model can be used directly to classify memes as offensive or non-offensive. Input is a meme image; the pipeline extracts embedded text with OCR, translates it to English where necessary, and then passes the image and text to the VLM for classification, as sketched below.
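The following is a minimal preprocessing sketch under stated assumptions: pytesseract as the OCR engine and the stubbed translate_to_english helper are illustrative stand-ins, not the components used in the paper.

```python
# Illustrative preprocessing sketch: OCR the meme text, translate it to
# English, and return both pieces for the VLM. pytesseract and the
# translate_to_english stub are assumptions, not the paper's components.
from PIL import Image
import pytesseract  # requires the Tesseract binary to be installed

def translate_to_english(text: str) -> str:
    # Placeholder: swap in any machine-translation model or service here.
    return text

def preprocess_meme(path: str) -> tuple[Image.Image, str]:
    image = Image.open(path)
    ocr_text = pytesseract.image_to_string(image)  # extract overlaid text
    return image, translate_to_english(ocr_text)
```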
Downstream Use
This model can be integrated into larger content moderation systems to enhance the detection of offensive memes, specifically targeting the Singaporean context.
Out-of-Scope Use
This model is specifically trained for the Singaporean context. Its performance may degrade significantly when applied to memes from other cultures or regions. It is also not suitable for general-purpose image classification tasks.
Bias, Risks, and Limitations
The model's performance is inherently tied to the quality and representativeness of the training data. Biases present in the training data may be reflected in the model's output, particularly regarding the interpretation of culturally specific humor or references. The model may misclassify memes due to ambiguities in language or visual representation. It is crucial to use this model responsibly and acknowledge its limitations.
Recommendations
Users should be aware of the potential biases and limitations of the model. Human review of the model's output is strongly recommended, especially in high-stakes scenarios. Further research into mitigating bias and enhancing robustness is needed.
How to Get Started with the Model
The snippet below is a minimal inference sketch using the standard transformers LLaVA-NeXT API. It assumes the fine-tuned checkpoint loads like its base model (llava-hf/llava-v1.6-mistral-7b-hf); the yes/no prompt is illustrative and may differ from the prompt used during fine-tuning.
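```python
# Minimal inference sketch (assumptions: standard LLaVA-NeXT loading; the
# classification prompt below is illustrative, not necessarily the one used
# during fine-tuning).
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "aliencaocao/llava-1.6-mistral-7b-offensive-meme-singapore"
# If the fine-tuned repo lacks processor files, load the processor from the
# base model llava-hf/llava-v1.6-mistral-7b-hf instead.
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("meme.jpg")
# Mistral-style prompt format used by llava-v1.6-mistral-7b-hf.
prompt = "[INST] <image>\nIs this meme offensive in the Singaporean context? Answer yes or no. [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True))
```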
Training Details
Training Data
A dataset of 112K memes labeled by GPT-4V; see the paper and the repository linked above for details.
Training Procedure
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
A held-out test set of memes in the Singaporean context; see the paper for details.
Factors
[More Information Needed]
Metrics
Accuracy and AUROC (area under the receiver operating characteristic curve).
Results
Self-reported results on the held-out test set: AUROC 0.735 and accuracy 0.726 (see also Evaluation results below).
Summary
[More Information Needed]
Model Examination
[More Information Needed]
Environmental Impact
[More Information Needed]
Technical Specifications
[More Information Needed]
Citation
@misc{yuxuan2025detectingoffensivememessocial,
  title={Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models},
  author={Cao Yuxuan and Wu Jiayang and Alistair Cheong Liang Chuen and Bryan Shan Guanrong and Theodore Lee Chong Jen and Sherman Chann Zhi Shen},
  year={2025},
  eprint={2502.18101},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.18101},
}
Glossary
[More Information Needed]
More Information
[More Information Needed]
Model Card Authors
[More Information Needed]
Model Card Contact
[More Information Needed]
Evaluation results
- AUROC on the Offensive Memes in Singapore Context test set: 0.735 (self-reported)
- Accuracy on the Offensive Memes in Singapore Context test set: 0.726 (self-reported)
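For reference, a minimal sketch of how these two metrics can be computed from binary labels and model scores, using scikit-learn (assumed tooling, not necessarily what the authors used):

```python
# Minimal metric sketch with scikit-learn (an assumption, not necessarily the
# authors' evaluation code). y_score is the model's probability that a meme
# is offensive; labels use 1 = offensive, 0 = not offensive.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [1, 0, 1, 0]            # illustrative ground-truth labels
y_score = [0.9, 0.2, 0.4, 0.6]   # illustrative model scores

print("AUROC:", roc_auc_score(y_true, y_score))
print("Accuracy:", accuracy_score(y_true, [int(s >= 0.5) for s in y_score]))
```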