---
license: cc-by-4.0
---

# Hatemoji Model

## Model description

This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base) and is cased. It was trained on iterative rounds of adversarial data generated with a human-and-model-in-the-loop approach. Each round of data contains emoji-containing statements which are either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).

- **Data Repository:** https://github.com/HannahKirk/Hatemoji
- **Paper:** https://arxiv.org/abs/2108.05921
- **Point of Contact:** hannah.kirk@oii.ox.ac.uk

## Intended uses & limitations

The intended use of the model is to classify English-language, emoji-containing, short-form text documents as a binary task: non-hateful vs. hateful. The model has demonstrated strengths over commercial and academic models in classifying emoji-based hate, and it is also a strong classifier of text-only hate. Because the model was trained on synthetic, adversarially generated data, it may have some weaknesses when classifying empirical emoji-based hate 'in the wild'.

## How to use

A minimal usage sketch with the `transformers` library is given at the end of this card.

## Training data

The model was trained on [HatemojiBuild](https://huggingface.co/datasets/HannahRoseKirk/HatemojiBuild), alongside the four rounds of text-only adversarial data from Vidgen, B., Thrush, T., Waseem, Z., & Kiela, D. (2020). Learning from the worst: Dynamically generated datasets to improve online hate detection. arXiv.
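The training data described above can be inspected directly from the Hugging Face Hub. The sketch below assumes the `datasets` library is installed and that HatemojiBuild loads with its default configuration; split and column names are not listed on this card, so the code only prints what the loaded dataset exposes.

```python
# Sketch: inspecting the HatemojiBuild data referenced in the Training data section.
# Assumes the `datasets` library is installed and the default configuration loads.
from datasets import load_dataset

hatemoji_build = load_dataset("HannahRoseKirk/HatemojiBuild")

print(hatemoji_build)  # show the available splits, columns and row counts

# Peek at one example from the first available split (split names are not listed on this card)
first_split = next(iter(hatemoji_build.values()))
print(first_split[0])
```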
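For the "How to use" section above, the snippet below is a minimal inference sketch. It assumes the fine-tuned checkpoint is published on the Hugging Face Hub under an ID such as `HannahRoseKirk/Hatemoji` (the repository name is an assumption; substitute the actual checkpoint) and that it carries a standard two-class sequence-classification head, with LABEL 0 = non-hateful and LABEL 1 = hateful as described above.

```python
# Minimal inference sketch using the transformers pipeline API.
# The model ID below is an assumption; replace it with the published checkpoint.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "HannahRoseKirk/Hatemoji"  # hypothetical Hub ID for this fine-tuned DeBERTa model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Short-form, English, emoji-containing inputs (binary task: non-hateful vs. hateful)
examples = [
    "Have a great day 😊",
    "You people are 🐍🐍🐍",
]
for text, prediction in zip(examples, classifier(examples)):
    # LABEL_0 = non-hateful, LABEL_1 = hateful (per the label scheme on this card)
    print(text, "->", prediction)
```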