Text Classification
Transformers
PyTorch
English
deberta
hate-speech-detection
Inference Endpoints
HannahRoseKirk committed on
Commit
3c511aa
1 Parent(s): 7a291d7

Update README.md

  1. README.md +17 -0
README.md CHANGED
@@ -1,3 +1,20 @@
  ---
  license: cc-by-4.0
  ---
+
+ # Hatemoji Model
+
+ ## Model description
+
+ This model is a cased, fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base). It was trained on data from iterative rounds of human-and-model-in-the-loop adversarial data generation. Each round consists of emoji-containing statements labelled either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
+ - **Data Repository:** https://github.com/HannahKirk/Hatemoji
+ - **Paper:** https://arxiv.org/abs/2108.05921
+ - **Point of Contact:** [email protected]
+
+ ## Intended uses & limitations
+ The model is intended for binary classification of short-form, English-language text containing emoji as non-hateful or hateful. It has demonstrated strengths over commercial and academic models at classifying emoji-based hate, and it is also a strong classifier of text-only hate. Because it was trained on synthetic, adversarially generated data, it may be weaker on emoji-based hate as it occurs 'in the wild'.
+
+ ## How to use
+
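The card leaves this section empty, so the following is only a minimal sketch of how a fine-tuned DeBERTa classifier like this one might be called through the `transformers` text-classification pipeline. The helper names `label_is_hateful` and `classify` are hypothetical. The exact label strings returned by the checkpoint are an assumption: the card writes them as `LABEL 0.0` and `LABEL 1.0`, and the helper also accepts the common `LABEL_0` / `LABEL_1` form.

```python
import re
from typing import Dict, List


def label_is_hateful(label: str) -> bool:
    """Return True when the label's trailing numeric id equals 1 (hateful).

    Assumption: labels end in a class id, e.g. "LABEL 1.0" or "LABEL_1".
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*$", label)
    if match is None:
        raise ValueError(f"Cannot parse a class id from label {label!r}")
    return float(match.group(1)) == 1.0


def classify(texts: List[str], model_id: str) -> List[Dict[str, object]]:
    """Classify texts with a Hugging Face text-classification pipeline."""
    # Imported lazily so the label helper works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    predictions = classifier(texts)  # one {"label", "score"} dict per input text
    return [
        {
            "text": text,
            "hateful": label_is_hateful(pred["label"]),
            "score": pred["score"],
        }
        for text, pred in zip(texts, predictions)
    ]
```

A call such as `classify(["I love this 🌈"], model_id="HannahRoseKirk/Hatemoji")` would download the checkpoint and return one result per input. The repository id shown here is a guess based on the dataset namespace and should be checked against the model page.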
+ ## Training data
+ The model was trained on [HatemojiBuild](https://huggingface.co/datasets/HannahRoseKirk/HatemojiBuild), alongside the four rounds of text-only adversarial data from Vidgen, B., Thrush, T., Waseem, Z., & Kiela, D. (2020). Learning from the worst: Dynamically generated datasets to improve online hate detection. arXiv