KAXY committed on
Commit
1644c1a
1 Parent(s): b3da60d

Update README.md

  1. README.md +113 -3
README.md CHANGED
@@ -1,3 +1,113 @@
1
- ---
2
- license: gpl-3.0
3
- ---
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ tags:
7
+ - counter speech
8
+ base_model: openai-community/gpt2-medium
9
+ ---
10
+
11
+ ---
12
+
13
+ # Target-Aware Counter-Speech Generation
14
+
15
+ <!-- Provide a quick summary of what the model is/does. -->
16
+
17
+ The target-aware counter-speech generation model is an autoregressive language model based on [gpt2-medium](https://huggingface.co/gpt2-medium) and fine-tuned on hate-speech and counter-speech pairs from the [CONAN](https://github.com/marcoguerini/CONAN) datasets to generate more contextually relevant counter-speech.
18
+ The model uses special tokens that embed target-demographic information to guide generation towards relevant responses and away from off-topic or generic ones. It is trained on 8 target demographics: Migrants, People of Color (POC), LGBT+, Muslims, Women, Jews, Disabled, and Other.
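
As a minimal sketch of how a target-demographic token can be combined with an input, the helper below reproduces the prompt layout from the getting-started snippet of this README. `build_prompt` is a hypothetical name, and the `<TARGET>` spelling for labels other than `<other>` is an assumption inferred from that single example, not something the README confirms.

```python
# Target-demographic labels as listed in the getting-started snippet.
TARGETS = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]


def build_prompt(hate_speech: str, target: str = "other") -> str:
    """Hypothetical helper: wrap a hate-speech text in the README's prompt layout.

    Assumption: every label is written as <LABEL>, mirroring the <other>
    token used in the README example.
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return f"<|endoftext|> <{target}> Hate-speech: {hate_speech.strip()} Counter-speech: "


if __name__ == "__main__":
    print(build_prompt("Human are not created equal, some are born lesser."))
```

Rejecting unknown labels early keeps a misspelled target from silently producing a prompt the model was never trained on.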
19
+
20
+ ## Uses
21
+
22
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
23
+ The model is intended to generate counter-speech responses to a given hate-speech sequence, which is combined with the special token for its target demographic.
24
+
25
+
26
+ ## Bias, Risks, and Limitations
27
+
28
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
29
+
30
+ We observed negative effects such as content hallucination and toxic response generation. Although the model is intended to generate counter-speech for combating online hatred, its use should be monitored carefully, with human post-editing or an approval system, to ensure a safe and inclusive online environment.
31
+
32
+
33
+ ## How to Get Started with the Model
34
+
35
+ Use the code below to get started with the model.
36
+
37
+
38
+
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # All available target-demographic tokens
+ types = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]
+
+ model_name = "tum-nlp/gpt-2-medium-target-aware-counterspeech-generation"
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ tokenizer.padding_side = "left"
+
+ # The target-demographic token (here <other>) precedes the hate-speech text
+ prompt = "<|endoftext|> <other> Hate-speech: Human are not created equal, some are born lesser. Counter-speech: "
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True)
+ output_sequences = model.generate(
+     input_ids=inputs["input_ids"].to(model.device),
+     attention_mask=inputs["attention_mask"].to(model.device),
+     pad_token_id=tokenizer.eos_token_id,
+     max_length=128,
+     num_beams=3,
+     no_repeat_ngram_size=3,
+     num_return_sequences=1,
+     early_stopping=True,
+ )
+ # generate() returns a batch of sequences; decode the first (and only) one
+ result = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
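
Because `generate()` on a causal language model returns the prompt tokens followed by the continuation, the decoded `result` still contains the hate-speech input. A minimal sketch of separating out the generated reply is shown below; `extract_counter_speech` is a hypothetical helper, and it assumes the `Counter-speech:` marker from the prompt survives decoding unchanged.

```python
def extract_counter_speech(decoded: str, marker: str = "Counter-speech:") -> str:
    """Hypothetical helper: return the text that follows the first marker.

    If the marker is missing, the whole decoded string is returned
    (stripped), so the caller never loses the model output.
    """
    _, found, continuation = decoded.partition(marker)
    return continuation.strip() if found else decoded.strip()


if __name__ == "__main__":
    # Placeholder string standing in for a decoded model output.
    sample = " <other> Hate-speech: example input. Counter-speech: example reply. "
    print(extract_counter_speech(sample))
```

Splitting on the first occurrence of the marker keeps the full continuation even if the model repeats the marker later in its output.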
59
+
60
+
61
+ #### Training Hyperparameters
62
+
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     num_train_epochs=20,
+     learning_rate=3.800568576836524e-05,
+     weight_decay=0.050977894796868116,
+     warmup_ratio=0.10816909354342182,
+     optim="adamw_torch",
+     lr_scheduler_type="cosine",
+     evaluation_strategy="epoch",
+     save_strategy="epoch",
+     save_total_limit=3,
+     load_best_model_at_end=True,
+     auto_find_batch_size=True,
+ )
76
+
77
+
78
+ ## Evaluation
79
+
80
+ <!-- This section describes the evaluation protocols and provides the results. -->
81
+
82
+ ### Testing Data, Factors & Metrics
83
+
84
+ #### Testing Data
85
+
86
+ <!-- This should link to a Data Card if possible. -->
87
+
88
+ The model's performance is tested on three test sets: two are subsets of the [CONAN](https://github.com/marcoguerini/CONAN) dataset, and one is the sexist portion of the [EDOS](https://github.com/rewire-online/edos) dataset.
89
+
90
+ #### Metrics
91
+
92
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
93
+
94
+ The model's performance is evaluated with a custom pipeline for counter-speech generation. The pipeline includes CoLA, Toxicity (TOX), Hatefulness (Hate), Offensiveness (OFF), Label and Context Similarity (L Sim, C Sim), Validity as Counter-Speech (VaCS), Repetition Rate (RR), target-demographic F1 (F1), and the Arithmetic Mean (AM).
95
+
96
+
97
+ ### Results
+ CONAN
+
+ | Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
+ | ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
+ | Human | 0.937 | 0.955 | 1.000 | 0.997 | - | 0.751 | 0.980 | 0.861 | 0.885 | 0.929 |
+ | target-aware gpt2-medium | 0.958 | 0.946 | 1.000 | 0.996 | 0.706 | 0.784 | 0.946 | 0.419 | 0.880 | 0.848 |
+
+ CONAN SMALL
+
+ | Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
+ | ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
+ | Human | 0.963 | 0.956 | 1.000 | 1.000 | 1.000 | 0.768 | 0.988 | 0.995 | 0.868 | 0.949 |
+ | target-aware gpt2-medium | 0.975 | 0.931 | 1.000 | 1.000 | 0.728 | 0.783 | 0.888 | 0.911 | 0.792 | 0.890 |
+
+ EDOS
+
+ | Model Name | CoLA | TOX | Hate | OFF | C Sim | VaCS | RR | F1 | AM |
+ | ---------- | ---- | --- | ---- | --- | ----- | ---- | -- | -- | -- |
+ | target-aware gpt2-medium | 0.930 | 0.815 | 0.999 | 0.975 | 0.689 | 0.857 | 0.518 | 0.747 | 0.816 |
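
The AM column appears to be the unweighted mean of the other scores in each row. This is an inference from the reported numbers, not a statement from the authors. The sketch below recomputes it for the `target-aware gpt2-medium` rows of the three results tables; the helper name `arithmetic_mean` is hypothetical.

```python
from statistics import mean

# Scores copied from the "target-aware gpt2-medium" rows (AM column excluded),
# paired with the AM value reported in the same row.
ROWS = {
    "CONAN": ([0.958, 0.946, 1.000, 0.996, 0.706, 0.784, 0.946, 0.419, 0.880], 0.848),
    "CONAN SMALL": ([0.975, 0.931, 1.000, 1.000, 0.728, 0.783, 0.888, 0.911, 0.792], 0.890),
    "EDOS": ([0.930, 0.815, 0.999, 0.975, 0.689, 0.857, 0.518, 0.747], 0.816),
}


def arithmetic_mean(scores):
    """Unweighted mean of the per-metric scores, rounded to three decimals."""
    return round(mean(scores), 3)


if __name__ == "__main__":
    for name, (scores, reported) in ROWS.items():
        print(f"{name}: recomputed={arithmetic_mean(scores):.3f} reported={reported:.3f}")
```

Because every metric carries equal weight, a single low score such as the Repetition Rate on CONAN pulls the AM down noticeably.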