maximuspowers committed · Commit c67ccda · verified · 1 Parent(s): a66ce6b

Upload README.md

Files changed (1)
  1. README.md +84 -3
README.md CHANGED
@@ -1,3 +1,84 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: token-classification
+ tags:
+ - Social Bias
+ metrics:
+ - name: F1
+   type: F1
+   value: 0.7864
+ - name: Recall
+   type: Recall
+   value: 0.7617
+ thumbnail: "https://media.licdn.com/dms/image/v2/D4E12AQH-g6TfVlad0g/article-cover_image-shrink_720_1280/article-cover_image-shrink_720_1280/0/1724391684857?e=1729728000&v=beta&t=e3ggmXGVKaVU6e72wjsc9Ppgd0rigQqjeA1Od9fyFDk"
+ base_model: "bert-base-uncased"
+ co2_eq_emissions:
+   emissions: 8
+   training_type: "fine-tuning"
+   geographical_location: "Phoenix, AZ"
+   hardware_used: "T4"
+ ---
+
+ # Social Bias NER
+
+ This NER model is fine-tuned from BERT for *multi-label* token classification of three entity types, illustrated below:
+
+ - (GEN)eralizations
+ - (UNFAIR)ness
+ - (STEREO)types
+
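+ Because the task is *multi-label*, a single token can carry more than one tag at the same time. A minimal, hypothetical output entry (illustrative only, not an actual model prediction):
+
+ ```python
+ # one token tagged as both the start of a generalization and of a stereotype
+ {"token": "teenagers", "labels": ["B-GEN", "B-STEREO"]}
+ ```
+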
+ You can [try it out in Spaces](https://huggingface.co/spaces/maximuspowers/bias-detection-ner) :).
+
+ ## How to Get Started with the Model
+
+ The Transformers `pipeline` API doesn't have a class for multi-label token classification, but you can use the code below to load the model, run it, and format the output.
+
+ ```python
+ import json
+ import torch
+ from transformers import BertTokenizerFast, BertForTokenClassification
+
+ # load the tokenizer and the fine-tuned model
+ tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
+ model = BertForTokenClassification.from_pretrained('maximuspowers/bias-detection-ner')
+ model.eval()
+ model.to('cuda' if torch.cuda.is_available() else 'cpu')
+
+ # ids to labels we want to display
+ id2label = {
+     0: 'O',
+     1: 'B-STEREO',
+     2: 'I-STEREO',
+     3: 'B-GEN',
+     4: 'I-GEN',
+     5: 'B-UNFAIR',
+     6: 'I-UNFAIR'
+ }
+
+ # prediction function to use in your own code
+ def predict_ner_tags(sentence):
+     inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=128)
+     input_ids = inputs['input_ids'].to(model.device)
+     attention_mask = inputs['attention_mask'].to(model.device)
+
+     with torch.no_grad():
+         outputs = model(input_ids=input_ids, attention_mask=attention_mask)
+         logits = outputs.logits
+         # sigmoid instead of softmax: each token may hold several labels at once
+         probabilities = torch.sigmoid(logits)
+         predicted_labels = (probabilities > 0.5).int()  # remember to try your own threshold
+
+     # map every non-special token to its predicted label names
+     result = []
+     tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
+     for i, token in enumerate(tokens):
+         if token not in tokenizer.all_special_tokens:
+             label_indices = (predicted_labels[0][i] == 1).nonzero(as_tuple=False).squeeze(-1)
+             labels = [id2label[idx.item()] for idx in label_indices] if label_indices.numel() > 0 else ['O']
+             result.append({"token": token, "labels": labels})
+
+     return json.dumps(result, indent=4)
+ ```
+
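+ A quick usage sketch (the sentence and printed labels are illustrative, not guaranteed model output):
+
+ ```python
+ print(predict_ner_tags("Tall people are so clumsy."))
+ # e.g. [{"token": "tall", "labels": ["B-GEN", "B-STEREO"]}, {"token": "people", "labels": ["I-GEN"]}, ...]
+ ```
+
+ If you see too many or too few flagged tokens, adjust the 0.5 sigmoid threshold in `predict_ner_tags` to trade precision against recall.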