victoriadreis committed on
Commit
98fb301
·
1 Parent(s): 05d5905

Upload README.md

  1. README.md +92 -0
README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: mit
+ datasets:
+ - Silly-Machine/TuPyE-Dataset
+ language:
+ - pt
+
+ pipeline_tag: text-classification
+ base_model: neuralmind/bert-base-portuguese-cased
+ widget:
+ - text: 'Bom dia, flor do dia!!'
+
+ model-index:
+ - name: TuPy-Bert-Base-Multilabel
+   results:
+   - task:
+       type: text-classification
+     dataset:
+       name: TuPyE-Dataset
+       type: Silly-Machine/TuPyE-Dataset
+     metrics:
+     - type: f1
+       value: 0.84
+       name: F1-score
+       verified: true
+     - type: precision
+       value: 0.85
+       name: Precision
+       verified: true
+     - type: recall
+       value: 0.84
+       name: Recall
+       verified: true
+ ---
+
+ ## Introduction
+
+ TuPy-Bert-Base-Multilabel is a BERT model fine-tuned for multilabel classification of hate speech in Portuguese.
+ It is derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased)
+ and assigns text to ten hate speech categories: ageism, aporophobia, body shame, capacitism (ableism), LGBTphobia, political,
+ racism, religious intolerance, misogyny, and xenophobia.
+ For details on the base model, refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+
+ The performance of language models can vary considerably when the training and test data come from different domains.
+ To build a Portuguese language model specialized for hate speech classification,
+ the original BERTimbau model was fine-tuned on
+ the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), sourced from diverse social networks.
+
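To make the multilabel setting concrete: one text can fall into several of the ten categories at once, so a target is usually a multi-hot vector rather than a single class index. The sketch below is only an illustration. The label order is assumed to follow the list in the introduction (the real order is whatever the model's `id2label` config defines), and `to_multi_hot` is a hypothetical helper, not part of the model or dataset.

```python
# Assumed label order, copied from the category list in the introduction.
# The actual order is defined by the model config (id2label) and may differ.
LABELS = [
    "ageism", "aporophobia", "body shame", "capacitism", "LGBTphobia",
    "political", "racism", "religious intolerance", "misogyny", "xenophobia",
]

def to_multi_hot(active_labels):
    """Encode a set of category names as a 0/1 vector, one slot per label."""
    unknown = set(active_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if label in active_labels else 0 for label in LABELS]

# A text flagged as both racist and xenophobic activates two slots;
# a text with no hate speech maps to the all-zero vector.
print(to_multi_hot({"racism", "xenophobia"}))
print(to_multi_hot(set()))
```
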
+ ## Available models
+
+ | Model                                             | Arch.      | #Layers | #Params |
+ | ------------------------------------------------- | ---------- | ------- | ------- |
+ | `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`  | BERT-Base  | 12      | 109M    |
+ | `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
+ | `Silly-Machine/TuPy-Bert-Base-Multilabel`         | BERT-Base  | 12      | 109M    |
+ | `Silly-Machine/TuPy-Bert-Large-Multilabel`        | BERT-Large | 24      | 334M    |
+
+ ## Example usage
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
+ import torch
+ import numpy as np
+ from scipy.special import softmax
+
+ def classify_hate_speech(model_name, text):
+     model = AutoModelForSequenceClassification.from_pretrained(model_name)
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     config = AutoConfig.from_pretrained(model_name)
+
+     # Tokenize input text and prepare model input
+     model_input = tokenizer(text, padding=True, return_tensors="pt")
+
+     # Get model output scores
+     with torch.no_grad():
+         output = model(**model_input)
+         scores = softmax(output.logits.numpy(), axis=1)
+         ranking = np.argsort(scores[0])[::-1]
+
+     # Print the results
+     for i, rank in enumerate(ranking):
+         label = config.id2label[rank]
+         score = scores[0, rank]
+         print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+
+ # Example usage
+ model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel"
+ text = "Bom dia, flor do dia!!"
+ classify_hate_speech(model_name, text)
+
+ ```
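The usage example ranks labels with a softmax, which forces the scores to sum to 1 across labels. If the checkpoint was trained with independent per-label outputs (an assumption; the `problem_type` field of the model config should say), a per-label sigmoid with a threshold may be the more natural way to read the logits, since several labels can then be active at once. The sketch below uses made-up logits; `decode_multilabel`, the 0.5 threshold, and the three-label mapping are illustrative choices, not values taken from the model.

```python
import math

def sigmoid(x):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score reaches the threshold."""
    results = []
    for i, logit in enumerate(logits):
        prob = sigmoid(logit)
        if prob >= threshold:
            results.append((id2label[i], prob))
    return results

# Made-up logits for three labels, for illustration only.
id2label = {0: "racism", 1: "misogyny", 2: "xenophobia"}
logits = [2.0, -1.5, 0.3]

for label, prob in decode_multilabel(logits, id2label):
    print(f"{label}: {prob:.4f}")
```

With real model output, `logits` would be `output.logits[0].tolist()` and `id2label` would come from `model.config.id2label`. The threshold is a tuning choice and trades precision against recall.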