Text Classification
Safetensors
deberta-v2
eliotj committed
Commit b32bfa0 · 1 Parent(s): 6ec6eb0

Update README.md

Files changed (1):
  1. README.md +25 -3
README.md CHANGED
@@ -18,8 +18,8 @@ pipeline_tag: text-classification
 
 [Pleias](https://huggingface.co/PleIAs)
 
-DeBERTa-v3-small model
-trained on 600k samples from Toxic Commons
+
+Celadon is a DeBERTa-v3-small finetune with five classification heads, trained on 600k samples from Toxic Commons.
 
 This classifier is primarily aimed at historical cultural heritage data, like that in [Common Corpus]
 
@@ -32,11 +32,33 @@ Five types of toxicity classification:
 
 
 Read more about the training details in the paper, [Toxicity of the Commons: Curating Open-Source Pre-Training Data] by [Catherine Arnett](https://huggingface.co/catherinearnett), [Eliot Jones](https://huggingface.co/eliotj), Ivan P. Yamshchikov, [Pierre-Carl Langlais](https://huggingface.co/Pclanglais).
-Code for generating the annotations and training the model is available on [GitHub](https://github.com/eliotjones1/celadon)
+For more detailed code for generating the annotations in Toxic Commons, training the model, and using the model, please refer to the official [GitHub](https://github.com/eliotjones1/celadon) repository.
 
 
 # How to Use
 
+```
+from transformers import AutoTokenizer
+from model import MultiHeadDebertaForSequenceClassification
+
+tokenizer = AutoTokenizer.from_pretrained("PleIAs/celadon")
+model = MultiHeadDebertaForSequenceClassification.from_pretrained("PleIAs/celadon")
+model.eval()
+
+sample_text = "This is an example of a normal sentence"
+
+inputs = tokenizer(sample_text, return_tensors="pt", padding=True, truncation=True)
+outputs = model(**inputs)
+
+categories = ['Race/Origin', 'Gender/Sex', 'Religion', 'Ability', 'Violence']
+predictions = outputs.argmax(dim=-1).squeeze().tolist()  # one predicted level per category head
+
+# Print the classification results for each category
+print(f"Text: {sample_text}")
+for i, category in enumerate(categories):
+    print(f"Prediction for Category {category}: {predictions[i]}")
+```
+
 # How to Cite
 
 ```
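
The usage snippet above imports `MultiHeadDebertaForSequenceClassification` from a local `model.py`, which ships in the model repository rather than in `transformers`. As a rough orientation, here is a minimal sketch of what such a five-head wrapper could look like. It is not the repository's implementation: the shared DeBERTa encoder and five parallel heads follow the model description above, while the four output levels per head (severities 0-3) are an assumption based on the Toxic Commons annotation scheme.

```
import torch
from torch import nn
from transformers import DebertaV2Model, DebertaV2PreTrainedModel

class MultiHeadDebertaSketch(DebertaV2PreTrainedModel):
    """Hypothetical stand-in for the repo's model.py; details may differ."""

    def __init__(self, config, num_heads=5, num_levels=4):  # 4 levels (0-3) assumed
        super().__init__(config)
        self.deberta = DebertaV2Model(config)
        # One linear classification head per toxicity category
        self.heads = nn.ModuleList(
            [nn.Linear(config.hidden_size, num_levels) for _ in range(num_heads)]
        )
        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        hidden = self.deberta(input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]  # first-token pooling: one vector per input
        # Stack per-head logits into (batch, num_heads, num_levels), matching
        # the outputs.argmax(dim=-1) call in the snippet above
        return torch.stack([head(pooled) for head in self.heads], dim=1)
```

Under that assumed scheme, each of the five values in `predictions` would be a severity level from 0 (no toxicity) to 3 (most severe) for its category; the paper and the GitHub repository remain the authoritative reference for the actual head layout and label definitions.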