Text Classification
Safetensors
deberta-v2
eliotj committed
Commit b32bfa0 · 1 Parent(s): 6ec6eb0

Update README.md

Files changed (1):
  1. README.md +25 -3
README.md CHANGED
@@ -18,8 +18,8 @@ pipeline_tag: text-classification
 
 [Pleias](https://huggingface.co/PleIAs)
 
-DeBERTa-v3-small model
-trained on 600k samples from Toxic Commons
+
+Celadon is a DeBERTa-v3-small finetune with five classification heads, trained on 600k samples from Toxic Commons.
 
 This classifier is primarily aimed at historical cultural heritage data, like that in [Common Corpus]
 
@@ -32,11 +32,33 @@ Five types of toxicity classification:
 
 
 Read more about the training details in the paper, [Toxicity of the Commons: Curating Open-Source Pre-Training Data] by [Catherine Arnett](https://huggingface.co/catherinearnett), [Eliot Jones](https://huggingface.co/eliotj), Ivan P. Yamshchikov, [Pierre-Carl Langlais](https://huggingface.co/Pclanglais).
-Code for generating the annotations and training the model is available on [GitHub](https://github.com/eliotjones1/celadon)
+For more detailed code for generating the annotations in Toxic Commons, training the model, and using the model, please refer to the official [GitHub](https://github.com/eliotjones1/celadon) repository.
 
 
 # How to Use
 
+```
+from transformers import AutoTokenizer
+from model import MultiHeadDebertaForSequenceClassification
+
+tokenizer = AutoTokenizer.from_pretrained("PleIAs/celadon")
+model = MultiHeadDebertaForSequenceClassification.from_pretrained("PleIAs/celadon")
+model.eval()
+
+sample_text = "This is an example of a normal sentence"
+
+inputs = tokenizer(sample_text, return_tensors="pt", padding=True, truncation=True)
+outputs = model(**inputs)
+
+categories = ['Race/Origin', 'Gender/Sex', 'Religion', 'Ability', 'Violence']
+predictions = outputs.argmax(dim=-1).squeeze().tolist()  # one predicted level per category head
+
+# Print the classification results for each category
+print(f"Text: {sample_text}")
+for i, category in enumerate(categories):
+    print(f"Prediction for Category {category}: {predictions[i]}")
+```
+
 # How to Cite
 
 ```
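
The usage snippet above imports `MultiHeadDebertaForSequenceClassification` from a local `model.py`, which ships in the model repository rather than in `transformers`. As a rough orientation, here is a minimal sketch of what such a five-head wrapper could look like. It is not the repository's implementation: the shared DeBERTa encoder and five parallel heads follow the model description above, while the four output levels per head (severities 0-3) are an assumption based on the Toxic Commons annotation scheme.

```
import torch
from torch import nn
from transformers import DebertaV2Model, DebertaV2PreTrainedModel

class MultiHeadDebertaSketch(DebertaV2PreTrainedModel):
    """Hypothetical stand-in for the repo's model.py; details may differ."""

    def __init__(self, config, num_heads=5, num_levels=4):  # 4 levels (0-3) assumed
        super().__init__(config)
        self.deberta = DebertaV2Model(config)
        # One linear classification head per toxicity category
        self.heads = nn.ModuleList(
            [nn.Linear(config.hidden_size, num_levels) for _ in range(num_heads)]
        )
        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        hidden = self.deberta(input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]  # first-token pooling: one vector per input
        # Stack per-head logits into (batch, num_heads, num_levels), matching
        # the outputs.argmax(dim=-1) call in the snippet above
        return torch.stack([head(pooled) for head in self.heads], dim=1)
```

Under that assumed scheme, each of the five values in `predictions` would be a severity level from 0 (no toxicity) to 3 (most severe) for its category; the paper and the GitHub repository remain the authoritative reference for the actual head layout and label definitions.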