---
license: apache-2.0
datasets:
- ifmain/text-moderation-410K
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
---
# ModerationBERT-ML-En
**ModerationBERT-ML-En** is a moderation model based on `bert-base-multilingual-cased`. This model is designed to perform text moderation tasks, specifically categorizing text into 18 different categories. It currently works only with English text.
[Check out the new version of the model, which is even more accurate!](https://huggingface.co/ifmain/open-text-moderation-7)
## Dataset
The model was trained and fine-tuned using the [text-moderation-410K](https://huggingface.co/datasets/ifmain/text-moderation-410K) dataset. This dataset contains a wide variety of text samples labeled with different moderation categories.
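The snippet below is a minimal sketch of how the dataset can be inspected with the `datasets` library; the split name and field layout are assumptions, so check the dataset card for the exact schema.

```python
# Minimal sketch: download and inspect the dataset with the `datasets` library.
# The "train" split name used below is an assumption; check the dataset card.
from datasets import load_dataset

dataset = load_dataset("ifmain/text-moderation-410K")
print(dataset)              # available splits and their columns
print(dataset["train"][0])  # one labeled example (assuming a "train" split)
```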
## Model Description
ModerationBERT-ML-En uses the BERT architecture to classify text into the following categories:
- harassment
- harassment_threatening
- hate
- hate_threatening
- self_harm
- self_harm_instructions
- self_harm_intent
- sexual
- sexual_minors
- violence
- violence_graphic
- self-harm
- sexual/minors
- hate/threatening
- violence/graphic
- self-harm/intent
- self-harm/instructions
- harassment/threatening
## Training and Fine-Tuning
The model was trained on 95% of the dataset, with the remaining 5% held out for evaluation. Training was performed in two stages, as sketched in the code below:
1. **Initial Training**: The classifier layer was trained while the BERT layers were kept frozen.
2. **Fine-Tuning**: The top two layers of the BERT encoder were unfrozen and fine-tuned together with the classifier.
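The following is a minimal sketch of this two-stage schedule using standard PyTorch parameter freezing; the optimizer choice and learning rates are illustrative assumptions, not the exact training configuration.

```python
# Sketch of the two-stage schedule described above.
# Hyperparameters here are assumptions, not the original training settings.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=18
)

# Stage 1: freeze all BERT encoder weights and train only the classifier head.
for param in model.bert.parameters():
    param.requires_grad = False
stage1_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(stage1_params, lr=1e-3)  # assumed learning rate
# ... run the usual multi-label training loop here ...

# Stage 2: unfreeze the top two encoder layers and fine-tune them
# together with the classifier head at a lower learning rate.
for layer in model.bert.encoder.layer[-2:]:
    for param in layer.parameters():
        param.requires_grad = True
stage2_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(stage2_params, lr=2e-5)  # assumed learning rate
# ... continue training with the partially unfrozen model ...
```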
## Installation
To use ModerationBERT-ML-En, you will need to install the `transformers` library from Hugging Face and `torch`.
```bash
pip install transformers torch
```
## Usage
Here is an example of how to use ModerationBERT-ML-En to predict the moderation categories for a given text:
```python
import json

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the tokenizer and model
model_name = "ifmain/ModerationBERT-ML-En"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=18)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)


def predict(text, model, tokenizer):
    # Tokenize the input text and build the tensors the model expects
    encoding = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=128,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)

    # Run the model in inference mode
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
        predictions = torch.sigmoid(outputs.logits)  # Convert logits to probabilities

    return predictions


# Example usage
new_text = "Fuck off stuped trash"
predictions = predict(new_text, model, tokenizer)

# Define the categories (in the order of the model's output logits)
categories = ['harassment', 'harassment_threatening', 'hate', 'hate_threatening',
              'self_harm', 'self_harm_instructions', 'self_harm_intent', 'sexual',
              'sexual_minors', 'violence', 'violence_graphic', 'self-harm',
              'sexual/minors', 'hate/threatening', 'violence/graphic',
              'self-harm/intent', 'self-harm/instructions', 'harassment/threatening']

# Convert predictions to a dictionary
category_scores = {categories[i]: predictions[0][i].item() for i in range(len(categories))}

output = {
    "text": new_text,
    "category_scores": category_scores
}

# Print the result as JSON with indentation
print(json.dumps(output, indent=4, ensure_ascii=False))
```
Output:
```json
{
    "text": "Fuck off stuped trash",
    "category_scores": {
        "harassment": 0.9272650480270386,
        "harassment_threatening": 0.0013139015063643456,
        "hate": 0.011709265410900116,
        "hate_threatening": 1.1083522622357123e-05,
        "self_harm": 0.00039102151640690863,
        "self_harm_instructions": 0.0002464024000801146,
        "self_harm_intent": 0.00031603744719177485,
        "sexual": 0.020730027928948402,
        "sexual_minors": 0.00018848323088604957,
        "violence": 0.008375612087547779,
        "violence_graphic": 2.8763401132891886e-05,
        "self-harm": 0.00043840022408403456,
        "sexual/minors": 0.00018241720681544393,
        "hate/threatening": 1.1130881830467843e-05,
        "violence/graphic": 2.7211604901822284e-05,
        "self-harm/intent": 0.00026327319210395217,
        "self-harm/instructions": 0.00023905260604806244,
        "harassment/threatening": 0.0012845908058807254
    }
}
```
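Each score is an independent sigmoid probability, so a text can be flagged in several categories at once. Below is a minimal sketch of turning the scores into binary flags; the 0.5 threshold is an assumption and would normally be tuned per category on held-out data.

```python
# Continuing from the usage example above: flag categories whose score
# exceeds a threshold. The 0.5 value is an assumed, untuned threshold.
THRESHOLD = 0.5

flagged = [name for name, score in category_scores.items() if score > THRESHOLD]
print(flagged)  # ['harassment'] for the example text above
```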
## Notes
- This model is currently configured to work only with English text.
- Future updates may include support for additional languages. |