---
license: mit
language:
- en
---
# Named Entity Recognition
## Model Description
This is a token classification model fine-tuned to predict entities in sentences.
It was fine-tuned on a custom dataset that focuses on identifying certain types of entities, including biases in text.
## Intended Use
The model is intended for entity recognition tasks, especially identifying biases in text passages.
Users input a sequence of text, and the model highlights the words, tokens, or **spans** it believes are associated with a particular entity or bias.
## How to Use
The model can be used for inference directly through the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and model from the Hugging Face Hub
model_name = "newsmediabias/UnBIAS-Named-Entity-Recognition"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.to(device)  # the inputs are moved to this device, so the model must be there too
model.eval()      # inference mode: disables dropout

def predict_entities(sentence):
    # Encode once and keep the tensor on the same device as the model
    inputs = tokenizer.encode(sentence, return_tensors="pt").to(model.device)
    # Token strings aligned one-to-one with the model's predictions
    tokens = tokenizer.convert_ids_to_tokens(inputs[0].tolist())

    with torch.no_grad():
        outputs = model(inputs).logits
    predictions = torch.argmax(outputs, dim=2)
    id2label = model.config.id2label

    # Reconstruct words from subword tokens
    biased_words = []
    current_word = ""
    for token, prediction in zip(tokens, predictions[0]):
        label = id2label[prediction.item()]
        if label in ['B-BIAS', 'I-BIAS']:
            if token.startswith('##'):
                # Continuation piece: glue it onto the word being built
                current_word += token[2:]
            else:
                # A new word starts: store the previous one first
                if current_word:
                    biased_words.append(current_word)
                current_word = token
    if current_word:
        biased_words.append(current_word)

    # Filter out special tokens and subword tokens
    biased_words = [word for word in biased_words if not word.startswith('[') and not word.endswith(']') and not word.startswith('##')]
    return biased_words

sentence = "due to your evil and dishonest nature, i am kind of tired and want to get rid of such cheaters."
biased_words = predict_entities(sentence)
for word in biased_words:
    print(f"Biased Word: {word}")
```
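
The function above returns individual words. When multi-word **spans** are more useful, the `B-BIAS` / `I-BIAS` tags can be merged into contiguous phrases. The following is a minimal sketch of that grouping step, assuming WordPiece-style `##` continuation tokens and bracketed special tokens such as `[CLS]`. The helper name `group_bias_spans` and the hand-written labels in the example are illustrative only; they are not part of the model's API and are not real model output.

```python
from typing import List

BIAS_LABELS = ("B-BIAS", "I-BIAS")


def group_bias_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Merge tokens tagged B-BIAS / I-BIAS into contiguous text spans.

    Hypothetical helper: `tokens` and `labels` must have the same length.
    """
    spans: List[str] = []
    current: List[str] = []  # words of the span currently being built

    for token, label in zip(tokens, labels):
        is_special = token.startswith("[") and token.endswith("]")
        is_subword = token.startswith("##")

        if is_subword and current:
            # A continuation piece always belongs to the last word of the open span
            current[-1] += token[2:]
            continue

        if is_special or is_subword or label not in BIAS_LABELS:
            # Anything outside a span closes the span that is open, if any
            if current:
                spans.append(" ".join(current))
                current = []
            continue

        if label == "B-BIAS" and current:
            # A B- tag starts a new span even when it directly follows another one
            spans.append(" ".join(current))
            current = []
        current.append(token)

    if current:
        spans.append(" ".join(current))
    return spans


# Hand-written example input, not actual model predictions
example_tokens = ["[CLS]", "evil", "and", "dish", "##ones", "##t", "nature", "[SEP]"]
example_labels = ["O", "B-BIAS", "O", "B-BIAS", "I-BIAS", "I-BIAS", "I-BIAS", "O"]
print(group_bias_spans(example_tokens, example_labels))
# prints ['evil', 'dishonest nature']
```

To use it with the model, pass the token strings and the per-token labels (looked up through `model.config.id2label`) instead of the hand-written lists.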
## Limitations and Biases
Every model has limitations, and it's crucial to understand these when deploying models in real-world scenarios:
1. **Training Data**: The model was trained on a specific custom dataset, and its predictions are only as good as that data.
2. **Generalization**: While the model may perform well on certain types of sentences or phrases, it might not generalize well to all types of text or contexts.
It's also essential to be aware of any potential biases in the training data, which might affect the model's predictions.
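
One practical way to gauge these limitations is to compare the model's output with a small hand-labeled sample from your own domain before deployment. The sketch below computes word-level precision, recall, and F1 for one sentence. The helper name `word_level_scores` and the example word lists are assumptions made for illustration; they do not come from the model's training or evaluation setup.

```python
from typing import Dict, Iterable


def word_level_scores(predicted: Iterable[str], expected: Iterable[str]) -> Dict[str, float]:
    """Compare predicted biased words with hand-labeled ones (case-insensitive).

    Hypothetical helper: duplicates are ignored because words are compared as sets.
    """
    pred = {word.lower() for word in predicted}
    gold = {word.lower() for word in expected}
    true_positives = len(pred & gold)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative lists only: replace them with the model's output and your own annotations
predicted_words = ["evil", "dishonest", "tired"]
expected_words = ["evil", "dishonest", "cheaters"]
scores = word_level_scores(predicted_words, expected_words)
print({name: round(value, 3) for name, value in scores.items()})
```

Low precision on such a sample suggests the model flags neutral words in your domain, while low recall suggests it misses bias patterns that were rare in its training data.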
## Training Data
The model was fine-tuned on a custom dataset. To request the dataset, contact **Shaina Raza** ([email protected]).