---
license: openrail
datasets:
- shainaraza/clinical_bias
language:
- en
metrics:
- f1
- accuracy
tags:
- NER
- Named Entity Recognition
- Bias
- Clinical
- Healthcare
---

## Clinical Bias NER Model

This is a Named Entity Recognition (NER) model trained on clinical text to detect biased language. The model identifies mentions of patient groups and conditions and marks them as potentially biased.

## Model Details

The model fine-tunes the distilbert-base-uncased transformer on the clinical notes dataset for 3 epochs with a batch size of 8, trained on Google Colab. It tags each token with one of two labels: O (non-biased) and BIAS (potentially biased). The BIAS labels were annotated manually by reviewing each record and identifying the sentences that contain bias.

## Performance

The model achieved an F1-score of 0.93 on the validation set of the dataset.

## Usage

The model can be used to identify potentially biased language in clinical text and can be integrated into a larger NLP pipeline or used as a standalone tool. To use it, import the `AutoModelForTokenClassification` and `AutoTokenizer` classes from the `transformers` library and load the model and tokenizer with `from_pretrained()`.

```
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from prettytable import PrettyTable

# Load the model and tokenizer from the Hugging Face model hub
model_name = "shainaraza/clinical-bias-ner"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the text to classify
text = "The patient is a 50-year poor, take drugs and has aggressive behavior."

# Tokenize the text (round-tripping through encode/decode keeps the special tokens)
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(text)))

# Convert tokens to input IDs and build an attention mask
input_ids = tokenizer.convert_tokens_to_ids(tokens)
attention_masks = [1] * len(input_ids)

# Prepare the input tensors (batch dimension of 1)
input_ids = torch.tensor(input_ids).unsqueeze(0)
attention_masks = torch.tensor(attention_masks).unsqueeze(0)

# Run the model and take the highest-scoring label for each token
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_masks)
predicted_labels = torch.argmax(outputs.logits, dim=2)

# Convert predicted label IDs back to label names
predicted_labels = predicted_labels.squeeze().tolist()
predicted_labels = [model.config.id2label[label_id] for label_id in predicted_labels]

# Display the tokens and their predicted labels in a table
table = PrettyTable(['Token', 'Label'])
for token, label in zip(tokens, predicted_labels):
    table.add_row([token, label])

print(table)
```

This will output:

```
+------------+-------+
| Token      | Label |
+------------+-------+
| [CLS]      | O     |
| [UNK]      | O     |
| patient    | O     |
| is         | O     |
| a          | O     |
| 50         | O     |
| -          | O     |
| year       | O     |
| poor       | BIAS  |
| ,          | O     |
| take       | O     |
| drugs      | O     |
| and        | O     |
| has        | O     |
| aggressive | BIAS  |
| behavior   | O     |
| .          | O     |
| [SEP]      | O     |
+------------+-------+
```
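For quick experiments, the same model can also be wrapped in the `transformers` token-classification `pipeline`, which handles tokenization and label mapping internally. The following is a minimal sketch assuming the label set is exactly `O`/`BIAS` as described above; the pipeline drops `O`-labelled tokens by default, so only the potentially biased spans are returned.

```
from transformers import pipeline

# Token-classification pipeline around the same model; "simple" aggregation
# merges consecutive BIAS tokens into a single span.
bias_tagger = pipeline(
    "token-classification",
    model="shainaraza/clinical-bias-ner",
    aggregation_strategy="simple",
)

text = "The patient is a 50-year poor, take drugs and has aggressive behavior."

# Each entry contains the flagged span, its label, a confidence score,
# and character offsets into the original text.
for entity in bias_tagger(text):
    print(entity["word"], entity["entity_group"],
          round(float(entity["score"]), 3),
          (entity["start"], entity["end"]))
```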
## Limitations and Future Work

The model is not perfect and may not capture all instances of biased language. Note that it only flags potentially biased language; it makes no judgment about intent or impact. In future work, the model could be fine-tuned on a larger and more diverse dataset to improve its performance, and extended to identify specific types of biased language, such as ageism, racism, or sexism.

## Acknowledgments

This model was developed by Shaina Raza as part of her project.

## Contact

For any questions or comments, please contact shaina.raza@torontomu.ca.