---
license: apache-2.0
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
library_name: transformers
tags:
- code
---
# BERT Fine-Tuned for Named Entity Recognition (NER)

This repository contains a BERT model fine-tuned for Named Entity Recognition (NER). The model was fine-tuned with the Hugging Face `transformers` library and recognizes named entities such as people, locations, and organizations in text.

## Model Details

- **Model Architecture**: `bert-base-cased`
- **Fine-Tuning Task**: Named Entity Recognition (NER)
- **Dataset Used**: The model was fine-tuned on the [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419) NER dataset, which provides labeled spans for persons, organizations, locations, and miscellaneous names.
- **Intended Use**: The model is suitable for NER tasks in various applications, including information extraction, question answering, and chatbots.

## Usage

You can use this model with the Hugging Face `transformers` library to quickly get started with NER tasks. Below is an example of how to load and use this model for inference.

### Installation

First, make sure you have the required packages (`transformers` needs a deep-learning backend such as PyTorch):

```bash
pip install transformers torch
```

### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("heenamir/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("heenamir/bert-finetuned-ner")

# Initialize the NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "John Doe is a software engineer at OpenAI in San Francisco."

# Perform NER
entities = nlp(text)
print(entities)
```

### Example Output
The model returns a list with one prediction per token (scores, indices, and character offsets below are illustrative). Note that multi-word entities such as "San Francisco" come back as separate `B-`/`I-` tokens:
```python
[
  {"entity": "B-PER", "score": 0.99, "index": 1, "word": "John", "start": 0, "end": 4},
  {"entity": "I-PER", "score": 0.98, "index": 2, "word": "Doe", "start": 5, "end": 8},
  {"entity": "B-ORG", "score": 0.95, "index": 8, "word": "OpenAI", "start": 35, "end": 41},
  {"entity": "B-LOC", "score": 0.97, "index": 10, "word": "San", "start": 45, "end": 48},
  {"entity": "I-LOC", "score": 0.96, "index": 11, "word": "Francisco", "start": 49, "end": 58}
]
```

### Entity Labels

The model is fine-tuned to detect the following entity types:

* **PER**: Person
* **ORG**: Organization
* **LOC**: Location
* **MISC**: Miscellaneous
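These labels follow the BIO tagging scheme: a `B-` prefix marks the first token of an entity span and `I-` marks a continuation of the same span. A minimal sketch of merging token-level predictions back into full entity spans (the sample predictions are hypothetical, mirroring the example output above):

```python
def merge_bio(tokens):
    """Merge token-level BIO predictions into (entity_type, text) spans."""
    spans = []
    for tok in tokens:
        tag, word = tok["entity"], tok["word"]
        if tag.startswith("B-"):
            # A B- tag always opens a new span
            spans.append([tag[2:], [word]])
        elif tag.startswith("I-") and spans and spans[-1][0] == tag[2:]:
            # An I- tag of the matching type extends the current span
            spans[-1][1].append(word)
    return [(etype, " ".join(words)) for etype, words in spans]

# Hypothetical token-level predictions for the example sentence
preds = [
    {"entity": "B-PER", "word": "John"},
    {"entity": "I-PER", "word": "Doe"},
    {"entity": "B-ORG", "word": "OpenAI"},
    {"entity": "B-LOC", "word": "San"},
    {"entity": "I-LOC", "word": "Francisco"},
]
print(merge_bio(preds))
# [('PER', 'John Doe'), ('ORG', 'OpenAI'), ('LOC', 'San Francisco')]
```

The `transformers` pipeline can do this grouping for you via its `aggregation_strategy` argument; the sketch above just makes the BIO logic explicit.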

### Scoring

The model outputs a score for each detected entity, representing its confidence level. You can use these scores to filter out low-confidence predictions if needed.
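A simple threshold filter over the pipeline output might look like this (the sample entities and the 0.90 cutoff are illustrative, not recommendations):

```python
def filter_entities(entities, min_score=0.90):
    """Keep only predictions at or above the confidence threshold."""
    return [e for e in entities if e["score"] >= min_score]

# Hypothetical pipeline output
entities = [
    {"entity": "B-PER", "score": 0.99, "word": "John"},
    {"entity": "B-ORG", "score": 0.62, "word": "Apple"},   # low confidence, dropped
    {"entity": "B-LOC", "score": 0.97, "word": "Paris"},
]
high_conf = filter_entities(entities)
print([e["word"] for e in high_conf])  # ['John', 'Paris']
```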

## Model Performance

The model's performance can vary depending on the complexity and context of the input text. It performs well on structured text but may struggle with informal or highly technical language.

### Evaluation Metrics

The model was evaluated on the CoNLL-2003 test set with the following metrics:

* **Precision**: 93.04%
* **Recall**: 94.98%
* **F1 Score**: 94.00%
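The F1 score is the harmonic mean of precision and recall; plugging in the numbers above confirms the reported value:

```python
precision, recall = 0.9304, 0.9498  # from the evaluation above
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9400
```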

## Limitations and Considerations

* The model may not perform well on texts outside of the domains it was trained on.
* Like all NER models, it may occasionally misclassify entities or fail to recognize them, especially in cases of polysemy or ambiguity.
* It is also limited to English text, as it was fine-tuned on an English dataset.

## Credits

* Fine-tuning and Model: [Heena Mirchandani](https://huggingface.co/heenamir) & [Krish Murjani](https://huggingface.co/krishmurjani)
* Dataset: CoNLL-2003 NER dataset

## License

This model is available for use under the Apache License 2.0. See the LICENSE file for more details.

---

For more details on BERT and Named Entity Recognition, refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers).