---
inference: false
license: apache-2.0
language:
- pt
metrics:
- f1
pipeline_tag: token-classification
datasets:
- harem
---

# Portuguese NER BERT-CRF HAREM Selective

|
|
This model is a fine-tuned BERT model adapted for Named Entity Recognition (NER) tasks. It utilizes Conditional Random Fields (CRF) as the decoder. |
|
|
|
The model follows the HAREM Selective labeling scheme for NER. Additionally, it provides options for HAREM Default and Conll-2003 labeling schemes. |
|
|
|
## How to Use |
|
|
|
You can employ this model using the Transformers library's *pipeline* for NER, or incorporate it as a conventional Transformer in the HuggingFace ecosystem. |
|
|
|
```python |
|
from transformers import pipeline |
|
import torch |
|
import nltk |
|
|
|
ner_classifier = pipeline( |
|
"ner", |
|
model="arubenruben/NER-PT-BERT-CRF-HAREM-Selective", |
|
device=torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu"), |
|
trust_remote_code=True |
|
) |
|
|
|
text = "FCPorto vence o Benfica por 5-0 no Estádio do Dragão" |
|
tokens = nltk.wordpunct_tokenize(text) |
|
result = ner_classifier(tokens) |
|
``` |
|
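To turn per-token predictions into entity spans, you can merge consecutive BIO tags. The sketch below assumes the pipeline returns one dictionary per input token with `word` and `entity` keys (the standard token-classification output; the custom CRF pipeline's exact format may differ), and the `preds` list is a hypothetical illustration, not actual model output.

```python
def merge_bio(predictions):
    """Group per-token BIO predictions into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for pred in predictions:
        tag, word = pred["entity"], pred["word"]
        if tag.startswith("B-"):
            # A B- tag starts a new span, closing any open one.
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open span.
            current_words.append(word)
        else:
            # An "O" tag (or inconsistent I- tag) closes the open span.
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type:
        spans.append((current_type, " ".join(current_words)))
    return spans

# Hypothetical predictions for the example sentence above.
preds = [
    {"word": "FCPorto", "entity": "B-ORG"},
    {"word": "vence", "entity": "O"},
    {"word": "o", "entity": "O"},
    {"word": "Benfica", "entity": "B-ORG"},
    {"word": "por", "entity": "O"},
    {"word": "5", "entity": "O"},
    {"word": "-", "entity": "O"},
    {"word": "0", "entity": "O"},
    {"word": "no", "entity": "O"},
    {"word": "Estádio", "entity": "B-LOC"},
    {"word": "do", "entity": "I-LOC"},
    {"word": "Dragão", "entity": "I-LOC"},
]
print(merge_bio(preds))
# → [('ORG', 'FCPorto'), ('ORG', 'Benfica'), ('LOC', 'Estádio do Dragão')]
```
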
|
|
## Demo |
|
|
|
There is a [Notebook](https://github.com/arubenruben/PT-Pump-Up/blob/master/BERT-CRF.ipynb) available to test our code. |
|
|
|
## PT-Pump-Up |
|
|
|
This model is integrated in the project [PT-Pump-Up](https://github.com/arubenruben/PT-Pump-Up) |
|
|
|
## Evaluation |
|
|
|
#### Testing Data |
|
|
|
The model was tested on the Miniharem Testset. |
|
|
|
### Results |
|
|
|
F1-Score: 0.832 |
|
|
|
## Citation |
|
|
|
Citation will be made available soon. |
|
|
|
**BibTeX:** |
|
:( |