---
inference: False
license: apache-2.0
language:
- pt
metrics:
- f1
pipeline_tag: token-classification
datasets:
- harem
---


# Portuguese NER BERT-CRF HAREM Default

This model is a BERT model fine-tuned for Named Entity Recognition (NER). It uses a Conditional Random Field (CRF) layer as the decoder.

The model follows the HAREM Selective labeling scheme for NER. Options for the HAREM Default and CoNLL-2003 labeling schemes are also provided.

## How to Use

You can use this model through the Transformers library's *pipeline* for NER, or load it like any other Transformer model in the Hugging Face ecosystem.

```python
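from transformers import pipeline
import torch
import nltk

# Load the NER pipeline. trust_remote_code=True is required because the
# CRF decoder is implemented in custom code shipped with the model repository.
ner_classifier = pipeline(
    "ner",
    model="arubenruben/NER-PT-BERT-CRF-HAREM-Selective",
    # Use the first GPU when available, otherwise fall back to the CPU.
    device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
    trust_remote_code=True
)

text = "FCPorto vence o Benfica por 5-0 no Estádio do Dragão"
# The pipeline is called on a list of tokens, so split the text first.
tokens = nltk.wordpunct_tokenize(text)
result = ner_classifier(tokens)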
```
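
The snippet above leaves `result` uninspected. As a minimal sketch, assuming the pipeline yields one BIO-style tag per input token (for example `B-ORG`, `I-ORG`, `O`), the helper below merges consecutive tags into entity spans. The function name `group_bio_entities` and the tag names in the example are hypothetical and are not taken from the model; check the actual output format in the notebook linked in the Demo section before relying on this.

```python
from typing import List, Optional, Tuple


def group_bio_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO tags into (entity_text, entity_type) pairs.

    Assumes tags look like "B-TYPE", "I-TYPE" or "O". This is an assumption
    about the output format, not something the model card documents.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An "I-" tag of the same type extends the entity that is open.
        if prefix == "I" and label == current_type and current_tokens:
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        # "B-" starts a new entity; a stray "I-" is treated as a start too.
        if prefix in ("B", "I") and label:
            current_tokens, current_type = [token], label

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical tags for the example sentence (not real model output).
example_tokens = ["FCPorto", "vence", "o", "Benfica", "por", "5", "-", "0",
                  "no", "Estádio", "do", "Dragão"]
example_tags = ["B-ORG", "O", "O", "B-ORG", "O", "O", "O", "O",
                "O", "B-LOC", "I-LOC", "I-LOC"]
entities = group_bio_entities(example_tokens, example_tags)
print(entities)
```

Joining tokens with spaces is a simplification; if exact character offsets matter, keep the token indices instead of the joined text.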

## Demo

A [notebook](https://github.com/arubenruben/PT-Pump-Up/blob/master/BERT-CRF.ipynb) is available for testing the code.

## PT-Pump-Up

This model is integrated into the [PT-Pump-Up](https://github.com/arubenruben/PT-Pump-Up) project.

## Evaluation

### Testing Data

The model was tested on the MiniHAREM test set.

### Results

F1-Score: 0.832
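
For context on how a score like this is typically obtained, here is a minimal sketch of exact-match, entity-level F1: the harmonic mean of precision and recall over predicted and gold entity spans. Whether the number reported above was computed exactly this way is an assumption, since the card does not state the averaging or matching rules. The function name `entity_f1` and the example spans are hypothetical.

```python
from typing import Set, Tuple

# (start_token, end_token, entity_type); a hypothetical span representation.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match entity-level F1 between gold and predicted spans."""
    # A prediction counts only if boundaries and type both match.
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up spans for illustration only; not taken from MiniHAREM.
gold_spans = {(0, 1, "ORG"), (3, 4, "ORG"), (9, 12, "LOC")}
predicted_spans = {(0, 1, "ORG"), (9, 12, "LOC"), (5, 6, "VALUE")}
score = entity_f1(gold_spans, predicted_spans)
print(round(score, 3))
```

Here two of three predictions match two of three gold entities, so precision and recall are both 2/3.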

## Citation

Citation will be made available soon.

**BibTeX:**
:(