---
language:
- en
license: apache-2.0
datasets:
- CoNLL-2003
metrics:
- F1

---

This is a T5-small model fine-tuned on the CoNLL-2003 dataset for named entity recognition (NER).

Example Input and Output:
"Recognize all the named entities in this sequence (replace named entities with one of [PER], [ORG], [LOC], [MISC]): When Alice visited New York" → "When PER visited LOC LOC"
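Given the prompt format above, prompt construction and output alignment can be sketched as follows. This is a minimal sketch: the helper names are my own, and I assume the output tags appear without brackets (as in the example output) and are whitespace-aligned with the input.

```python
def build_prompt(sentence: str) -> str:
    """Wrap a sentence in the NER instruction shown in the example above."""
    return ("Recognize all the named entities in this sequence "
            "(replace named entities with one of [PER], [ORG], [LOC], [MISC]): "
            + sentence)

# Tags as they appear in the example model output ("When PER visited LOC LOC").
TAG_SET = {"PER", "ORG", "LOC", "MISC"}

def extract_entities(sentence: str, output: str):
    """Align the model output with the input word by word and return
    (word, tag) pairs for tagged positions. Assumes the output has the
    same whitespace token count as the input (see the caveat below on
    length mismatches)."""
    in_toks, out_toks = sentence.split(), output.split()
    if len(in_toks) != len(out_toks):
        raise ValueError("output length does not match input length")
    return [(w, t) for w, t in zip(in_toks, out_toks) if t in TAG_SET]
```

For the example above, `extract_entities("When Alice visited New York", "When PER visited LOC LOC")` recovers `[("Alice", "PER"), ("New", "LOC"), ("York", "LOC")]`.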

Evaluation Results:

Percentage of complete match (for comparison with ExT5: https://arxiv.org/pdf/2111.10952.pdf):

| Model | ExT5_{Base} | This Model | T5_NER_CONLL_OUTPUTLIST |
| :---: | :---: | :---: | :---: |
| % of Complete Match | 86.53 | 79.03 | TBA |



Some outputs (212/3453, or 6.14%) do not have the same length as the input.
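The two statistics above (complete-match percentage and the length-mismatch count) follow directly from their definitions; a sketch under the assumption of whitespace tokenization, not the exact evaluation script:

```python
def complete_match_rate(preds, golds):
    """Fraction of predicted sequences that exactly equal the gold sequence
    (the '% of Complete Match' in the table above)."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def split_by_length(inputs, preds):
    """Separate predictions whose whitespace token count matches the input
    from those that do not (the 212/3453 mismatched outputs)."""
    match, mismatch = [], []
    for x, p in zip(inputs, preds):
        (match if len(x.split()) == len(p.split()) else mismatch).append((x, p))
    return match, mismatch
```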

F1 score on the test-set examples with matching length:

| Model | This Model | T5_NER_CONLL_OUTPUTLIST | BERT-base |
| :---: | :---: | :---: | :---: |
| F1 | 0.8901 | 0.8691 | 0.9240 |

**Caveat:** These F1 scores are not computed on identical test sets because of the length-mismatch filtering: this model drops 212/3453 examples, while T5_NER_CONLL_OUTPUTLIST has only 27/3453 length mismatches (0.78%). The BERT-base number is taken directly from the BERT paper (https://arxiv.org/pdf/1810.04805.pdf).
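The card does not specify how the F1 above is computed. One plausible reading, a token-level micro-F1 over the four entity tags (an assumption on my part; CoNLL results are often reported as span-level F1 instead), would look like:

```python
TAGS = {"PER", "ORG", "LOC", "MISC"}

def token_micro_f1(pred_tags, gold_tags):
    """Micro-averaged token-level F1 over the entity tags; any token not
    in TAGS is treated as non-entity. Assumes equal-length tag sequences
    (i.e. only the matching-length examples discussed above)."""
    tp = fp = fn = 0
    for p, g in zip(pred_tags, gold_tags):
        if g in TAGS and p == g:
            tp += 1          # correctly tagged entity token
        else:
            if p in TAGS:
                fp += 1      # predicted an entity tag that is wrong
            if g in TAGS:
                fn += 1      # missed (or mistagged) a gold entity token
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```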