---
language:
- en
license: "apache-2.0"
datasets:
- CoNLL-2003
metrics:
- F1
---
This is a T5-small model fine-tuned on the CoNLL-2003 dataset for named entity recognition (NER).

Example input and output:

"Recognize all the named entities in this sequence (replace named entities with one of [PER], [ORG], [LOC], [MISC]): When Alice visited New York" → "When PER visited LOC LOC"
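The instruction prefix shown above can be built programmatically before feeding a sentence to the model. A minimal sketch (the helper name `build_ner_prompt` is illustrative, not part of this repository):

```python
def build_ner_prompt(sequence: str) -> str:
    # Hypothetical helper: wraps a sentence in the instruction format
    # this model was fine-tuned on (see the example above).
    return (
        "Recognize all the named entities in this sequence "
        "(replace named entities with one of [PER], [ORG], [LOC], [MISC]): "
        + sequence
    )

prompt = build_ner_prompt("When Alice visited New York")
print(prompt)
```

The resulting string is what you would pass to the T5 tokenizer as the encoder input.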
Evaluation results:

Percentage of complete matches (for comparison with ExT5: https://arxiv.org/pdf/2111.10952.pdf):

| Model | ExT5_{Base} | This Model | T5_NER_CONLL_OUTPUTLIST |
| :---: | :---: | :---: | :---: |
| % of Complete Match | 86.53 | 79.03 | TBA |
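The complete-match metric above counts how often the decoded output string is exactly equal to the reference. A minimal sketch of how such a percentage could be computed (the function name is illustrative, not from the evaluation code):

```python
def complete_match_rate(predictions, references):
    # Percentage of predictions that exactly equal their reference string.
    matches = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)

preds = ["When PER visited LOC LOC", "ORG opened in LOC"]
refs  = ["When PER visited LOC LOC", "ORG opened in LOC LOC"]
print(complete_match_rate(preds, refs))  # → 50.0
```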
Some outputs (212/3453, or 6.14%) do not have the same length as the input.

F1 score on the subset of the test set with matching length:
| Model | This Model | T5_NER_CONLL_OUTPUTLIST | BERT_Base |
| :---: | :---: | :---: | :---: |
| F1 | 0.8901 | 0.8691 | 0.9240 |
**Caveat:** these test sets are not identical because of the length-mismatch issue. T5_NER_CONLL_OUTPUTLIST has only 27/3453 (0.78%) length mismatches, and the BERT_Base number is taken directly from the BERT paper (https://arxiv.org/pdf/1810.04805.pdf).