LitBERT-CRF
LitBERT-CRF is a fine-tuned BERT-CRF model designed specifically for Named Entity Recognition (NER) in Portuguese literary texts.
Model Details
Model Description
LitBERT-CRF leverages a BERT-CRF architecture, initially pre-trained on the brWaC corpus and fine-tuned on the HAREM dataset for enhanced NER performance in Portuguese. It incorporates domain-specific literary data through Masked Language Modeling (MLM), making it well-suited for identifying named entities in literary texts.
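The MLM adaptation step corrupts a fraction of the input tokens and trains the model to recover the originals. A toy sketch of BERT-style 80/10/10 masking (hypothetical token list and vocabulary; not the actual LitBERT-CRF training code):

```python
import random

MASK = "[MASK]"
VOCAB = ["livro", "autor", "Lisboa", "romance"]  # toy vocabulary for random replacement

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels); labels are None where no prediction is needed."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                       # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: keep unchanged
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

tokens = "o autor escreveu um romance em Lisboa".split()
corrupted, labels = mask_tokens(tokens)
```

During adaptation, the loss is computed only at the positions where `labels` is not `None`.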
- Model type: BERT-CRF for NER
- Language: Portuguese
- Fine-tuned from model: BERT-CRF pre-trained on brWaC and fine-tuned on HAREM
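The CRF layer on top of the BERT encoder selects the best tag sequence jointly rather than tagging each token independently, which enforces constraints such as `I-PER` only following a person tag. A minimal sketch of the Viterbi decoding a CRF performs, with toy emission and transition scores (hypothetical numbers and a three-tag BIO set, not the model's actual parameters):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions: (seq_len, num_tags) per-token scores from the encoder.
    transitions: (num_tags, num_tags) CRF scores for moving from tag i to tag j.
    """
    seq_len, num_tags = emissions.shape
    score = np.zeros((seq_len, num_tags))          # best score of a path ending in tag j at step t
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    score[0] = emissions[0]
    for t in range(1, seq_len):
        # total[i, j]: best path ending in tag i at t-1, then i -> j, plus emission for j
        total = score[t - 1][:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score[t] = total.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score[-1].argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# Toy tag set: O=0, B-PER=1, I-PER=2
transitions = np.array([
    [0.0, 0.0, -10.0],   # O -> I-PER is heavily penalized
    [0.0, 0.0,   1.0],   # B-PER -> I-PER is encouraged
    [0.0, 0.0,   0.5],
])
emissions = np.array([
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 1.9],
    [0.0, 0.0, 1.8],
])
print(viterbi_decode(emissions, transitions))  # → [0, 1, 2], i.e. O B-PER I-PER
```

Note how the transition penalty prevents the invalid sequence `O I-PER` even when the per-token emission for `I-PER` is high.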
Evaluation
Testing Data, Factors & Metrics
Testing Data
PPORTAL_ner dataset
Metrics
- Precision: 0.783
- Recall: 0.774
- F1-score: 0.779
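The F1-score is the harmonic mean of precision and recall; recomputing it from the rounded values above gives roughly 0.778, consistent with the reported 0.779 up to rounding of the inputs:

```python
precision, recall = 0.783, 0.774
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))
```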
Citation
BibTeX:
@inproceedings{silva-moro-2024-evaluating,
title = "Evaluating Pre-training Strategies for Literary Named Entity Recognition in {P}ortuguese",
author = "Silva, Mariana O. and
Moro, Mirella M.",
editor = "Gamallo, Pablo and
Claro, Daniela and
Teixeira, Ant{\'o}nio and
Real, Livy and
Garcia, Marcos and
Oliveira, Hugo Gon{\c{c}}alo and
Amaro, Raquel",
booktitle = "Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1",
month = mar,
year = "2024",
address = "Santiago de Compostela, Galicia/Spain",
    publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.propor-1.39",
pages = "384--393",
}
APA:
Mariana O. Silva and Mirella M. Moro. 2024. Evaluating Pre-training Strategies for Literary Named Entity Recognition in Portuguese. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 384–393, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.