arxiv:2412.09587

OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

Published on Dec 12

Abstract

We present OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets. OpenNER contains 34 datasets spanning 51 languages, annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation, map entity type names to be more consistent across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline models using three pretrained multilingual language models to compare the performance of recent models and facilitate future research in NER.
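The abstract mentions mapping entity type names so they are more consistent across corpora. As a rough illustration only, the sketch below normalizes BIO-style tags with a lookup table. The table entries, the function names `normalize_tag` and `normalize_sentence`, and the BIO tag format are assumptions made for this example; they are not taken from the paper or from any OpenNER release.

```python
# Hypothetical mapping from corpus-specific entity type names to shared names.
# These entries are illustrative assumptions, not the mapping used by OpenNER.
TYPE_MAP = {
    "PERSON": "PER",
    "PER": "PER",
    "LOCATION": "LOC",
    "LOC": "LOC",
    "ORGANIZATION": "ORG",
    "ORG": "ORG",
}


def normalize_tag(tag, type_map=TYPE_MAP):
    """Rewrite the entity type in a BIO-style tag such as 'B-PERSON'.

    The outside tag 'O' is returned unchanged. Types that are missing
    from the mapping are kept as-is rather than dropped.
    """
    if tag == "O":
        return tag
    prefix, sep, entity_type = tag.partition("-")
    if not sep:
        raise ValueError(f"Not a BIO-style tag: {tag!r}")
    return f"{prefix}-{type_map.get(entity_type, entity_type)}"


def normalize_sentence(tagged_tokens, type_map=TYPE_MAP):
    """Apply normalize_tag to every (token, tag) pair in a sentence."""
    return [(token, normalize_tag(tag, type_map)) for token, tag in tagged_tokens]


sentence = [("Ada", "B-PERSON"), ("Lovelace", "I-PERSON"), ("visited", "O"), ("Turin", "B-LOCATION")]
normalized = normalize_sentence(sentence)
```

Keeping unmapped types unchanged is one possible design choice; a stricter variant could raise an error so that unexpected type names are noticed early.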

Community

Hi @lignos, I really like this effort to standardize NER datasets across different languages :) Do you plan to release the final dataset in the near future? 🤔

Paper author

Thanks for asking, and sorry that there isn't a dataset available right now. We had to preprint this a little ahead of time so that it could be cited by another paper, so the paper is slightly ahead of the data. We're going to put up a new preprint next week along with the dataset release and a GitHub repo.
