---
title: README
emoji: 🏃
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
---
# hmBERT 64k

Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)

More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).
<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
<p>
The hmBERT 64k model is a 12-layer BERT model with a 64k vocabulary.
</p>
</div>
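
As a BERT model, hmBERT 64k can be loaded with 🤗 Transformers; a fill-mask call is a quick way to sanity-check a checkpoint before fine-tuning it for NER. A minimal sketch, assuming a placeholder model ID (check the dbmdz organization on the Hugging Face Hub for the actual hmBERT 64k checkpoint name):

```python
from transformers import pipeline

# Placeholder model ID for illustration; look up the actual hmBERT 64k
# checkpoint under the dbmdz organization on the Hugging Face Hub.
model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"

# Fill-mask is a quick sanity check for a masked language model checkpoint
# before fine-tuning it on a downstream NER dataset.
fill_mask = pipeline("fill-mask", model=model_id)
print(fill_mask("Der [MASK] wurde in Berlin gedruckt ."))
```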
# Leaderboard

We evaluate our pretrained language models on various datasets from HIPE-2020, HIPE-2022, and Europeana.
The following table gives an overview of the datasets used:
| Language | Datasets                                                         |
|----------|------------------------------------------------------------------|
| English  | [AjMC] - [TopRes19th]                                            |
| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
| Finnish  | [NewsEye]                                                        |
| Swedish  | [NewsEye]                                                        |
| Dutch    | [ICDAR-Europeana]                                                |

[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md
All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).
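
The leaderboard numbers come from fine-tuning the pretrained models on the NER datasets above. As a rough illustration (not the exact training setup behind the leaderboard), a checkpoint can be loaded with a token-classification head sized to a dataset's label scheme; the model ID and label set below are placeholders:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint name and label scheme; substitute the actual hmBERT 64k
# model ID and the label set of the HIPE dataset you fine-tune on.
model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach a randomly initialized token-classification head on top of the encoder;
# the head is trained during NER fine-tuning.
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
```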
# Acknowledgements

We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️