---
title: README
emoji: 🏃
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
---
# hmBERT 64k
Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:
* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)
More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).
<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
<p>
The hmBERT 64k model is a 12-layer BERT model with a 64k vocabulary.
</p>
</div>
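To give a rough sense of what a 64k vocabulary means for model size, the sketch below estimates the size of the token-embedding matrix. It assumes a BERT-base hidden size of 768 and a vocabulary of exactly 64,000 entries. Neither number is stated in this README, so treat both as illustrative assumptions, and the 32k figure as a hypothetical point of comparison.

```python
# Rough estimate of token-embedding parameters for a BERT-style model.
# ASSUMPTIONS (not stated in this README): hidden size 768 (BERT-base)
# and a vocabulary of exactly 64,000 entries.


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in the token-embedding matrix (vocab_size x hidden_size)."""
    return vocab_size * hidden_size


HIDDEN_SIZE = 768    # assumed BERT-base hidden size
VOCAB_64K = 64_000   # assumed exact size of the "64k" vocabulary
VOCAB_32K = 32_000   # hypothetical smaller vocabulary, for comparison only

params_64k = embedding_params(VOCAB_64K, HIDDEN_SIZE)
params_32k = embedding_params(VOCAB_32K, HIDDEN_SIZE)

print(f"64k vocab: {params_64k:,} embedding parameters")
print(f"32k vocab: {params_32k:,} embedding parameters")
print(f"difference: {params_64k - params_32k:,}")
```

Under these assumptions, doubling the vocabulary doubles the embedding matrix while the 12 transformer layers stay the same size.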
# Leaderboard
We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
The following table gives an overview of the datasets used:
| Language | Datasets |
|----------|------------------------------------------------------------------|
| English | [AjMC] - [TopRes19th] |
| German | [AjMC] - [NewsEye] - [HIPE-2020] |
| French | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
| Finnish | [NewsEye] |
| Swedish | [NewsEye] |
| Dutch | [ICDAR-Europeana] |
[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md
All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).
# Acknowledgements
We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.
Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️