---
title: README
emoji: 🏃
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
---

# hmBERT 64k

hmBERT is a family of Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)

More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).

<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
<p>
  The hmBERT 64k model is a 12-layer BERT model with a 64k subword vocabulary.
</p>
</div>
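
As a quick sanity check, the pretrained model can be loaded with the Hugging Face `transformers` library. This is a minimal sketch; the checkpoint ID below is an assumption and should be replaced with the actual hmBERT 64k model ID on the Hugging Face Hub.

```python
from transformers import pipeline

# Checkpoint ID is an assumption -- replace with the actual hmBERT 64k
# model ID on the Hugging Face Hub.
model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"

# hmBERT is pretrained with masked language modeling, so the fill-mask
# task works out of the box.
fill_mask = pipeline("fill-mask", model=model_id)
print(fill_mask("The British [MASK] contains many historical books."))
```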

# Leaderboard

We evaluate our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
The following table gives an overview of the datasets used:

| Language | Datasets                                                         |
|----------|------------------------------------------------------------------|
| English  | [AjMC] - [TopRes19th]                                            |
| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
| Finnish  | [NewsEye]                                                        |
| Swedish  | [NewsEye]                                                        |
| Dutch    | [ICDAR-Europeana]                                                |

[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md

All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).
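
For downstream NER, a fine-tuned checkpoint can be used via the `token-classification` pipeline. This is a sketch only; the model ID below is hypothetical, so substitute a real fine-tuned checkpoint from the leaderboard.

```python
from transformers import pipeline

# Hypothetical checkpoint ID -- substitute a fine-tuned hmBERT 64k NER
# model listed on the hmLeaderboard.
ner = pipeline(
    "token-classification",
    model="your-org/hmbert-64k-ner-ajmc",  # hypothetical ID
    aggregation_strategy="simple",  # merge word pieces into entity spans
)
print(ner("In 1806 Napoleon entered the city of Berlin."))
```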

# Acknowledgements

We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️