stefan-it committed on
Commit
283cc59
1 Parent(s): 8896a07

readme: add initial version of organization card \o/

Files changed (1): README.md (+49 −1)
README.md CHANGED
@@ -7,4 +7,52 @@ sdk: static
 pinned: false
 ---

- Edit this `README.md` markdown file to author your organization card.
+ # hmBERT 64k
+
+ hmBERT provides Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:
+
+ * English (British Library Corpus - Books)
+ * German (Europeana Newspaper)
+ * French (Europeana Newspaper)
+ * Finnish (Europeana Newspaper)
+ * Swedish (Europeana Newspaper)
+
+ More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
+ [hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).
+
+ <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
+ <p>
+ The hmBERT 64k model is a 12-layer BERT model with a 64k subword vocabulary.
+ </p>
+ </div>
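The description above can be sanity-checked programmatically. The sketch below loads only the config and tokenizer via `transformers`; the model id `dbmdz/bert-base-historic-multilingual-64k-td-cased` is an assumption based on the dbmdz naming scheme on the Hub, so verify it against the organization's model listing.

```python
# Sketch: verify the hmBERT 64k architecture claims (12 layers, 64k vocab).
# The model id is an assumption -- check the dbmdz page on the Hugging Face Hub.
from transformers import AutoConfig, AutoTokenizer

model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"

config = AutoConfig.from_pretrained(model_id)       # downloads config.json only
tokenizer = AutoTokenizer.from_pretrained(model_id)  # downloads tokenizer files

print(config.num_hidden_layers)  # 12-layer BERT
print(len(tokenizer))            # roughly 64k subwords
```

Loading the config and tokenizer alone avoids downloading the full model weights, which keeps the check fast.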
+
+ # Leaderboard
+
+ We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
+ The following table gives an overview of the datasets used:
+
+ | Language | Datasets                                                         |
+ |----------|------------------------------------------------------------------|
+ | English  | [AjMC] - [TopRes19th]                                            |
+ | German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
+ | French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
+ | Finnish  | [NewsEye]                                                        |
+ | Swedish  | [NewsEye]                                                        |
+ | Dutch    | [ICDAR-Europeana]                                                |
+
+ [AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
+ [NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
+ [TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
+ [ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
+ [LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
+ [HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md
+
+ All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).
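The checkpoints behind these evaluations live on the Hugging Face Hub and can be discovered programmatically. A minimal sketch using `huggingface_hub`, assuming the pretrained models sit under the `dbmdz` organization and match a `historic-multilingual` search term (both assumptions; adjust the filters as needed):

```python
# Sketch: list historic multilingual checkpoints from the dbmdz organization.
# The author and search filters are assumptions -- adjust to taste.
from huggingface_hub import HfApi

api = HfApi()
models = [m.id for m in api.list_models(author="dbmdz", search="historic-multilingual")]
for model_id in sorted(models):
    print(model_id)
```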
+
+ # Acknowledgements
+
+ We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
+ [Erion Çano](https://github.com/erionc) for their fruitful discussions about historical language models.
+
+ Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
+ Many thanks for providing access to the TPUs ❤️