Abstract
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.
Models citing this paper 4
Datasets citing this paper 4
Spaces citing this paper 0
No Space linking this paper
Cite arxiv.org/abs/2005.03521 in a Space README.md to link it from this page.
Collections including this paper 0
No Collection including this paper
Add this paper to a
collection
to link it from this page.