---
title: README
emoji: 📊
colorFrom: green
colorTo: red
sdk: static
pinned: false
license: afl-3.0
---
Useful Hugging Face resources and notable contributors for Dutch NLP:
## Individuals
* [Pieter Delobelle](https://huggingface.co/pdelobelle), [homepage](https://pieter.ai/) and [git](https://github.com/ipieter)
* [Bram Vanroy](https://huggingface.co/BramVanroy) and [homepage](https://bramvanroy.github.io/)
* [Robin Smits](https://huggingface.co/robinsmits) and [git](https://github.com/robinsmits)
* [Janneke van de Zwaan](https://huggingface.co/jvdzwaan/ocrpostcorrection-task-1) and [git](https://github.com/jvdzwaan)
* [Yeb Havinga](https://huggingface.co/yhavinga) and [git](https://github.com/yhavinga)
* [Wietse de Vries](https://huggingface.co/wietsedv) and [git](https://github.com/wietsedv)
* [François Remy](https://huggingface.co/FremyCompany), [homepage](http://fremycompany.com) and [git](https://github.com/FremyCompany)
* [Maarten Grootendorst](https://huggingface.co/MaartenGr), [homepage](https://www.maartengrootendorst.com/) and [git](https://github.com/MaartenGr)
* [Piek Vossen](https://vossen.info/) and [git](https://github.com/piekvossen)
* [Eva Rombouts](https://huggingface.co/ekrombouts) and [git](https://github.com/ekrombouts)
* [Joeran Bosma](https://huggingface.co/joeranbosma/) and [git](https://github.com/joeranbosma)
## Organisations
* [University Medical Center Utrecht](https://github.com/umcu)
* [NLPtown](https://huggingface.co/nlptown) and [homepage](http://nlp.town/)
* [doc2query](https://huggingface.co/doc2query)
* [LT3, Language and Translation Technology Team, Ghent University](https://huggingface.co/LT3) and [homepage](https://lt3.ugent.be/)
* [Textgain](https://huggingface.co/textgain) and [homepage](https://www.textgain.com/)
* [ML6](https://huggingface.co/ml6team), [homepage](https://ml6.eu/) and [git](https://github.com/ml6team)
* [CLiPS](https://huggingface.co/clips), [homepage](https://www.uantwerpen.be/en/research-groups/clips/) and [git](https://github.com/clips)
* [DTAI Research Group, KU Leuven](https://huggingface.co/DTAI-KULeuven), [homepage](https://dtai.cs.kuleuven.be/) and [git](https://github.com/ML-KULeuven)
* [GroNLP](https://huggingface.co/GroNLP) and [homepage](https://www.rug.nl/research/clcg/research/cl/)
* [CLTL](https://huggingface.co/CLTL), [homepage](http://cltl.nl) and [git](https://github.com/CLTL)
* [Netherlands Forensic Institute](https://huggingface.co/NetherlandsForensicInstitute), [homepage](https://forensicinstitute.nl/) and [git](https://github.com/NetherlandsForensicInstitute)
* [Integraal Kankercentrum Nederland (IKNL)](https://github.com/iknl)
* [Erasmus Medical Informatics](https://github.com/mi-erasmusmc)
## NLP libraries relevant for (Dutch) clinical NLP
* [clinlp](https://github.com/umcu/clinlp)
## Encoder models
* [*RobBERT 2023*](https://huggingface.co/DTAI-KULeuven/robbert-2023-dutch-base)
* [*BERTje*](https://huggingface.co/GroNLP/bert-base-dutch-cased)
* [*belabBERT*](https://huggingface.co/jwouts/belabBERT_115k)
* [**MedRoBERTa.nl**](https://huggingface.co/CLTL/MedRoBERTa.nl)
* [**CardioBERTa.nl**](https://huggingface.co/UMCU/CardioBERTa.nl_clinical)
* [**CardioDeBERTa.nl**](https://huggingface.co/UMCU/CardioDeBERTa.nl)
* [**DRAGON-longformer-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-large-domain-specific)
* [**DRAGON-longformer-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-base-domain-specific)
* [**DRAGON-roberta-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-large-domain-specific)
* [**DRAGON-roberta-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-base-domain-specific)
* [**DRAGON-bert-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-bert-base-domain-specific)
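The encoder links above all point at Hugging Face Hub repositories, so each one can be loaded by its `owner/name` ID. The following is a minimal sketch, assuming the `transformers` library is installed; the helper names `hub_repo_id`, `load_fill_mask` and `demo` are hypothetical and only serve as illustration, and the example sentence is made up.

```python
from urllib.parse import urlparse


def hub_repo_id(url: str) -> str:
    """Turn a Hugging Face model URL into its `owner/name` repo ID."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def load_fill_mask(url: str):
    """Build a fill-mask pipeline; the model is downloaded on first use."""
    # Imported lazily so the URL helper works without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=hub_repo_id(url))


def demo() -> None:
    """Print masked-token predictions from RobBERT 2023 (needs network access)."""
    fill = load_fill_mask(
        "https://huggingface.co/DTAI-KULeuven/robbert-2023-dutch-base"
    )
    # Use the tokenizer's own mask token, since it differs between models.
    mask = fill.tokenizer.mask_token
    for pred in fill(f"De patiënt heeft pijn op de {mask}."):
        print(pred["token_str"], round(pred["score"], 3))
```

Calling `demo()` downloads the model weights, so it is left uncalled here. The URL helper is meant for model links only; dataset links under `/datasets/` have a different path layout.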
## Contrastive encoder models
* [BioLORD-2023-M Dutch](https://huggingface.co/FremyCompany/BioLORD-2023-M-Dutch-InContext-v1)
## Decoder models
* [*GPT-2 on mC4*](https://huggingface.co/yhavinga/gpt2-large-dutch), [GPT-2 finetuned on Dutch](https://huggingface.co/GroNLP/gpt2-medium-dutch-embeddings)
* [*GPT-neo on mC4*](https://huggingface.co/yhavinga/gpt-neo-1.3B-dutch)
* [*GEITje (based on Mistral)*](https://github.com/Rijgersberg/GEITje)
* [*Fietje (based on Phi-2)*](https://huggingface.co/BramVanroy/fietje-2), [**Zuster Fietje**](https://huggingface.co/ekrombouts/zuster_fietje)
* [**J1**](https://huggingface.co/Juvoly/J1-Llama-8B-exp)
## Neural machine translation models
* [NLLB-200](https://huggingface.co/facebook/nllb-200-3.3B)
* [UL2, en-nl](https://huggingface.co/yhavinga/ul2-large-en-nl), [UL2, nl-en](https://huggingface.co/yhavinga/ul2-large-dutch-english)
* [OPUS MT, en-nl](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl), [OPUS MT, nl-en](https://huggingface.co/Helsinki-NLP/opus-mt-nl-en), [OPUS MT Healthcare, nl-en](https://huggingface.co/FremyCompany/opus-mt-nl-en-healthcare)
* [Llama 2 MT, nl-en](https://huggingface.co/kaitchup/Llama-2-7b-mt-Dutch-to-English)
## Datasets
* [SoNaR](https://taalmaterialen.ivdnt.org/download/tstc-sonar-corpus/)
* [COW](https://rolandschaefer.net/archives/142)
* [mC4 cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* [TwNC](https://research.utwente.nl/en/publications/twnc-a-multifaceted-dutch-news-corpus)
* [Gigacorpus](http://gigacorpus.nl/)
* [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX)
* [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
* [FineWeb 2](https://github.com/huggingface/fineweb-2)