This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.

For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference`, etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).

**Note**: This model was pretrained using a specific normalization pipeline available [here](https://github.com/csebuetnlp/normalizer). All finetuning scripts in the official GitHub repository use this normalization by default. If you need to adapt the pretrained model for a different task, make sure the text units are normalized using this pipeline before tokenizing to get the best results. A basic example is given below:
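The snippet below is a minimal sketch of that normalization step; it assumes the `normalize` helper exported by the normalizer package and uses `csebuetnlp/banglabert` as an assumed checkpoint name for the tokenizer (substitute whichever checkpoint you are loading):

```python
# Minimal sketch: normalize raw Bengali text before tokenization.
# The checkpoint name below is an assumption; use the one you actually load.
from normalizer import normalize  # from https://github.com/csebuetnlp/normalizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")

raw_text = "এই মডেলটি বাংলা ভাষার জন্য তৈরি।"  # any raw Bengali input
clean_text = normalize(raw_text)  # apply the same normalization used during pretraining

inputs = tokenizer(clean_text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
```
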
## Using this model as a discriminator in `transformers` (tested on 4.11.0.dev0)
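The following is an illustrative sketch of replaced-token-detection inference, not the official snippet; it assumes the `csebuetnlp/banglabert` checkpoint name on the Hugging Face Hub and the same `normalize` helper as above:

```python
# Hedged sketch: run the discriminator over a sentence and read per-token
# "replaced" predictions. The checkpoint name is an assumption.
import torch
from normalizer import normalize
from transformers import AutoModelForPreTraining, AutoTokenizer

model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")

sentence = normalize("আমি বাংলায় গান গাই।")  # normalize before tokenizing, as noted above
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one logit per token; > 0 suggests a replaced token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
flags = (logits.squeeze(0) > 0).long().tolist()
for token, flag in zip(tokens, flags):
    print(f"{token}\t{'replaced' if flag else 'original'}")
```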