This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.

For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference`, etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).

**Note**: This model was pretrained using a specific normalization pipeline available [here](https://github.com/csebuetnlp/normalizer). All finetuning scripts in the official GitHub repository use this normalization by default. If you need to adapt the pretrained model for a different task, make sure the text units are normalized using this pipeline before tokenizing to get the best results. A basic example is given below:
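The snippet below is a minimal sketch of that normalization step; it assumes the `normalize` helper exported by the normalizer package and uses `csebuetnlp/banglabert` as an assumed checkpoint name for the tokenizer (substitute whichever checkpoint you are loading):

```python
# Minimal sketch: normalize raw Bengali text before tokenization.
# The checkpoint name below is an assumption; use the one you actually load.
from normalizer import normalize  # from https://github.com/csebuetnlp/normalizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")

raw_text = "এই মডেলটি বাংলা ভাষার জন্য তৈরি।"  # any raw Bengali input
clean_text = normalize(raw_text)  # apply the same normalization used during pretraining

inputs = tokenizer(clean_text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
```
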
## Using this model as a discriminator in `transformers` (tested on 4.11.0.dev0)
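The following is an illustrative sketch of replaced-token-detection inference, not the official snippet; it assumes the `csebuetnlp/banglabert` checkpoint name on the Hugging Face Hub and the same `normalize` helper as above:

```python
# Hedged sketch: run the discriminator over a sentence and read per-token
# "replaced" predictions. The checkpoint name is an assumption.
import torch
from normalizer import normalize
from transformers import AutoModelForPreTraining, AutoTokenizer

model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")

sentence = normalize("আমি বাংলায় গান গাই।")  # normalize before tokenizing, as noted above
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one logit per token; > 0 suggests a replaced token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
flags = (logits.squeeze(0) > 0).long().tolist()
for token, flag in zip(tokens, flags):
    print(f"{token}\t{'replaced' if flag else 'original'}")
```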