Kowsher committed on
Commit b02525d · 1 Parent(s): b13800c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -7,7 +7,6 @@ tags:
  - Bangla Base Bert
  - Bangla Bert language model
  - Bangla Bert
- license: MIT
  datasets:
  - BanglaLM dataset
  ---
@@ -16,10 +15,11 @@ Here we published a pretrained Bangla bert language model as **bert-base-bangla*
  Here we described [bert-base-bangla](https://github.com/Kowsher/bert-base-bangla) which is a pretrained Bangla language model based on mask language modeling described in [BERT](https://arxiv.org/abs/1810.04805) and the GitHub [repository](https://github.com/google-research/bert)
  ## Corpus Details
  We trained the Bangla bert language model using BanglaLM dataset from kaggle [BanglaLM](https://www.kaggle.com/gakowsher/bangla-language-model-dataset). There is 3 version of dataset which is almost 40GB.
- After downloading the dataset, we went on the way of mask LM, described here [BERT](https://arxiv.org/abs/1810.04805)
- ```
+ After downloading the dataset, we went on the way to mask LM.
+
 
  **Bangla Base BERT Tokenizer**
+
  ```py
  from transformers import AutoTokenizer, AutoModel
  bnbert_tokenizer = AutoTokenizer.from_pretrained("Kowsher/bert-base-test")