---
library_name: transformers
language:
- my
- en
---

# Burmese-Bert

Burmese-Bert is a bilingual masked language model based on `bert-large-uncased`.

It uses the BERT (Bidirectional Encoder Representations from Transformers) architecture.

It supports English and Burmese.

## Model Details

Coming Soon

### Model Description

- **Developed by:** Min Si Thu
- **Model type:** BERT (Bidirectional Encoder Representations from Transformers), masked language model
- **Language(s) (NLP):** Burmese (my), English (en)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

- Masked-token filling (fill-mask)
- Burmese Natural Language Understanding

### How to use

```shell
# install the dependencies (the example below also imports torch)
pip install transformers torch
```

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_checkpoint = "jojo-ai-mst/BurmeseBert"
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

text = "This is a great [MASK]."

# Tokenize the input and run a forward pass without tracking gradients
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    token_logits = model(**inputs).logits
# Find the location of [MASK] and extract its logits
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]
# Pick the five [MASK] candidates with the highest logits
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

# Print the sentence with each candidate substituted for [MASK]
for token in top_5_tokens:
    print(f"'>>> {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}'")
```
```
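
The example above ranks candidates by raw logits, which are unnormalized scores. If probabilities are wanted alongside each candidate, a softmax over the vocabulary dimension followed by a top-k selection is one common approach. The sketch below shows that step in plain Python so it runs without downloading the model; the helper name `softmax_top_k` and the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax_top_k(logits, k=5):
    """Return the k highest-probability (index, probability) pairs."""
    # Subtract the maximum logit for numerical stability before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort vocabulary indices by descending probability and keep the first k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up logits over a hypothetical 6-token vocabulary (not model output)
example_logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.0]
for index, prob in softmax_top_k(example_logits, k=3):
    print(f"token id {index}: p = {prob:.3f}")
```

With the real model, the same idea can be expressed as `torch.softmax(mask_token_logits, dim=-1)` followed by `torch.topk`, which yields both the probabilities and the token ids.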

## Citation [optional]

Coming Soon