Update README.md

README.md (CHANGED)

@@ -35,3 +35,38 @@ This model uses a new tokenizer, specially trained on the target languages. Ther
|:------------|:--------:|:--------:|:---------:|:-------:|:--------:|:---------:|
| Mistral-Nemo-Base-2407 | 131072 | 1.79 | 1.87 | 2.63 | 1.82 | 2.00 |
| NorMistral-11b-warm | 51200 | 1.22 | 1.28 | 1.82 | 1.33 | 1.39 |
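Assuming the per-language columns in this table report tokenizer fertility (the average number of subword tokens per word, where values closer to 1.0 mean fewer word splits), such a number can be measured with a short helper. This is only an illustrative sketch: the `fertility` function and the toy tokenizer below are our own, not part of the model card; with a real tokenizer you would pass e.g. `tokenizer.tokenize` instead.

```python
def fertility(tokenize, words):
    """Average number of subword tokens produced per word.

    `tokenize` is any callable mapping a word to a list of tokens;
    a value close to 1.0 means the vocabulary rarely splits words.
    """
    assert words, "need at least one word"
    return sum(len(tokenize(word)) for word in words) / len(words)

# Toy stand-in tokenizer: split a word into chunks of up to 3 characters
def toy_tokenize(word):
    return [word[i:i + 3] for i in range(0, len(word), 3)]

print(fertility(toy_tokenize, ["lundefugl", "fjord"]))  # (3 + 2) / 2 = 2.5
```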
## NorMistral-11b is also a bidirectional masked language model

Having been pretrained on a mixed causal-masked objective, this model knows how to process texts bidirectionally. You can thus finetune this model like any other BERT-like model (or any other prefix language model). The model can also be used directly for masked language modeling:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# First, we have to import the tokenizer and the language model;
# we can use AutoModelForCausalLM instead of AutoModelForMaskedLM just fine
tokenizer = AutoTokenizer.from_pretrained(
    "norallm/normistral-11b-warm"
)
model = AutoModelForCausalLM.from_pretrained(
    "norallm/normistral-11b-warm"
).cuda().eval()

# A partially-masked input text string
text = "En søt lundefugl flyr over de<mask> norske fjorder."
input_ids = tokenizer(text, return_tensors='pt').input_ids.cuda()

# An all-zeros additive attention mask allows unconstrained bidirectional attention
attention_mask = torch.zeros(input_ids.size(0), 1, input_ids.size(1), input_ids.size(1), device=input_ids.device)

output_logits = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    return_dict=True
).logits
predictions = output_logits[0, :, :].argmax(dim=-1)

# Expected output:
# En søt lundefugl flyr over de<mask> norske fjorder. -> En søt lundefugl flyr over de vakre norske fjorder.
print(f"{tokenizer.decode(input_ids[0, 1:])} -> {tokenizer.decode(predictions[:-1])}")
```
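The all-zeros attention mask above is the fully-bidirectional special case of a prefix-LM mask, in which positions inside a chosen prefix attend to each other bidirectionally while later positions attend causally. As a sketch of that idea (the helper below is our own illustration, not part of the `transformers` API), such a 4D additive mask can be built with plain PyTorch:

```python
import torch

def prefix_lm_attention_mask(seq_len: int, prefix_len: int, batch_size: int = 1):
    """Build a 4D additive attention mask of shape (batch, 1, seq_len, seq_len).

    Positions inside the prefix attend bidirectionally; positions after it
    attend causally. 0.0 means "may attend", -inf means "blocked".
    """
    # Start from a causal pattern: query i may attend to keys j <= i
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Inside the prefix, every position may attend to every other one
    allowed[:prefix_len, :prefix_len] = True
    mask = torch.where(allowed, 0.0, float("-inf"))
    return mask.expand(batch_size, 1, seq_len, seq_len)

# prefix_len == seq_len reproduces the all-zeros bidirectional mask used above
full = prefix_lm_attention_mask(seq_len=4, prefix_len=4)
assert torch.equal(full, torch.zeros(1, 1, 4, 4))
```

Setting `prefix_len` to the length of the input (as the masked-LM example effectively does) makes attention fully bidirectional, while `prefix_len=0` recovers the ordinary causal mask.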