davda54 committed
Commit adff286 · verified · 1 Parent(s): 67dcdbf

Update README.md

Files changed (1)
  1. README.md +35 -0
README.md CHANGED
@@ -35,3 +35,38 @@ This model uses a new tokenizer, specially trained on the target languages. Ther
  |:------------|:--------:|:--------:|:---------:|:-------:|:--------:|:---------:|
  | Mistral-Nemo-Base-2407 | 131072 | 1.79 | 1.87 | 2.63 | 1.82 | 2.00 |
  | NorMistral-11b-warm | 51200 | 1.22 | 1.28 | 1.82 | 1.33 | 1.39 |
+
+ ## NorMistral-11b is also a bidirectional masked language model
+
+ Having been pretrained on a mixed causal-masked objective, this model can process text bidirectionally. You can therefore fine-tune it like any other BERT-like model (or any other prefix language model); a minimal fine-tuning sketch follows the example below. The model can also be used directly for masked language modeling:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # First, load the tokenizer and the language model;
+ # AutoModelForCausalLM works just fine here in place of a masked-LM class
+ tokenizer = AutoTokenizer.from_pretrained(
+     "norallm/normistral-11b-warm"
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "norallm/normistral-11b-warm"
+ ).cuda().eval()
+
+ # A partially masked input text string
+ text = "En søt lundefugl flyr over de<mask> norske fjorder."
+ input_ids = tokenizer(text, return_tensors='pt').input_ids.cuda()
+
+ # An all-zeros additive attention mask allows unconstrained bidirectional attention
+ attention_mask = torch.zeros(input_ids.size(0), 1, input_ids.size(1), input_ids.size(1), device=input_ids.device)
+
+ output_logits = model(
+     input_ids=input_ids,
+     attention_mask=attention_mask,
+     return_dict=True
+ ).logits
+ predictions = output_logits[0].argmax(dim=-1)
+
+ # Expected output:
+ # En søt lundefugl flyr over de<mask> norske fjorder. -> En søt lundefugl flyr over de vakre norske fjorder.
+ print(f"{tokenizer.decode(input_ids[0, 1:])} -> {tokenizer.decode(predictions[:-1])}")
+ ```
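+
+ The same all-zeros mask can be reused when fine-tuning the model as a BERT-like encoder. Below is a minimal, untested sketch of a sequence classifier on top of the bidirectional hidden states; the mean-pooling strategy, the linear head, and the two-label setup are illustrative assumptions, not part of the released model:
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("norallm/normistral-11b-warm")
+ backbone = AutoModelForCausalLM.from_pretrained("norallm/normistral-11b-warm").cuda()
+
+ # Hypothetical linear classification head (two labels chosen for illustration)
+ head = nn.Linear(backbone.config.hidden_size, 2).cuda()
+
+ def classification_logits(text: str) -> torch.Tensor:
+     input_ids = tokenizer(text, return_tensors='pt').input_ids.cuda()
+     # The same all-zeros additive mask as above: full bidirectional attention
+     attention_mask = torch.zeros(
+         input_ids.size(0), 1, input_ids.size(1), input_ids.size(1),
+         device=input_ids.device
+     )
+     # Take the last-layer hidden states instead of the LM logits
+     hidden_states = backbone(
+         input_ids=input_ids,
+         attention_mask=attention_mask,
+         output_hidden_states=True,
+         return_dict=True
+     ).hidden_states[-1]
+     # Mean-pool the token representations, then classify
+     return head(hidden_states.mean(dim=1))
+
+ # Training would then proceed as usual, e.g.:
+ # loss = nn.functional.cross_entropy(classification_logits("En setning."), labels)
+ ```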