Update README.md

README.md (CHANGED)

@@ -35,3 +35,38 @@ This model uses a new tokenizer, specially trained on the target languages. Ther
|:------------|:--------:|:--------:|:---------:|:-------:|:--------:|:---------:|
| Mistral-Nemo-Base-2407 | 131072 | 1.79 | 1.87 | 2.63 | 1.82 | 2.00 |
| NorMistral-11b-warm | 51200 | 1.22 | 1.28 | 1.82 | 1.33 | 1.39 |
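Assuming the per-language columns in this table report tokenizer fertility (the average number of subword tokens per word, where values closer to 1.0 mean fewer word splits), such a number can be measured with a short helper. This is only an illustrative sketch: the `fertility` function and the toy tokenizer below are our own, not part of the model card; with a real tokenizer you would pass e.g. `tokenizer.tokenize` instead.

```python
def fertility(tokenize, words):
    """Average number of subword tokens produced per word.

    `tokenize` is any callable mapping a word to a list of tokens;
    a value close to 1.0 means the vocabulary rarely splits words.
    """
    assert words, "need at least one word"
    return sum(len(tokenize(word)) for word in words) / len(words)

# Toy stand-in tokenizer: split a word into chunks of up to 3 characters
def toy_tokenize(word):
    return [word[i:i + 3] for i in range(0, len(word), 3)]

print(fertility(toy_tokenize, ["lundefugl", "fjord"]))  # (3 + 2) / 2 = 2.5
```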
## NorMistral-11b is also a bidirectional masked language model

Having been pretrained on a mixed causal-masked objective, this model knows how to process texts bidirectionally. You can thus finetune this model like any other BERT-like model (or any other prefix language model). The model can also be used directly for masked language modeling:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# First, we have to import the tokenizer and the language model;
# we can use AutoModelForCausalLM instead of AutoModelForMaskedLM just fine
tokenizer = AutoTokenizer.from_pretrained(
    "norallm/normistral-11b-warm"
)
model = AutoModelForCausalLM.from_pretrained(
    "norallm/normistral-11b-warm"
).cuda().eval()

# A partially-masked input text string
text = "En søt lundefugl flyr over de<mask> norske fjorder."
input_ids = tokenizer(text, return_tensors='pt').input_ids.cuda()

# An all-zeros additive attention mask allows unconstrained bidirectional attention
attention_mask = torch.zeros(input_ids.size(0), 1, input_ids.size(1), input_ids.size(1), device=input_ids.device)

output_logits = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    return_dict=True
).logits
predictions = output_logits[0, :, :].argmax(dim=-1)

# Expected output:
# En søt lundefugl flyr over de<mask> norske fjorder. -> En søt lundefugl flyr over de vakre norske fjorder.
print(f"{tokenizer.decode(input_ids[0, 1:])} -> {tokenizer.decode(predictions[:-1])}")
```
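The all-zeros attention mask above is the fully-bidirectional special case of a prefix-LM mask, in which positions inside a chosen prefix attend to each other bidirectionally while later positions attend causally. As a sketch of that idea (the helper below is our own illustration, not part of the `transformers` API), such a 4D additive mask can be built with plain PyTorch:

```python
import torch

def prefix_lm_attention_mask(seq_len: int, prefix_len: int, batch_size: int = 1):
    """Build a 4D additive attention mask of shape (batch, 1, seq_len, seq_len).

    Positions inside the prefix attend bidirectionally; positions after it
    attend causally. 0.0 means "may attend", -inf means "blocked".
    """
    # Start from a causal pattern: query i may attend to keys j <= i
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Inside the prefix, every position may attend to every other one
    allowed[:prefix_len, :prefix_len] = True
    mask = torch.where(allowed, 0.0, float("-inf"))
    return mask.expand(batch_size, 1, seq_len, seq_len)

# prefix_len == seq_len reproduces the all-zeros bidirectional mask used above
full = prefix_lm_attention_mask(seq_len=4, prefix_len=4)
assert torch.equal(full, torch.zeros(1, 1, 4, 4))
```

Setting `prefix_len` to the length of the input (as the masked-LM example effectively does) makes attention fully bidirectional, while `prefix_len=0` recovers the ordinary causal mask.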