gonzalez-agirre committed
Commit 081e10c · 1 Parent(s): 19ba203
Update README.md
README.md CHANGED
@@ -53,7 +53,7 @@ widget:
 
 ## Model description
 
-
+The **roberta-base-ca-v2** is a transformer-based masked language model for the Catalan language.
 It is based on the [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) base model
 and has been trained on a medium-size corpus collected from publicly available corpora and crawlers.
 
@@ -74,10 +74,11 @@ tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model.eval()
 pipeline = FillMaskPipeline(model, tokenizer_hf)
-text = f"Em dic <mask>."
+text = f"Em dic <mask>."
 res_hf = pipeline(text)
 pprint([r['token_str'] for r in res_hf])
 ```
+
 ## Training
 
 ### Training data
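The second hunk shows only the tail of the README usage example; its imports fall outside the diff context. A minimal self-contained sketch of that snippet follows, assuming the hidden lines import `AutoTokenizer`, `AutoModelForMaskedLM`, and `FillMaskPipeline` from `transformers`. The helper names `top_tokens` and `fill_mask` are hypothetical and are not part of the README.

```python
def top_tokens(results):
    """Return the predicted token strings from fill-mask pipeline output."""
    return [r["token_str"] for r in results]


def fill_mask(text, model_id="projecte-aina/roberta-base-ca-v2"):
    """Run the fill-mask pipeline from the README on a single sentence."""
    # Assumption: these imports match the lines hidden above the hunk.
    # Imported lazily so top_tokens() works without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

    tokenizer_hf = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    model.eval()  # inference mode, as in the README snippet
    pipeline = FillMaskPipeline(model, tokenizer_hf)
    return pipeline(text)
```

Calling `top_tokens(fill_mask("Em dic <mask>."))` downloads the model from the Hugging Face Hub on first use and returns the candidate tokens for the masked position. The `f` prefix on the string in the README is not needed, since the text has no placeholders.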