Update README.md
README.md
@@ -13,4 +13,32 @@ It is the [ITALIAN-LEGAL-BERT](https://huggingface.co/dlicari/Italian-Legal-BERT
It was trained from scratch using a larger training dataset, 6.6 GB of civil and criminal cases.
We used the [CamemBERT](https://huggingface.co/docs/transformers/main/en/model_doc/camembert) architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 2e-5 (with linear learning rate decay), a sequence length of 512, a batch size of 18, and 1 million training steps
on 8 NVIDIA A100 40GB GPUs using distributed data parallel (so each step processes 8 batches). It uses SentencePiece tokenization trained from scratch on a subset of the training set (5 million sentences)
with a vocabulary size of 32,000.
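
To show how these hyperparameters fit together, here is a minimal sketch of a comparable pretraining setup using `sentencepiece` and Hugging Face `transformers`. It is not the actual training script: the file names, the `train_dataset` placeholder, and the surrounding configuration are illustrative assumptions; only the values quoted above (learning rate, batch size, steps, sequence length, vocabulary size) come from the description.

```python
# Illustrative sketch only -- not the original training script.
import sentencepiece as spm
from transformers import (
    CamembertConfig,
    CamembertForMaskedLM,
    CamembertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# 1) Train a SentencePiece model from scratch on a subset of the corpus
#    ("legal_sentences.txt" is an assumed file with one sentence per line).
spm.SentencePieceTrainer.train(
    input="legal_sentences.txt",
    model_prefix="italian_legal_sp",
    vocab_size=32000,
)

# 2) Wrap the SentencePiece model in a CamemBERT tokenizer and create a fresh,
#    randomly initialized CamemBERT model (i.e. trained "from scratch").
tokenizer = CamembertTokenizer(vocab_file="italian_legal_sp.model")
config = CamembertConfig(vocab_size=len(tokenizer))
model = CamembertForMaskedLM(config)

# 3) Masked-language-modeling pretraining; the Trainer uses AdamW by default.
train_dataset = ...  # assumed: a dataset of examples tokenized to length 512
training_args = TrainingArguments(
    output_dir="italian-legal-bert-sc",
    per_device_train_batch_size=18,  # batch size 18 per device; 8 GPUs under DDP
    learning_rate=2e-5,              # initial learning rate
    lr_scheduler_type="linear",      # linear learning rate decay
    max_steps=1_000_000,             # 1 million training steps
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True),
)
trainer.train()
```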

<h2>Usage</h2>

The ITALIAN-LEGAL-BERT-SC model can be loaded as follows:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT-SC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
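
Once loaded, the model can be used as a plain encoder. The snippet below is a small illustration (the input sentence is only an example, roughly "the claimant asked for compensation for the damage"); it runs a forward pass and inspects the contextual embeddings in `last_hidden_state`.

```python
import torch

# Example input only: "Il ricorrente ha chiesto il risarcimento del danno"
# ("The claimant asked for compensation for the damage").
inputs = tokenizer(
    "Il ricorrente ha chiesto il risarcimento del danno",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```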

You can use the Transformers fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT-SC.

```python
# %pip install sentencepiece
# %pip install transformers

from transformers import pipeline

model_name = "dlicari/Italian-Legal-BERT-SC"
fill_mask = pipeline("fill-mask", model_name)

# "Il <mask> ha chiesto revocarsi l'obbligo di pagamento"
# ("The <mask> asked for the payment obligation to be revoked")
fill_mask("Il <mask> ha chiesto revocarsi l'obbligo di pagamento")
# Output (abridged to 'score' and 'token_str'):
# [{'score': 0.6529251933097839, 'token_str': 'ricorrente'},
#  {'score': 0.0380014143884182, 'token_str': 'convenuto'},
#  {'score': 0.0360226035118103, 'token_str': 'richiedente'},
#  {'score': 0.023908283561468124, 'token_str': 'Condominio'},
#  {'score': 0.020863816142082214, 'token_str': 'lavoratore'}]
```
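
The pipeline returns its five most likely fillers by default; `top_k` is a standard argument of the Transformers fill-mask pipeline (not specific to this model) if you want more or fewer candidates:

```python
# Ask for the ten most likely fillers instead of the default five.
fill_mask("Il <mask> ha chiesto revocarsi l'obbligo di pagamento", top_k=10)
```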