dlicari/Italian-Legal-BERT


Daniele Licari


	---
	language: it
	license: apache-2.0
	widget:
	- text: "Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"
	---

	<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT.jpg" width="500"/>
	<h1> ITALIAN-LEGAL-BERT: A pre-trained Transformer Language Model for Italian Law </h1>

	ITALIAN-LEGAL-BERT is based on <a href="https://huggingface.co/dbmdz/bert-base-italian-xxl-cased">bert-base-italian-xxl-cased</a>, further pre-trained on Italian civil law corpora.
	It achieves better results than the ‘general-purpose’ Italian BERT on various domain-specific tasks.
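
A minimal usage sketch, assuming the `transformers` library is installed and the model can be downloaded from the Hub under the id `dlicari/Italian-Legal-BERT`. The helper name `top_predictions` is hypothetical, introduced only for illustration; the example sentence is the one from the widget above.

```python
MODEL_ID = "dlicari/Italian-Legal-BERT"
EXAMPLE = "Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"


def top_predictions(text, k=5):
    """Return the k most likely fillers for the [MASK] token in `text`.

    Hypothetical helper: it builds a standard fill-mask pipeline and
    returns (token, score) pairs, highest score first.
    """
    # Imported lazily so the module loads even where transformers is absent.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=k)]
```

Calling `top_predictions(EXAMPLE)` downloads the weights on first use and returns the candidate tokens with their scores.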

	<h2>Training procedure</h2>
	We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT
	and pre-trained it for an additional 4 epochs on 3.7 GB of text from the National Jurisprudential
	Archive using the Hugging Face PyTorch-Transformers library. We used the BERT architecture
	with a language modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 (with
	linear learning rate decay, ending at 2.525e-9), sequence length 512, batch size 10 (imposed
	by GPU capacity), and 8.4 million training steps on a single V100 16GB GPU.
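
The linear learning rate decay mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' training script: it assumes no warmup phase and a decay target of zero at the final step, and the function name `linear_decay_lr` is hypothetical. The initial rate and step count are the ones reported above.

```python
def linear_decay_lr(step, total_steps, initial_lr=5e-5):
    """Learning rate at `step` under linear decay to zero (assumed: no warmup)."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return initial_lr * remaining


TOTAL_STEPS = 8_400_000  # training steps reported in the model card

# Print the rate at the start, midpoint, and end of training.
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>9,}: lr = {linear_decay_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the rate falls to the order of 1e-9 only within the last few hundred steps, which is the regime of the reported final value of 2.525e-9.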