pile-of-law/legalbert-large-1.7M-1

@@ -4,11 +4,11 @@ language:
 pipeline_tag: fill-mask
 ---
-# LegalBERT large model (uncased)
 Pretrained model on English language legal and administrative text using the [RoBERTa](https://arxiv.org/abs/1907.11692) pretraining objective.
 ## Model description
-LegalBERT large is a transformers model with the [BERT large model (uncased)](https://huggingface.co/bert-large-uncased) architecture pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining.
 ## Intended uses & limitations
 You can use the raw model for masked language modeling or fine-tune it for a downstream task. Since this model was pretrained on an English language legal and administrative text corpus, legal downstream tasks will likely be more in-domain for this model.
@@ -18,7 +18,7 @@ You can use the raw model for masked language modeling or fine-tune it for a dow
 ## Limitations and bias
 ## Training data
-The LegalBERT model was pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining. The Pile of Law consists of 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, etc. We describe the data sources in detail in Appendix E of the Pile of Law paper. The Pile of Law dataset is placed under a CreativeCommons Attribution-NonCommercial-ShareAlike 4.0 International license.
 ## Training procedure
 ### Preprocessing
@@ -30,7 +30,7 @@ The model was trained on a SambaNova cluster, with 8 RDUs, for 1.7 million steps
 We trained two models with the same configuration in parallel model training runs, with different random seeds. We selected the lowest log likelihood model, [legalbert-large-1.7M-1](https://huggingface.co/pile-of-law/legalbert-large-1.7M-1), which we refer to as PoL-BERT-Large, for experiments, but also release the second model, [legalbert-large-1.7M-2](https://huggingface.co/pile-of-law/legalbert-large-1.7M-2).
 ## Evaluation results
-When finetuned on the CaseHOLD variant provided by the [LexGLUE paper](https://arxiv.org/abs/2110.00976), this model, PoL-BERT-Large, achieves the following results. In the table below, we also report results for [BERT-Large-Uncased(]https://huggingface.co/bert-large-uncased) and [CaseLaw-BERT](https://huggingface.co/zlucia/custom-legalbert). We report results on the models with hyperparameter tuning on the downstream task and the result reported for the CaseLaw-BERT model from the [LexGLUE paper](https://arxiv.org/abs/2110.00976), which uses a fixed experimental setup.
 CaseHOLD test results:

 pipeline_tag: fill-mask
 ---
+# Pile of Law BERT large model (uncased)
 Pretrained model on English language legal and administrative text using the [RoBERTa](https://arxiv.org/abs/1907.11692) pretraining objective.
 ## Model description
+Pile of Law BERT large is a transformers model with the [BERT large model (uncased)](https://huggingface.co/bert-large-uncased) architecture pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining.
 ## Intended uses & limitations
 You can use the raw model for masked language modeling or fine-tune it for a downstream task. Since this model was pretrained on an English language legal and administrative text corpus, legal downstream tasks will likely be more in-domain for this model.
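 As a rough illustration of the masked language modeling use described above, the sketch below queries the model through the `transformers` fill-mask pipeline. It assumes `transformers` and a backend such as PyTorch are installed. The model ID is the one this card refers to; the helper names (`mask_term`, `top_predictions`, `load_fill_mask`) and the example sentence are hypothetical and only serve the illustration.

```python
from typing import Callable, List, Tuple


def mask_term(sentence: str, term: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `term` with the mask token (hypothetical helper)."""
    if term not in sentence:
        raise ValueError(f"{term!r} does not occur in the sentence")
    return sentence.replace(term, mask_token, 1)


def top_predictions(fill_mask: Callable, masked_sentence: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the single masked position.

    `fill_mask` is expected to behave like a transformers fill-mask pipeline:
    called with a sentence and `top_k`, it returns a list of dicts that
    contain `token_str` and `score`.
    """
    results = fill_mask(masked_sentence, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def load_fill_mask(model_id: str = "pile-of-law/legalbert-large-1.7M-1"):
    """Build the fill-mask pipeline; this downloads the model weights on first use."""
    from transformers import pipeline  # imported lazily so the helpers above work without it

    return pipeline("fill-mask", model=model_id)


if __name__ == "__main__":
    fill_mask = load_fill_mask()
    # Read the mask token from the tokenizer rather than hard-coding it.
    masked = mask_term(
        "The court granted the motion to dismiss.",  # made-up example sentence
        "motion",
        fill_mask.tokenizer.mask_token,
    )
    for token, score in top_predictions(fill_mask, masked):
        print(f"{token}\t{score:.3f}")
```

 Keeping the pipeline behind a plain callable makes it easy to swap in the second released checkpoint, `pile-of-law/legalbert-large-1.7M-2`, by passing a different `model_id` to `load_fill_mask`.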
 ## Limitations and bias
 ## Training data
+The Pile of Law BERT large model was pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining. The Pile of Law consists of 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, etc. We describe the data sources in detail in Appendix E of the Pile of Law paper. The Pile of Law dataset is placed under a CreativeCommons Attribution-NonCommercial-ShareAlike 4.0 International license.
 ## Training procedure
 ### Preprocessing
 We trained two models with the same configuration in parallel model training runs, with different random seeds. We selected the lowest log likelihood model, [legalbert-large-1.7M-1](https://huggingface.co/pile-of-law/legalbert-large-1.7M-1), which we refer to as PoL-BERT-Large, for experiments, but also release the second model, [legalbert-large-1.7M-2](https://huggingface.co/pile-of-law/legalbert-large-1.7M-2).
 ## Evaluation results
+When finetuned on the CaseHOLD variant provided by the [LexGLUE paper](https://arxiv.org/abs/2110.00976), this model, PoL-BERT-Large, achieves the following results. In the table below, we also report results for [BERT-Large-Uncased](https://huggingface.co/bert-large-uncased) and [CaseLaw-BERT](https://huggingface.co/zlucia/custom-legalbert). We report results on the models with hyperparameter tuning on the downstream task and the result reported for the CaseLaw-BERT model from the [LexGLUE paper](https://arxiv.org/abs/2110.00976), which uses a fixed experimental setup.
 CaseHOLD test results: