zlucia committed
Commit 33eb38c
1 Parent(s): d5828d3

Edits to titles, fix typos

Files changed (1)
README.md +4 -4
README.md CHANGED
@@ -4,11 +4,11 @@ language:
 pipeline_tag: fill-mask
 ---
 
-# LegalBERT large model (uncased)
+# Pile of Law BERT large model (uncased)
 Pretrained model on English language legal and administrative text using the [RoBERTa](https://arxiv.org/abs/1907.11692) pretraining objective.
 
 ## Model description
-LegalBERT large is a transformers model with the [BERT large model (uncased)](https://huggingface.co/bert-large-uncased) architecture pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining.
+Pile of Law BERT large is a transformers model with the [BERT large model (uncased)](https://huggingface.co/bert-large-uncased) architecture pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining.
 
 ## Intended uses & limitations
 You can use the raw model for masked language modeling or fine-tune it for a downstream task. Since this model was pretrained on an English language legal and administrative text corpus, legal downstream tasks will likely be more in-domain for this model.
@@ -18,7 +18,7 @@ You can use the raw model for masked language modeling or fine-tune it for a dow
 ## Limitations and bias
 
 ## Training data
-The LegalBERT model was pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining. The Pile of Law consists of 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, etc. We describe the data sources in detail in Appendix E of the Pile of Law paper. The Pile of Law dataset is placed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
+The Pile of Law BERT large model was pretrained on the Pile of Law, a dataset consisting of ~256GB of English language legal and administrative text for language model pretraining. The Pile of Law consists of 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, etc. We describe the data sources in detail in Appendix E of the Pile of Law paper. The Pile of Law dataset is placed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
 
 ## Training procedure
 ### Preprocessing
@@ -30,7 +30,7 @@ The model was trained on a SambaNova cluster, with 8 RDUs, for 1.7 million steps
 We trained two models with the same configuration in parallel model training runs, with different random seeds. We selected the lowest log likelihood model, [legalbert-large-1.7M-1](https://huggingface.co/pile-of-law/legalbert-large-1.7M-1), which we refer to as PoL-BERT-Large, for experiments, but also release the second model, [legalbert-large-1.7M-2](https://huggingface.co/pile-of-law/legalbert-large-1.7M-2).
 
 ## Evaluation results
-When finetuned on the CaseHOLD variant provided by the [LexGLUE paper](https://arxiv.org/abs/2110.00976), this model, PoL-BERT-Large, achieves the following results. In the table below, we also report results for [BERT-Large-Uncased(]https://huggingface.co/bert-large-uncased) and [CaseLaw-BERT](https://huggingface.co/zlucia/custom-legalbert). We report results on the models with hyperparameter tuning on the downstream task and the result reported for the CaseLaw-BERT model from the [LexGLUE paper](https://arxiv.org/abs/2110.00976), which uses a fixed experimental setup.
+When finetuned on the CaseHOLD variant provided by the [LexGLUE paper](https://arxiv.org/abs/2110.00976), this model, PoL-BERT-Large, achieves the following results. In the table below, we also report results for [BERT-Large-Uncased](https://huggingface.co/bert-large-uncased) and [CaseLaw-BERT](https://huggingface.co/zlucia/custom-legalbert). We report results on the models with hyperparameter tuning on the downstream task and the result reported for the CaseLaw-BERT model from the [LexGLUE paper](https://arxiv.org/abs/2110.00976), which uses a fixed experimental setup.
 
 CaseHOLD test results:
 
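The card's `fill-mask` pipeline tag means the raw checkpoint can be queried for masked-token predictions. Below is a minimal sketch using the `transformers` pipeline API with the [legalbert-large-1.7M-1](https://huggingface.co/pile-of-law/legalbert-large-1.7M-1) checkpoint linked above; the example sentence is illustrative and not taken from the card.

```python
from transformers import pipeline

# Load the fill-mask pipeline with the checkpoint referenced in the card.
fill_mask = pipeline("fill-mask", model="pile-of-law/legalbert-large-1.7M-1")

# Build a masked sentence using the tokenizer's own mask token rather than
# hard-coding one, since the mask token depends on the tokenizer config.
sentence = f"The court granted the motion for summary {fill_mask.tokenizer.mask_token}."

# Print the top predicted tokens and their scores.
for prediction in fill_mask(sentence):
    print(prediction["token_str"], round(prediction["score"], 4))
```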
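CaseHOLD is a multiple-choice task, so fine-tuning for the evaluation above would typically start from a multiple-choice head on top of the pretrained encoder. This is a hedged sketch of that starting point using the `transformers` Auto classes; the hyperparameter tuning described in the card is not reproduced here.

```python
from transformers import AutoTokenizer, AutoModelForMultipleChoice

# Load the pretrained encoder with a freshly initialized multiple-choice head.
# The head weights are untrained; fine-tuning on CaseHOLD is still required.
model_name = "pile-of-law/legalbert-large-1.7M-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)
```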