a-mannion committed (verified) · Commit 6ce9c62 · 1 Parent(s): a6bb573

Update README.md

Files changed (1):
1. README.md (+20 -5)
README.md CHANGED
@@ -5,7 +5,6 @@ language:
library_name: transformers
tags:
- linformer
- - legal
- medical
- RoBERTa
- pytorch
@@ -29,7 +28,7 @@ Jargon is available in several versions with different context sizes and types of pretraining data.
| jargon-general-legal | jargon-general-base | 18GB Legal Corpus |
| [jargon-multidomain-base](https://huggingface.co/PantagrueLLM/jargon-multidomain-base) | jargon-general-base | Medical+Legal Corpora |
| jargon-legal | scratch | 18GB Legal Corpus |
- | jargon-legal-4096 | scratch | 18GB Legal Corpus |
+ | [jargon-legal-4096](https://huggingface.co/PantagrueLLM/jargon-legal-4096) | scratch | 18GB Legal Corpus |
| [jargon-biomed](https://huggingface.co/PantagrueLLM/jargon-biomed) | scratch | 5.4GB Medical Corpus |
| [jargon-biomed-4096](https://huggingface.co/PantagrueLLM/jargon-biomed-4096) | scratch | 5.4GB Medical Corpus |
| [jargon-NACHOS](https://huggingface.co/PantagrueLLM/jargon-NACHOS) | scratch | [NACHOS](https://drbert.univ-avignon.fr/) |
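An editor's aside on the context sizes this table refers to: the `-4096` suffix presumably marks the long-context (4096-token) variants. A minimal, illustrative way to check the advertised window, assuming the repo's tokenizer config sets `model_max_length` (not guaranteed for every checkpoint):

```python
from transformers import AutoTokenizer

# Illustrative: query the advertised context window of a long-context variant.
tokenizer = AutoTokenizer.from_pretrained(
    "PantagrueLLM/jargon-legal-4096", trust_remote_code=True
)
print(tokenizer.model_max_length)  # expected 4096 for the -4096 variants
```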
@@ -40,18 +39,34 @@ Jargon is available in several versions with different context sizes and types of pretraining data.

The Jargon models were evaluated on a range of specialized downstream tasks.

+ ## Biomedical Benchmark
+
+ Results averaged across five runs with varying random seeds.
+
+ | | [**FrenchMedMCQA**](https://huggingface.co/datasets/qanastek/frenchmedmcqa) | [**MQC**](https://aclanthology.org/2020.lrec-1.72/) | [**CAS-POS**](https://clementdalloux.fr/?page_id=28) | [**ESSAI-POS**](https://clementdalloux.fr/?page_id=28) | [**CAS-SG**](https://aclanthology.org/W18-5614/) | [**MEDLINE**](https://huggingface.co/datasets/mnaguib/QuaeroFrenchMed) | [**EMEA**](https://huggingface.co/datasets/mnaguib/QuaeroFrenchMed) | [**E3C-NER**](https://live.european-language-grid.eu/catalogue/corpus/7618) | [**CLISTER**](https://aclanthology.org/2022.lrec-1.459/) |
+ |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | **Task Type** | Sequence Classification | Sequence Classification | Token Classification | Token Classification | Token Classification | Token Classification | Token Classification | Token Classification | STS |
+ | **Metric** | EMR | Accuracy | Macro-F1 | Macro-F1 | Weighted F1 | Weighted F1 | Weighted F1 | Weighted F1 | Spearman Correlation |
+ | jargon-general-base | 12.9 | 76.7 | 96.6 | 96.0 | 69.4 | 81.7 | 96.5 | 91.9 | 78.0 |
+ | jargon-biomed | 15.3 | 91.1 | 96.5 | 95.6 | 75.1 | 83.7 | 96.5 | 93.5 | 74.6 |
+ | jargon-biomed-4096 | 14.4 | 78.9 | 96.6 | 95.9 | 73.3 | 82.3 | 96.3 | 92.5 | 65.3 |
+ | jargon-general-biomed | 16.1 | 69.7 | 95.1 | 95.1 | 67.8 | 78.2 | 96.6 | 91.3 | 59.7 |
+ | jargon-multidomain-base | 14.9 | 86.9 | 96.3 | 96.0 | 70.6 | 82.4 | 96.6 | 92.6 | 74.8 |
+ | jargon-NACHOS | 13.3 | 90.7 | 96.3 | 96.2 | 75.0 | 83.4 | 96.8 | 93.1 | 70.9 |
+ | jargon-NACHOS-4096 | 18.4 | 93.2 | 96.2 | 95.9 | 74.9 | 83.8 | 96.8 | 93.2 | 74.9 |
+
For more information, please check out the [paper](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf), accepted for publication at [LREC-COLING 2024](https://lrec-coling-2024.org/list-of-accepted-papers/).


## Using Jargon models with HuggingFace transformers

- You can get started with `jargon-general-biomed` using the code snippet below:
+ You can get started with `jargon-NACHOS-4096` using the code snippet below:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

- tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-biomed", trust_remote_code=True)
- model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-biomed", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-NACHOS-4096", trust_remote_code=True)
+ model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-NACHOS-4096", trust_remote_code=True)

jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
output = jargon_maskfiller("Il est allé au <mask> hier")
```
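For anyone trying the snippet above: the fill-mask pipeline returns a list of candidate completions, each a dict with `score`, `token_str`, and `sequence` keys. A quick way to inspect them (illustrative; not part of this commit):

```python
# Print the top candidates for the masked token, highest score first.
for candidate in output:
    print(f"{candidate['token_str']!r}: {candidate['score']:.3f}")
```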
 
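Most of the benchmark tasks added in this commit are token classification. As a rough sketch of how such a head could be attached via the standard `transformers` API (the label set below is hypothetical, and this assumes the checkpoint's remote code supports a token-classification head; it is not the authors' evaluation code):

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "PantagrueLLM/jargon-NACHOS-4096"  # any variant from the table above

# Hypothetical tag set; CAS-POS, MEDLINE, etc. each define their own labels.
labels = ["O", "B-ENT", "I-ENT"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels), trust_remote_code=True
)

# One forward pass over a French sentence; per-token scores come back as logits.
encoded = tokenizer("Il est allé au cinéma hier", return_tensors="pt")
with torch.no_grad():
    logits = model(**encoded).logits  # shape: (batch, seq_len, num_labels)
predicted_ids = logits.argmax(dim=-1)  # untrained head: fine-tune before relying on these
```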