Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/gsarti/scibert-nli/README.md

# SciBERT-NLI

This is the model [SciBERT](https://github.com/allenai/scibert) [1] fine-tuned on the [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) datasets using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) to produce universal sentence embeddings [2].

The model uses the original `scivocab` wordpiece vocabulary and was trained using the **average pooling strategy** and a **softmax loss**.
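
As a minimal sketch of what this average pooling looks like at inference time with vanilla `transformers` (the two sentences are illustrative; padding tokens are masked out of the mean):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gsarti/scibert-nli")
model = AutoModel.from_pretrained("gsarti/scibert-nli")

sentences = [
    "Transfer learning improves low-resource text classification.",
    "Fine-tuned language models help when labeled data is scarce.",
]
inputs = tokenizer(
    sentences, padding=True, truncation=True, max_length=128, return_tensors="pt"
)

with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Average pooling: mean over non-padding token embeddings, per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```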

**Base model**: `allenai/scibert_scivocab_cased`, loaded with Hugging Face's `AutoModel`.

**Training time**: ~4 hours on the NVIDIA Tesla P100 GPU provided in Kaggle Notebooks.

**Parameters**:

| Parameter        | Value |
|------------------|-------|
| Batch size       | 64    |
| Training steps   | 20000 |
| Warmup steps     | 1450  |
| Lowercasing      | True  |
| Max. Seq. Length | 128   |
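
The exact training script is not part of this card; the following is a rough sketch of how a comparable run can be set up with the classic `sentence-transformers` training API using the parameters above. The two NLI pairs are placeholders for the full SNLI + MultiNLI data, and the label ids follow the library's usual NLI convention:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses, models

# SciBERT encoder with a mean-pooling head, as described above.
word_embedding_model = models.Transformer(
    "allenai/scibert_scivocab_cased", max_seq_length=128
)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Placeholder pairs; label ids: 0 = contradiction, 1 = entailment, 2 = neutral.
train_examples = [
    InputExample(texts=["A man is eating.", "A person eats food."], label=1),
    InputExample(texts=["A man is eating.", "Nobody is eating."], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# Softmax loss over the three NLI labels, as in the original SBERT recipe.
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=1450,
)
```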

**Performance**: The model was evaluated on the test portion of the [STS benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) using Spearman rank correlation, and compared with a general BERT base model fine-tuned with the same procedure, as a sanity check that the two perform similarly.

| Model                           | Spearman score |
|---------------------------------|----------------|
| `scibert-nli` (this model)      | 74.50          |
| `bert-base-nli-mean-tokens` [3] | 77.12          |
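
The evaluation loop can be reproduced along these lines. This is a sketch: the three pairs and gold scores are toy placeholders for the real STS benchmark test split, and loading the repo directly with `SentenceTransformer` assumes the checkpoint ships a pooling config; otherwise rebuild it with `models.Transformer` + `models.Pooling` as in the training sketch above:

```python
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("gsarti/scibert-nli")

# Toy placeholders for STS benchmark test pairs with gold similarity in [0, 5].
pairs = [
    ("A plane is taking off.", "An air plane is taking off.", 5.0),
    ("A man is playing a flute.", "A man is playing a guitar.", 1.6),
    ("A woman is slicing an onion.", "Someone is cutting onions.", 4.2),
]
emb1 = model.encode([p[0] for p in pairs], convert_to_tensor=True)
emb2 = model.encode([p[1] for p in pairs], convert_to_tensor=True)

# Cosine similarity of each pair, then Spearman rank correlation with gold.
pred = util.pytorch_cos_sim(emb1, emb2).diagonal().cpu().numpy()
gold = [p[2] for p in pairs]
print(spearmanr(pred, gold).correlation)
```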

An example usage for similarity-based scientific paper retrieval is provided in the [Covid Papers Browser](https://github.com/gsarti/covid-papers-browser) repository.
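
A condensed version of what that retrieval looks like (the abstracts are toy stand-ins for a real corpus, and the same loading caveat as above applies):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("gsarti/scibert-nli")

# Toy stand-ins for the paper abstracts indexed by the browser.
abstracts = [
    "We study the transmission dynamics of the novel coronavirus in Wuhan.",
    "A survey of graph neural network architectures for molecule generation.",
    "Antiviral drug repurposing against the SARS-CoV-2 main protease.",
]
corpus_embeddings = model.encode(abstracts, convert_to_tensor=True)
query_embedding = model.encode(
    "How does the coronavirus spread between humans?", convert_to_tensor=True
)

# Rank abstracts by cosine similarity to the query and keep the best two.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {abstracts[hit['corpus_id']]}")
```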

**References:**

[1] I. Beltagy et al., [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/)

[2] A. Conneau et al., [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://www.aclweb.org/anthology/D17-1070/)

[3] N. Reimers and I. Gurevych, [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://www.aclweb.org/anthology/D19-1410/)