Commit 63bbc62 · Parent(s): e3f3825
Update README.md (readme adapted to specific model)
README.md CHANGED
@@ -12,8 +12,6 @@ license: mit
 # deberta-v3-large-zeroshot-v1.1-all-33
 ## Model description
 The model is designed for zero-shot classification with the Hugging Face pipeline.
-The model should be substantially better at zero-shot classification than my other zero-shot models on the
-Hugging Face hub: https://huggingface.co/MoritzLaurer.

 The model can do one universal task: determine whether a hypothesis is `true` or `not_true`
 given a text (also called `entailment` vs. `not_entailment`).
@@ -21,17 +19,18 @@ This task format is based on the Natural Language Inference task (NLI).
 The task is so universal that any classification task can be reformulated into the task.

 ## Training data
-The model was trained on a mixture of
-1.
 'amazonpolarity', 'imdb', 'appreviews', 'yelpreviews', 'rottentomatoes',
 'emotiondair', 'emocontext', 'empathetic',
 'financialphrasebank', 'banking77', 'massive',
 'wikitoxic_toxicaggregated', 'wikitoxic_obscene', 'wikitoxic_threat', 'wikitoxic_insult', 'wikitoxic_identityhate',
 'hateoffensive', 'hatexplain', 'biasframes_offensive', 'biasframes_sex', 'biasframes_intent',
 'agnews', 'yahootopics',
-'trueteacher', 'spam', 'wellformedquery'
-
-

 Note that compared to other NLI models, this model predicts two classes (`entailment` vs. `not_entailment`)
 as opposed to three classes (entailment/neutral/contradiction)
@@ -41,10 +40,11 @@ as opposed to three classes (entailment/neutral/contradiction)
 #### Simple zero-shot classification pipeline
 ```python
 from transformers import pipeline
-
-
-
-
 print(output)
 ```

@@ -60,12 +60,10 @@ Please consult the original DeBERTa paper and the papers for the different datas
 The base model (DeBERTa-v3) is published under the MIT license.
 The datasets the model was fine-tuned on are published under a diverse set of licenses.
 The following spreadsheet provides an overview of the non-NLI datasets used for fine-tuning.
-The spreadsheets contains information on licenses, the underlying papers etc.: https://
-
-In addition, the model was also trained on the following NLI datasets: MNLI, ANLI, WANLI, LING-NLI, FEVER-NLI.

 ## Citation
-If you use this model, please cite:
 ```
 @article{laurer_less_2023,
 title = {Less {Annotating}, {More} {Classifying}: {Addressing} the {Data} {Scarcity} {Issue} of {Supervised} {Machine} {Learning} with {Deep} {Transfer} {Learning} and {BERT}-{NLI}},
@@ -87,25 +85,25 @@ If you use this model, please cite:
 If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or [LinkedIn](https://www.linkedin.com/in/moritz-laurer/)

 ### Debugging and issues
-Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers

 ### Hypotheses used for classification
-
-
-
-I recommend formulating your hypotheses in a similar format. For example:

 ```python
 from transformers import pipeline
 text = "Angela Merkel is a politician in Germany and leader of the CDU"
-
-
-
-
-output = classifier(text, classes_verbalised, hypothesis_template=hypothesis_template, multi_label=False)
 print(output)
 ```


 #### wellformedquery
 | label | hypothesis |
README.md (after change):

# deberta-v3-large-zeroshot-v1.1-all-33
## Model description
The model is designed for zero-shot classification with the Hugging Face pipeline.

The model can do one universal task: determine whether a hypothesis is `true` or `not_true`
given a text (also called `entailment` vs. `not_entailment`).

The task is so universal that any classification task can be reformulated into the task.
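As a concrete illustration of that reformulation (a sketch; the helper function and the hypothesis wording are illustrative, not part of the model card): each candidate class is verbalized into one hypothesis, and the model then only has to judge whether each hypothesis is true given the text.

```python
# Sketch: turning an ordinary classification task into the universal
# entailment task. Each candidate class becomes one hypothesis, paired
# with the same input text as the premise.
def build_nli_pairs(text, classes, template="This example is about {}"):
    """Return one (premise, hypothesis) pair per candidate class."""
    return [(text, template.format(c)) for c in classes]

pairs = build_nli_pairs(
    "The central bank raised interest rates again.",
    ["politics", "economy", "entertainment"],
)
# One pair per class, e.g. (text, "This example is about economy")
```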

## Training data
The model was trained on a mixture of 33 datasets and 389 classes that have been reformatted into this universal format.
1. Five NLI datasets with ~885k texts: "mnli", "anli", "fever", "wanli", "ling"
2. 28 classification tasks with ~51k texts:
'amazonpolarity', 'imdb', 'appreviews', 'yelpreviews', 'rottentomatoes',
'emotiondair', 'emocontext', 'empathetic',
'financialphrasebank', 'banking77', 'massive',
'wikitoxic_toxicaggregated', 'wikitoxic_obscene', 'wikitoxic_threat', 'wikitoxic_insult', 'wikitoxic_identityhate',
'hateoffensive', 'hatexplain', 'biasframes_offensive', 'biasframes_sex', 'biasframes_intent',
'agnews', 'yahootopics',
'trueteacher', 'spam', 'wellformedquery',
'manifesto', 'capsotu'.
See details on each dataset here: https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/datasets_overview.csv

Note that compared to other NLI models, this model predicts two classes (`entailment` vs. `not_entailment`)
as opposed to three classes (entailment/neutral/contradiction).
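To sketch how such two-class predictions become one score per candidate label (a simplified assumption about the pipeline's internals, not code from the model card): with `multi_label=False`, the entailment logit of each label's hypothesis can be softmaxed across all labels, so the scores are comparable and sum to 1.

```python
import math

# Simplified sketch: one entailment logit per candidate label,
# softmaxed across labels so that the resulting scores sum to 1.
def scores_from_entailment_logits(entailment_logits):
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. entailment logits for ["politics", "economy", "entertainment"]
scores = scores_from_entailment_logits([2.0, 0.5, -1.0])
```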
[...]

#### Simple zero-shot classification pipeline
```python
from transformers import pipeline
text = "Angela Merkel is a politician in Germany and leader of the CDU"
hypothesis_template = "This example is about {}"
classes_verbalized = ["politics", "economy", "entertainment", "environment"]
zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33")
output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
print(output)
```

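The pipeline returns a dict with the input `sequence` plus the candidate `labels` sorted by descending `scores`. The values below are illustrative placeholders, not real model outputs:

```python
# Illustrative shape of the zero-shot pipeline output (scores made up):
example_output = {
    "sequence": "Angela Merkel is a politician in Germany and leader of the CDU",
    "labels": ["politics", "economy", "entertainment", "environment"],
    "scores": [0.98, 0.01, 0.007, 0.003],  # sorted descending; sums to 1
}
top_label = example_output["labels"][0]  # highest-scoring label first
```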
[...]

The base model (DeBERTa-v3) is published under the MIT license.
The datasets the model was fine-tuned on are published under a diverse set of licenses.
The following spreadsheet provides an overview of the non-NLI datasets used for fine-tuning.
The spreadsheet contains information on licenses, the underlying papers, etc.: https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/datasets_overview.csv

## Citation
If you use this model academically, please cite:
```
@article{laurer_less_2023,
  title = {Less {Annotating}, {More} {Classifying}: {Addressing} the {Data} {Scarcity} {Issue} of {Supervised} {Machine} {Learning} with {Deep} {Transfer} {Learning} and {BERT}-{NLI}},
```
[...]

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or [LinkedIn](https://www.linkedin.com/in/moritz-laurer/)

### Debugging and issues
Note that DeBERTa-v3 was released on 06.12.21, and older versions of HF Transformers can have issues running the model (e.g. tokenizer errors). Using Transformers>=4.13 might solve some issues.
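One way to fail fast on an outdated install (a sketch; the helper is illustrative and only compares the major and minor version components):

```python
# Illustrative guard: compare an installed version string against the
# minimum suggested above. Only "major.minor" are inspected.
def meets_minimum(version, minimum=(4, 13)):
    parts = tuple(int(x) for x in version.split(".")[:2])
    return parts >= minimum

# e.g. meets_minimum(transformers.__version__)
```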

### Hypotheses used for classification
The hypotheses in the tables below were used to fine-tune the model.
Inspecting them can help users get a feeling for which types of hypotheses and tasks the model was trained on.
You can formulate your own hypotheses by changing the `hypothesis_template` of the zero-shot pipeline. For example:

```python
from transformers import pipeline
text = "Angela Merkel is a politician in Germany and leader of the CDU"
hypothesis_template = "Merkel is the leader of the party: {}"
classes_verbalized = ["CDU", "SPD", "Greens"]
zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33")
output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
print(output)
```
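With `multi_label=True`, each candidate label is instead scored independently, so the scores need not sum to 1. A simplified sketch of that per-label scoring (an assumption about the pipeline internals, expressed with this model's two classes):

```python
import math

# Simplified sketch: per label, softmax over this model's two logits
# (entailment, not_entailment); each label gets an independent score.
def independent_label_scores(logit_pairs):
    """logit_pairs: one (entailment, not_entailment) logit pair per label."""
    scores = []
    for ent, not_ent in logit_pairs:
        e, n = math.exp(ent), math.exp(not_ent)
        scores.append(e / (e + n))
    return scores

scores = independent_label_scores([(2.0, 0.0), (-1.0, 1.0)])
```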

Note that a few rows in the `massive` and `banking77` datasets contain `nan` because some classes were so ambiguous/unclear that I excluded them from the data.

#### wellformedquery
| label | hypothesis |