Zero-Shot Classification
Transformers
PyTorch
Safetensors
bert
text-classification
Inference Endpoints
saattrupdan committed on
Commit
f4e7bde
1 Parent(s): 77f3fcd

Update README.md

Files changed (1)
  1. README.md +54 -2
README.md CHANGED
@@ -67,9 +67,28 @@ You can use this model in your scripts as follows:
 
  ## Performance
 
- As Danish is, as far as we are aware, the only Scandinavian language with a gold standard NLI dataset, namely the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439), we report evaluation scores on the test split of that dataset.
 
- We report Matthew's Correlation Coefficient (MCC), macro-average F1-score as well as accuracy.
 
  | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
  | :-------- | :------------ | :--------- | :----------- | :----------- |
@@ -81,6 +100,39 @@ We report Matthew's Correlation Coefficient (MCC), macro-average F1-score as wel
  | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 47.28% | 48.88% | 73.46% | **22M** |
 
 
  ## Training procedure
 
  The model has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.
 
 
  ## Performance
 
+ We evaluate the models in Danish, Swedish and Norwegian Bokmål separately.
 
+ In all cases, we report the Matthews Correlation Coefficient (MCC), macro-average F1-score and accuracy.
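As a point of reference for these three metrics, the snippet below is a minimal sketch of how MCC, macro-average F1 and accuracy can be computed with scikit-learn; it is not the evaluation code used for the tables below, and the `y_true` and `y_pred` lists are made-up placeholder labels rather than model outputs.

```python
# Minimal sketch of the three reported metrics, computed with scikit-learn.
# The label lists below are placeholders and do not come from any evaluation run.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = ["entailment", "neutral", "contradiction", "neutral", "entailment"]
y_pred = ["entailment", "neutral", "neutral", "neutral", "contradiction"]

mcc = matthews_corrcoef(y_true, y_pred)               # Matthews Correlation Coefficient
macro_f1 = f1_score(y_true, y_pred, average="macro")  # macro-average F1-score
accuracy = accuracy_score(y_true, y_pred)             # plain accuracy

print(f"MCC: {mcc:.2%} | Macro-F1: {macro_f1:.2%} | Accuracy: {accuracy:.2%}")
```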
+
+
+ ### Scandinavian Evaluation
+
+ The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.
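To spell out the aggregation, here is a tiny sketch of that average; reading it as an unweighted mean over the three languages is our assumption, and the numbers are placeholders rather than actual results.

```python
# The Scandinavian score is taken here as the plain average of the three
# language-specific scores. The values below are placeholders.
per_language_mcc = {"da": 0.60, "sv": 0.70, "nb": 0.65}

scandinavian_mcc = sum(per_language_mcc.values()) / len(per_language_mcc)
print(f"Scandinavian MCC: {scandinavian_mcc:.2%}")  # -> Scandinavian MCC: 65.00%
```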
+
+ | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+ | :-------- | :------------ | :--------- | :----------- | :----------- |
+ | [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | asd | asd | asd | 279M |
+ | `alexandrainst/scandi-nli-base` (this) | asd | asd | asd | 178M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 63.94% | 70.41% | 77.23% | 279M |
+ | [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+ | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
+ ### Danish Evaluation
+
+ We use a test split of the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439) to evaluate the Danish performance of the models.
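As a rough illustration of how one can query the model from a script when running such an evaluation, here is a hedged sketch using the 🤗 Transformers zero-shot classification pipeline; the Danish example text, candidate labels and hypothesis template are our own illustrative choices, not DanFEVER data.

```python
# Illustrative sketch: querying the model on Danish text via the
# zero-shot-classification pipeline. The text, candidate labels and
# hypothesis template below are example choices, not DanFEVER data.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="alexandrainst/scandi-nli-base",
)

result = classifier(
    "Mette Frederiksen er statsminister i Danmark.",
    candidate_labels=["politik", "sport", "sundhed"],
    hypothesis_template="Dette eksempel handler om {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```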
 
  | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
  | :-------- | :------------ | :--------- | :----------- | :----------- |
 
  | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 47.28% | 48.88% | 73.46% | **22M** |
 
 
+ ### Swedish Evaluation
+
+ We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Swedish performance of the models.
+
+ We acknowledge that evaluating on a machine translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Swedish.
+
+ | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+ | :-------- | :------------ | :--------- | :----------- | :----------- |
+ | [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+ | `alexandrainst/scandi-nli-base` (this) | asd | asd | asd | 178M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 73.84% | 82.46% | 82.58% | 279M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 73.32% | 82.15% | 82.08% | 279M |
+ | [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+ | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
+ ### Norwegian Evaluation
+
+ We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Norwegian performance of the models.
+
+ We acknowledge that evaluating on a machine translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Norwegian.
+
+ | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+ | :-------- | :------------ | :--------- | :----------- | :----------- |
+ | [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 65.33% | 76.73% | 76.65% | 279M |
+ | `alexandrainst/scandi-nli-base` (this) | asd | asd | asd | 178M |
+ | [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | asd | asd | asd | 178M |
+ | [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 65.18% | 76.76% | 76.77% | 279M |
+ | [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+ | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
  ## Training procedure
 
  The model has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.
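To illustrate how such a multi-source NLI training set could be assembled with the 🤗 Datasets library, here is a minimal sketch; the dataset identifiers are placeholders, since the README does not name the machine translated corpora on the Hub, and in practice the label schemes of the different sources must first be mapped to a shared entailment/neutral/contradiction scheme.

```python
# Illustrative sketch of assembling a multi-source NLI training set.
# The dataset ids below are placeholders for the corpora listed above;
# they are not confirmed Hub identifiers.
from datasets import concatenate_datasets, load_dataset

source_ids = [
    "strombergnlp/danfever",             # DanFEVER (placeholder id)
    "your-org/mnli-machine-translated",  # machine translated MultiNLI (placeholder id)
    "your-org/cb-machine-translated",    # machine translated CommitmentBank (placeholder id)
]

# In practice each source's labels must first be mapped to a shared
# entailment / neutral / contradiction scheme before concatenation,
# and all sources must expose the same columns.
splits = [load_dataset(source_id, split="train") for source_id in source_ids]
train_dataset = concatenate_datasets(splits).shuffle(seed=4242)
```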