loic-dagnas-sinequa committed
Commit 8dc4e23 (1 parent: a3db8bd)
Update README.md
@basilevc @skirres
I have specified that by Chinese we mean Simplified Chinese, as requested by @ArianeCavet here.
I have also reordered the languages alphabetically by language code. @ArianeCavet, is that OK for you?
Note that zs is not recognized by Hugging Face language tags.
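The two points in the message above (ordering by language code, and `zs` not being a recognized tag) can be checked with a few lines of Python. This is only a sketch: `ISO_639_1_SUBSET` and `check_language_tags` are hypothetical names, the set is a hand-written subset of ISO 639-1 codes rather than the list the Hub actually validates against, and the tag list is copied from the `language:` block of the new README.

```python
# Hypothetical, hand-written subset of ISO 639-1 codes (not the Hub's own list).
ISO_639_1_SUBSET = {"de", "en", "es", "fr", "it", "ja", "nl", "pt", "zh"}


def check_language_tags(tags):
    """Return (codes missing from the subset, whether tags are sorted)."""
    unknown = [tag for tag in tags if tag not in ISO_639_1_SUBSET]
    is_sorted = list(tags) == sorted(tags)
    return unknown, is_sorted


# Language tags as they appear in the updated README front matter.
readme_tags = ["de", "en", "es", "fr", "it", "nl", "ja", "pt", "zs"]

unknown, is_sorted = check_language_tags(readme_tags)
print(unknown)    # ['zs']
print(is_sorted)  # False, because "nl" is listed before "ja"
```

Under these assumptions, `zs` is the only code flagged, and the list is alphabetical except for `nl` and `ja`.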
README.md
CHANGED
@@ -1,18 +1,18 @@
 ---
 pipeline_tag: sentence-similarity
 tags:
-
-
+- feature-extraction
+- sentence-similarity
 language:
-
-
-
-
-
-
-
-
-
+- de
+- en
+- es
+- fr
+- it
+- nl
+- ja
+- pt
+- zs
 ---
 
 # Model Card for `vectorizer.raspberry`
@@ -27,15 +27,15 @@ Model name: `vectorizer.raspberry`
 
 The model was trained and tested in the following languages:
 
-- English
-- French
 - German
+- English
 - Spanish
+- French
 - Italian
 - Dutch
 - Japanese
 - Portuguese
-- Chinese
+- Simplified Chinese
 
 Besides these languages, basic support can be expected for additional 91 languages that were used during the pretraining
 of the base model (see Appendix A of XLM-R paper).
@@ -115,10 +115,10 @@ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.
 multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
 for the existing languages.
 
-| Language
-
-
-
-
-| Japanese
-| Chinese | 0.680 |
+| Language            | Recall@100 |
+|:--------------------|-----------:|
+| German              |      0.528 |
+| Spanish             |      0.602 |
+| French              |      0.650 |
+| Japanese            |      0.614 |
+| Simplified Chinese  |      0.680 |
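For readers unfamiliar with the Recall@100 column in the table above, here is a minimal sketch of how recall at a cutoff is commonly computed for a single query. The function name `recall_at_k` and the toy document IDs are illustrative assumptions; the README does not say how the MIRACL scores were aggregated, so treating them as a per-query average of this quantity is also an assumption.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        # No relevant documents for this query: define recall as 0.0 here.
        return 0.0
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


# Toy example: one of the two relevant documents is in the top 3.
ranked = ["d3", "d7", "d1", "d9"]
relevant = ["d1", "d4"]
print(recall_at_k(ranked, relevant, k=3))  # 0.5
```

A benchmark-level figure would then typically be the mean of this value over all queries of a language.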