Update README.md
README.md (CHANGED)
@@ -1,4 +1,4 @@
---
inference: false
datasets:
- bclavie/mmarco-japanese-hard-negatives

@@ -8,7 +8,7 @@ language:
pipeline_tag: sentence-similarity
tags:
- ColBERT
---
Under Construction, please come back in a few days!
工事中です。数日後にまたお越しください。
@@ -23,9 +23,9 @@ Under Construction, please come back in a few days!
(refer to the technical report for exact evaluation method + code)

| | JSQuAD | | | MIRACL | | | MrTyDi | | | Average | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | R@1 | R@5 | R@10 | R@3 | R@5 | R@10 | R@3 | R@5 | R@10 | R@\{1\|3\} | R@5 | R@10 |
| JaColBERT | **0.906** | **0.968** | 0.978 | 0.464 | 0.546 | 0.645 | 0.744 | 0.781 | 0.821 | **0.705** | 0.765 | 0.813 |
| m-e5-large (in-domain) | | | | | | | | | | | | |
| m-e5-base (in-domain) | *0.838* | *0.955* | 0.973 | **0.482** | **0.553** | 0.632 | **0.777** | **0.815** | 0.857 | 0.699 | **0.775** | 0.820 |
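A note on the Average columns: the figures are consistent with a simple mean over the three datasets, where R@{1|3} combines JSQuAD's R@1 with MIRACL's and MrTyDi's R@3 (refer to the technical report for the exact definition). A quick check of that reading, using the JaColBERT row above:

```python
# Assumes the Average column is the plain mean over the three datasets,
# with "R@{1|3}" combining JSQuAD R@1 with MIRACL/MrTyDi R@3.
jacolbert_recalls = [0.906, 0.464, 0.744]  # JSQuAD R@1, MIRACL R@3, MrTyDi R@3 from the table above

print(f"R@{{1|3}} average: {sum(jacolbert_recalls) / len(jacolbert_recalls):.3f}")  # 0.705, matching the table
```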
@@ -72,7 +72,7 @@ ColBERT looks slightly more unfriendly than a usual `transformers` model, but a

In order for the late-interaction retrieval approach used by ColBERT to work, you must first build your index.
Think of it as using an embedding model, like e5, to embed all your documents and store them in a vector database.
Indexing is the slowest step; retrieval is extremely quick. There are some tricks to speed it up, but the default settings work fairly well:

```python
from colbert import Indexer
```
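The diff cuts the snippet off right after the import. For context, here is a minimal sketch of how the `Indexer` is typically driven in the upstream `colbert-ai` library. The `Run`/`RunConfig`/`ColBERTConfig` helpers are that library's standard setup, while the `bclavie/JaColBERT` checkpoint id, the `documents` list, and the experiment, index-name, and parameter values are illustrative assumptions rather than settings prescribed by this card:

```python
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    # Hypothetical corpus: one string per passage to make searchable.
    documents = [
        "ColBERTは遅延相互作用を用いる検索モデルです。",
        "東京は日本の首都です。",
    ]

    # Run().context(...) scopes where the index is written; nranks=1 keeps it to a single process/GPU.
    with Run().context(RunConfig(nranks=1, experiment="jacolbert")):
        config = ColBERTConfig(
            nbits=2,         # bits per compressed residual (a common ColBERT setting)
            doc_maxlen=300,  # passages longer than this many tokens are truncated
        )
        indexer = Indexer(checkpoint="bclavie/JaColBERT", config=config)  # assumed checkpoint id
        # Encodes every passage and writes the index under the experiment's indexes/ folder.
        indexer.index(name="jacolbert_example_index", collection=documents)
```

Once the index is built, queries run directly against it, which is where the fast retrieval mentioned above comes from.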
|