Update README.md
README.md (CHANGED)
@@ -1,4 +1,4 @@
---
inference: false
datasets:
- bclavie/mmarco-japanese-hard-negatives

@@ -8,7 +8,7 @@ language:
pipeline_tag: sentence-similarity
tags:
- ColBERT
---
Under Construction, please come back in a few days!
工事中です。数日後にまたお越しください。
@@ -23,9 +23,9 @@ Under Construction, please come back in a few days!
(refer to the technical report for exact evaluation method + code)

| | JSQuAD | | | MIRACL | | | MrTyDi | | | Average | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | R@1 | R@5 | R@10 | R@3 | R@5 | R@10 | R@3 | R@5 | R@10 | R@\{1\|3\} | R@5 | R@10 |
| JaColBERT | **0.906** | **0.968** | 0.978 | 0.464 | 0.546 | 0.645 | 0.744 | 0.781 | 0.821 | **0.705** | 0.765 | 0.813 |
| m-e5-large (in-domain) | | | | | | | | | | | | |
| m-e5-base (in-domain) | *0.838* | *0.955* | 0.973 | **0.482** | **0.553** | 0.632 | **0.777** | **0.815** | 0.857 | 0.699 | **0.775** | 0.820 |
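A note on the Average columns: the figures are consistent with a simple mean over the three datasets, where R@{1|3} combines JSQuAD's R@1 with MIRACL's and MrTyDi's R@3 (refer to the technical report for the exact definition). A quick check of that reading, using the JaColBERT row above:

```python
# Assumes the Average column is the plain mean over the three datasets,
# with "R@{1|3}" combining JSQuAD R@1 with MIRACL/MrTyDi R@3.
jacolbert_recalls = [0.906, 0.464, 0.744]  # JSQuAD R@1, MIRACL R@3, MrTyDi R@3 from the table above

print(f"R@{{1|3}} average: {sum(jacolbert_recalls) / len(jacolbert_recalls):.3f}")  # 0.705, matching the table
```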
@@ -72,7 +72,7 @@ ColBERT looks slightly more unfriendly than a usual `transformers` model, but a

In order for the late-interaction retrieval approach used by ColBERT to work, you must first build your index.
Think of it as using an embedding model, like e5, to embed all your documents and store them in a vector database.
Indexing is the slowest step; retrieval is extremely quick. There are some tricks to speed it up, but the default settings work fairly well:

```python
from colbert import Indexer
```
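The diff cuts the snippet off right after the import. For context, here is a minimal sketch of how the `Indexer` is typically driven in the upstream `colbert-ai` library. The `Run`/`RunConfig`/`ColBERTConfig` helpers are that library's standard setup, while the `bclavie/JaColBERT` checkpoint id, the `documents` list, and the experiment, index-name, and parameter values are illustrative assumptions rather than settings prescribed by this card:

```python
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    # Hypothetical corpus: one string per passage to make searchable.
    documents = [
        "ColBERTは遅延相互作用を用いる検索モデルです。",
        "東京は日本の首都です。",
    ]

    # Run().context(...) scopes where the index is written; nranks=1 keeps it to a single process/GPU.
    with Run().context(RunConfig(nranks=1, experiment="jacolbert")):
        config = ColBERTConfig(
            nbits=2,         # bits per compressed residual (a common ColBERT setting)
            doc_maxlen=300,  # passages longer than this many tokens are truncated
        )
        indexer = Indexer(checkpoint="bclavie/JaColBERT", config=config)  # assumed checkpoint id
        # Encodes every passage and writes the index under the experiment's indexes/ folder.
        indexer.index(name="jacolbert_example_index", collection=documents)
```

Once the index is built, queries run directly against it, which is where the fast retrieval mentioned above comes from.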
|