prithivida committed · Commit f9573c0 · verified · 1 Parent(s): b2893b9

Added context for lang tokens

Files changed (1): README.md (+2, -2)
README.md CHANGED
@@ -81,9 +81,9 @@ SPLADE BOW rep:
 ## How does it translate into Empirical metrics?
 
 Our models are token sparse and yet effective. It translates to faster retrieval (User experience) and smaller index size ($). Mean retrieval time on the standard MS-MARCO small dev set and Scaled total FLOPS loss are the respective metrics are below.
-This is why Google's SparseEmbed is interesting as they also achieve SPLADE quality retrieval effectiveness with much lower FLOPs. Compared ColBERT, SPLADE and SparseEmbed match query and
+This is why Google's SparseEmbed is interesting as they also achieve SPLADE quality retrieval effectiveness with much lower FLOPs. Compared to ColBERT, SPLADE and SparseEmbed match query and
 document terms with a linear complexity as ColBERT’s late interaction i.e. all query-document term pairs takes a quadratic complexity. The Challenge with SparseEmbed is it uses a hyperparameter called **Top-k to restrict number of tokens used to learn contextual dense representations.** Say 64 and 256 tokens for query and passage encoding.
-But it is unclear how well these hyperparameters are transferable to other domains or languages.
+But it is unclear how well these hyperparameters are transferable to other domains or languages (where the notion of tokens changes a lot like our mother tongue Tamil which is Agglutinative in nature).
 
 <img src="./Metrics.png" width=800/>
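
The complexity contrast and the Top-k cap described in the edited paragraph can be made concrete with a small sketch. Below is a minimal, illustrative Python snippet, not the released model code: the function names (`colbert_score`, `sparse_score`, `top_k`), token ids, weights, and embedding shapes are all made up for demonstration, and the 64/256 caps are the values quoted in the paragraph. It contrasts ColBERT-style MaxSim late interaction, which scores all query-document token pairs, with SPLADE/SparseEmbed-style sparse matching, which only touches shared vocabulary terms, plus the kind of Top-k token restriction SparseEmbed applies.

```python
# Illustrative sketch only (not the released model code). It contrasts the two
# scoring regimes the README paragraph describes, and shows the kind of Top-k
# token cap SparseEmbed applies. All token ids, weights, and shapes below are
# made up for demonstration; 64/256 are the caps quoted in the paragraph.
import numpy as np

# --- ColBERT-style late interaction: every query token embedding is compared
# --- with every document token embedding, so scoring one pair costs
# --- O(|q| * |d|) similarity operations (quadratic in sequence length).
def colbert_score(q_toks: np.ndarray, d_toks: np.ndarray) -> float:
    sim = q_toks @ d_toks.T               # (|q|, |d|) all pairwise similarities
    return float(sim.max(axis=1).sum())   # MaxSim per query token, summed

# --- SPLADE/SparseEmbed-style matching: each text is a single sparse vector
# --- over the vocabulary, so scoring touches only the shared non-zero terms
# --- (linear in the number of matched terms, not quadratic in tokens).
def sparse_score(q_w: dict[int, float], d_w: dict[int, float]) -> float:
    return sum(q_w[t] * d_w[t] for t in q_w.keys() & d_w.keys())

# --- SparseEmbed's Top-k hyperparameter: keep only the k highest-weighted
# --- tokens (e.g. 64 for queries, 256 for passages) before learning the
# --- contextual dense representations for them.
def top_k(weights: np.ndarray, k: int) -> dict[int, float]:
    idx = np.argpartition(weights, -k)[-k:]          # indices of k largest
    return {int(i): float(weights[i]) for i in idx if weights[i] > 0}

rng = np.random.default_rng(0)
q_toks = rng.normal(size=(32, 128))    # 32 query tokens, 128-dim embeddings
d_toks = rng.normal(size=(180, 128))   # 180 passage tokens
print("ColBERT pairs scored:", q_toks.shape[0] * d_toks.shape[0])   # 5760
print("ColBERT score:", round(colbert_score(q_toks, d_toks), 2))

q_w = {2054: 1.2, 3793: 0.8, 7592: 0.5}              # toy query term weights
d_w = {3793: 1.1, 7592: 0.7, 412: 0.3, 9032: 0.2}    # toy passage term weights
print("sparse terms matched:", len(q_w.keys() & d_w.keys()))         # 2
print("sparse score:", sparse_score(q_w, d_w))                       # 1.23

passage_weights = rng.random(30522)                  # BERT-sized vocab
capped = top_k(passage_weights, 256)                 # SparseEmbed-style cap
print("tokens kept after Top-k:", len(capped))       # <= 256
```

The sketch makes the paragraph's two points visible: the sparse score never enumerates token pairs, which is what shrinks retrieval latency and index size, and the fixed Top-k cap is a tuned hyperparameter, so, as the added context notes, it may not carry over to agglutinative languages like Tamil where one word can fragment into many subword tokens.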