### Putting it all together
When you use the document encoder in an indexing pipeline, the indexer stores the document contents as rewritten by SPLADE (expanded, weighted terms) rather than the original text:
<div class="pipeline">
<div class="df" title="Document Frame">D</div>
<div class="transformer attn" title="SPLADE Indexing Transformer">SPLADE</div>
<div class="df" title="Document Frame">D</div>
<div class="transformer" title="Indexer">Indexer</div>
<div class="artefact" title="SPLADE Index">IDX</div>
</div>
```python
import pyterrier as pt
import pyt_splade
dataset = pt.get_dataset('irds:msmarco-passage')
splade = pyt_splade.Splade()
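# pretokenised=True: index the term weights produced by the encoder instead of tokenising raw text
indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)
indexer_pipe = splade.doc_encoder() >> indexer
indexer_pipe.index(dataset.get_corpus_iter())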
```
Once you have built an index, you can build a retrieval pipeline that first encodes the query
and then performs retrieval:
<div class="pipeline">
<div class="df" title="Query Frame">Q</div>
<div class="transformer attn" title="SPLADE Query Transformer">SPLADE</div>
<div class="df" title="Query Frame">Q</div>
<div class="transformer" title="Term Frequency Transformer">TF Retriever <div class="artefact" title="SPLADE Index">IDX</div></div>
<div class="df" title="Result Frame">R</div>
</div>
```python
splade_retr = splade.query_encoder() >> pt.terrier.Retriever('./msmarco_psg', wmodel='Tf')
```
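To make the two pipelines above more concrete, here is a toy, pure-Python sketch of the underlying idea: documents and queries are each represented as weighted bags of terms, and a term-frequency retriever scores a document by summing query weight times document weight over the shared terms. This is not how `pyt_splade` or Terrier are implemented; the function names `build_index` and `retrieve`, the terms, and the weights are all invented for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: term -> list of (docno, weight)."""
    index = defaultdict(list)
    for docno, toks in docs.items():
        for term, weight in toks.items():
            index[term].append((docno, weight))
    return index

def retrieve(index, query_toks):
    """Score each document as the sum of query_weight * doc_weight over shared terms."""
    scores = defaultdict(float)
    for term, q_weight in query_toks.items():
        for docno, d_weight in index.get(term, []):
            scores[docno] += q_weight * d_weight
    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical encoder outputs (made-up terms and weights, not real SPLADE output)
docs = {
    'd1': {'chemical': 120, 'reaction': 95, 'bond': 40},
    'd2': {'stock': 110, 'bond': 130, 'market': 80},
}
query = {'chemical': 2.0, 'bond': 0.5}

index = build_index(docs)
results = retrieve(index, query)
print(results)
```

In this sketch `d1` ranks above `d2` because it matches the heavily weighted query term `chemical`, even though `d2` has the larger weight for `bond`.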
### References & Credits
This package uses [Naver's SPLADE repository](https://github.com/naver/splade).
- Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720). SIGIR 2021.
- Craig Macdonald, Nicola Tonellotto, Sean MacAvaney, Iadh Ounis. [PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval](https://dl.acm.org/doi/abs/10.1145/3459637.3482013). CIKM 2021.