### Putting it all together When you use the document encoder in an indexing pipeline, the rewritting document contents are indexed:
D
SPLADE
D
Indexer
IDX
```python import pyterrier as pt pt.init(version='snapshot') import pyt_splade dataset = pt.get_dataset('irds:msmarco-passage') splade = pyt_splade.SpladeFactory() indexer = pt.IterDictIndexer('./msmarco_psg', pretokenized=True) indxer_pipe = splade.indexing() >> indexer indxer_pipe.index(dataset.get_corpus_iter()) ``` Once you built an index, you can build a retrieval pipeline that first encodes the query, and then performs retrieval:
Q
SPLADE
Q
TF Retriever
IDX
R
```python splade_retr = splade.query() >> pt.BatchRetrieve('./msmarco_psg', wmodel='Tf') ``` ### References & Credits This package uses [Naver's SPLADE repository](https://github.com/naver/splade). - Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720). SIGIR 2021. - Craig Macdonald, Nicola Tonellotto, Sean MacAvaney, Iadh Ounis. [PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval](https://dl.acm.org/doi/abs/10.1145/3459637.3482013). CIKM 2021.