Space: Prompting-MoE-MaS-SeR/SOTA-IR-Gradio

awacke1 committed on Jul 17, 2024

Commit 13a1cf1 (verified) · 1 parent: def9b1c

Update app.py

Files changed (1)

app.py +30 -4

@@ -16,13 +16,13 @@ def generate_query(document):
     input_ids = llm_tokenizer.encode(prompt, return_tensors="pt")
     output = llm.generate(
         input_ids,
-        max_length=50,
         num_return_sequences=5,
-        num_beams=5,  # Use beam search
         no_repeat_ngram_size=2,
         early_stopping=True
     )
-    queries = [llm_tokenizer.decode(seq, skip_special_tokens=True) for seq in output]
     return queries
 def rerank_pairs(queries, document):
@@ -46,12 +46,38 @@ def inpars_v2(document):
     result = train_retriever([(best_query, document)])
     return f"Generated query: {best_query}\n\n{result}"
 iface = gr.Interface(
     fn=inpars_v2,
     inputs=gr.Textbox(lines=5, label="Input Document"),
     outputs=gr.Textbox(label="Result"),
     title="InPars-v2 Demo",
-    description="Generate queries and train a retriever using LLMs and rerankers."
 )
 iface.launch()

     input_ids = llm_tokenizer.encode(prompt, return_tensors="pt")
     output = llm.generate(
         input_ids,
+        max_new_tokens=30,
         num_return_sequences=5,
+        num_beams=5,
         no_repeat_ngram_size=2,
         early_stopping=True
     )
+    queries = [llm_tokenizer.decode(seq[input_ids.shape[1]:], skip_special_tokens=True) for seq in output]
     return queries
 def rerank_pairs(queries, document):
     result = train_retriever([(best_query, document)])
     return f"Generated query: {best_query}\n\n{result}"
+# Markdown description of the InPars-v2 paper
+paper_description = """
+# InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval
+**Abstract Link:** [https://arxiv.org/abs/2301.01820](https://arxiv.org/abs/2301.01820)
+**PDF Link:** [https://arxiv.org/pdf/2301.01820](https://arxiv.org/pdf/2301.01820)
+**Authors:** Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira
+**Publication Date:** 26 May 2023
+## Abstract
+Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents. These synthetic query-document pairs can then be used to train a retriever. However, InPars and, more recently, Promptagator, rely on proprietary LLMs such as GPT-3 and FLAN to generate such datasets. In this work we introduce InPars-v2, a dataset generator that uses open-source LLMs and existing powerful rerankers to select synthetic query-document pairs for training. A simple BM25 retrieval pipeline followed by a monoT5 reranker finetuned on InPars-v2 data achieves new state-of-the-art results on the BEIR benchmark. To allow researchers to further improve our method, we open source the code, synthetic data, and finetuned models: [https://github.com/zetaalphavector/inPars/tree/master/tpu](https://github.com/zetaalphavector/inPars/tree/master/tpu)
+## Key Features of InPars-v2
+1. Uses open-source LLMs for query generation
+2. Employs powerful rerankers to select high-quality synthetic query-document pairs
+3. Achieves state-of-the-art results on the BEIR benchmark
+4. Provides open-source code, synthetic data, and finetuned models
+This demo provides a simplified implementation of the InPars-v2 concept, showcasing query generation, reranking, and retriever training.
+"""
 iface = gr.Interface(
     fn=inpars_v2,
     inputs=gr.Textbox(lines=5, label="Input Document"),
     outputs=gr.Textbox(label="Result"),
     title="InPars-v2 Demo",
+    description=paper_description,
+    article="This is a minimal implementation of the InPars-v2 concept. For the full implementation and more details, please refer to the original paper and GitHub repository."
 )
 iface.launch()
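
Note on the generation change: the first hunk swaps max_length=50 for max_new_tokens=30 and decodes only seq[input_ids.shape[1]:]. The sketch below illustrates why both changes matter, assuming llm is a decoder-only (causal) model whose generate() output starts with the prompt tokens; the diff does not show which model is loaded, so that is an assumption. The helper names and the token ids are hypothetical and stand in for real tokenizer and model output.

```python
# Toy illustration only: plain lists stand in for token id tensors.
# Assumption: a causal LM, so each returned sequence is prompt ids + new ids.

def budget_with_max_length(prompt_len: int, max_length: int) -> int:
    """New tokens available when the cap counts prompt + generated tokens."""
    return max(0, max_length - prompt_len)


def budget_with_max_new_tokens(prompt_len: int, max_new_tokens: int) -> int:
    """New tokens available when the cap counts generated tokens only."""
    return max_new_tokens


def strip_prompt(sequences, prompt_len):
    """Drop the echoed prompt, mirroring seq[input_ids.shape[1]:] in the diff."""
    return [seq[prompt_len:] for seq in sequences]


# Hypothetical ids: a 4-token prompt followed by generated tokens.
prompt_ids = [101, 7, 8, 9]
fake_output = [prompt_ids + [42, 43], prompt_ids + [50, 51, 52]]
continuations = strip_prompt(fake_output, len(prompt_ids))

# A 60-token prompt leaves no room under max_length=50,
# but still gets 30 new tokens under max_new_tokens=30.
long_prompt_old = budget_with_max_length(60, 50)
long_prompt_new = budget_with_max_new_tokens(60, 30)
```

With the old settings, a long input document could use up the whole max_length budget, and the decoded "queries" would also contain the prompt text. If llm were an encoder-decoder model instead, the output would not begin with the prompt and the slice would not apply as written.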