---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- rag
---
Note: We are still working on this.
Are you looking for a more robust and reliable generation model for RAG systems?
Here is Ext2Gen-8B-R2, a model that effectively mitigates hallucinations caused by retrieval noise and information overload.
See the details in our paper: Link
## What is Ext2Gen-8B-R2?
Ext2Gen-8B-R2 is built upon Llama3.1-8B-Instruct, incorporating preference-aligned fine-tuning through pairwise feedback learning (a rough sketch of this style of training follows the list below).
This training strategy enables the model to:
- Extract highly relevant sentences from retrieved chunks before generating an answer.
- Filter out irrelevant or misleading information, reducing hallucinations.
- Align generation with human preferences by optimizing for faithfulness, completeness, and conciseness.
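
As an illustration only: pairwise feedback learning of this kind is commonly implemented with Direct Preference Optimization (DPO). The sketch below is a minimal, hypothetical example using a recent version of the `trl` library; the dataset contents, hyperparameters, and output directory are placeholders, not the actual training recipe from the paper.

```python
# Hypothetical sketch of preference alignment via pairwise feedback (DPO-style).
# All data and hyperparameters below are illustrative placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each example pairs a preferred (faithful, complete, concise) response
# with a rejected one for the same extract-then-generate prompt.
train_dataset = Dataset.from_dict({
    "prompt": ["<extract-then-generate prompt>"],
    "chosen": ["Extracted Sentences:\n- ...\nAnswer: ..."],
    "rejected": ["Answer containing hallucinated content ..."],
})

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="ext2gen-dpo", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # recent trl passes the tokenizer this way
)
trainer.train()
```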
## Why does Ext2Gen-8B-R2 outperform standard RAG models?
Standard RAG models often struggle due to:
- Uncertain Placement – Relevant information may appear in unpredictable locations within retrieved chunks, making it difficult for LLMs to utilize it effectively.
- Information Overload – The presence of irrelevant chunks can distract the model, leading to errors or hallucinations.
- Lack of Alignment – Most generation models are not explicitly trained to prioritize relevant content over noise.
## Prompt
- `query`: the query to answer
- `chunk_list`: the list of retrieved chunks, e.g., `["chunk 1", "chunk 2", "chunk 3"]`
```python
from transformers import AutoTokenizer

# The base model's tokenizer provides the Llama-3.1 chat template.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def prepare_sample_text(prompt):
    # Wrap the prompt in the chat template as a single user turn.
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)

def format_prompt_template(query, chunk_list):
    # Prefix each chunk with its ID so the model can refer to specific chunks.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = '\n\n'.join(chunk_list)

    prompt = '''
You are an expert assistant trained to extract essential sentences from document chunks and generate answers based on the extracted sentences.

Your task is twofold:
- Extraction: Identify sentences that contribute to constructing a precise and accurate response to the given query.
- Generation: Formulate a concise and coherent answer based on the extracted sentences.

### Extraction Instruction:
- A query will be provided for you to answer.
- Extract only the sentences that contribute to forming an answer to the query.
- Ensure that the extracted sentences are sufficient to derive a correct and complete answer.
- If no relevant sentences are found in the provided chunks, return an empty list.

### Generation Instruction:
- Use the extracted sentences to generate a well-formed answer to the query.
- If no sentences are extracted, return "No Answer".

### Output Example:
Extracted Sentences:
- Sentence 1
- Sentence 2

Answer: Your Answer

### Query:
%s

### Chunk List:
%s

### Output:
''' % (query, chunk_list)

    return prompt.strip()

# Example inputs (placeholders): your query and the retrieved, possibly noisy chunks.
query = "your query"
noisy_chunks = ["chunk 1", "chunk 2", "chunk 3"]

prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)
```
Note that this prompt produces both the extracted relevant sentences and the answer to the query.
The output follows a consistent format, as in the example below:
```
Extracted Sentences:
- The estimated number of deaths is 150-300,000, mainly Jews.

Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
```
The number of extracted sentences varies depending on the QA instance.
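
Because the format is consistent, the two fields can be separated with simple string handling. A minimal sketch, assuming the output always contains an `Answer:` line as shown above; `output_text` stands for the decoded model output (a full generation sketch follows in the next section):

```python
def parse_ext2gen_output(text):
    # Split the model output into (extracted sentences, final answer).
    # Assumes the consistent "Extracted Sentences: ... Answer: ..." format above.
    before, _, answer = text.partition("Answer:")
    sentences = [
        line.strip()[2:].strip()          # drop the leading "- " bullet
        for line in before.splitlines()
        if line.strip().startswith("- ")
    ]
    return sentences, answer.strip()

sentences, answer = parse_ext2gen_output(output_text)
```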
## Generation Parameters
```python
max_new_tokens=1024,  # or 2048
do_sample=True,
temperature=0.8,
top_p=0.9,
```
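
For reference, a minimal end-to-end generation sketch with `transformers` using these parameters. The model ID below is a placeholder for this repository's Hub ID, and `prompt` is the chat-formatted string built in the Prompt section:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<Ext2Gen-8B-R2-repo-id>"  # placeholder: replace with this model's Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # assumes a bfloat16-capable GPU
)

# The chat template already added special tokens, so skip adding them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
# Decode only the newly generated tokens.
output_text = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(output_text)
```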
## Performance Benchmark
Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems:
- We conduct a QA task using RAG systems on the NQ, MS-MARCO, and HotpotQA datasets.
- The only difference between the compared systems is the generation backbone: Llama3.1-8B-Instruct vs. Ext2Gen-8B-R2.
See the results in the Figure below:
