---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: question-answering
tags:
- rag
library_name: transformers
---
<div align="center">
<b style="font-size: 40px;">Ext2Gen-8B-R2</b>
</div>
Note: We are still working on this.
Are you looking for a more robust and reliable generation model for your RAG system?
Ext2Gen-8B-R2 effectively mitigates hallucinations caused by retrieval noise and information overload.
See the details in our paper: [Link](https://arxiv.org/pdf/2503.04789)
### What is Ext2Gen-8B-R2?
Ext2Gen-8B-R2 is built upon Llama3.1-8B-Instruct, incorporating preference-aligned fine-tuning through pairwise feedback learning.
This training strategy enables the model to:
- Extract highly relevant sentences from retrieved chunks before generating an answer.
- Filter out irrelevant or misleading information, reducing hallucinations.
- Align generation with human preferences by optimizing for faithfulness, completeness, and conciseness.
### Why does Ext2Gen-8B-R2 outperform standard RAG models?
Standard RAG models often struggle due to:
- Uncertain Placement – Relevant information may appear in unpredictable locations within retrieved chunks, making it difficult for LLMs to utilize it effectively.
- Information Overload – The presence of irrelevant chunks can distract the model, leading to errors or hallucinations.
- Lack of Alignment – Most generation models are not explicitly trained to prioritize relevant content over noise.
### Need Faster Inference?
Ext2Gen-8B-R2 first writes the sentences relevant to the query before generating the answer, so it incurs extra latency before the answer appears.
If you do not need to see the extracted sentences and want the answer directly with lower latency, use its variant, Gen-8B-R2.
Link: https://huggingface.co/DISLab/Gen-8B-R2
This model skips the sentence-extraction phase but retains robustness comparable to Ext2Gen-8B-R2.
### Recommended Prompt
- query: the query to answer
- chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]
```python
from transformers import AutoTokenizer

# Load the tokenizer (checkpoint name assumed from this model card).
tokenizer = AutoTokenizer.from_pretrained("DISLab/Ext2Gen-8B-R2")

def prepare_sample_text(prompt):
    # Wrap the prompt in the model's chat template.
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)

def format_prompt_template(query, chunk_list):
    # Prefix each chunk with its ID, then join the chunks line by line.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text
                  for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = '\n'.join(chunk_list)

    prompt = '''
You are an expert assistant trained to extract essential sentences from document chunks and generate answers based on the extracted sentences.

Your task is twofold:
- Extraction: Identify sentences that contribute to constructing a precise and accurate response to the given query.
- Generation: Formulate a concise and coherent answer based on the extracted sentences.

### Extraction Instruction:
- A query will be provided for you to answer.
- Extract only the sentences that contribute to forming an answer to the query.
- Ensure that the extracted sentences are sufficient to derive a correct and complete answer.
- If no relevant sentences are found in the provided chunks, return an empty list.

### Generation Instruction:
- Use the extracted sentences to generate a well-formed answer to the query.
- If no sentences are extracted, return "No Answer".

### Output Example:

Extracted Sentences:
- Sentence 1
- Sentence 2

Answer: Your Answer

### Query:
%s

### Chunk List:
%s

### Output:
''' % (query, chunk_list)

    return prompt.strip()

prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)
```
Note that this prompt yields both the extracted relevant sentences and the answer to the query.
The output follows a consistent format, as shown in the example below.
```
Extracted Sentences:
- The estimated number of deaths is 150-300,000, mainly Jews.
Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
```
The number of extracted sentences varies depending on the QA instance.
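Because the format is consistent, the extracted sentences and the final answer can be recovered with simple string parsing. Below is a minimal sketch of such a post-processing step; the helper `parse_output` is our own illustration, not part of the model release.
```python
def parse_output(text):
    # Split the model output into extracted sentences and the final answer,
    # assuming the "Extracted Sentences:" / "Answer:" format shown above.
    sentences, answer = [], "No Answer"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('- '):
            sentences.append(line[2:])
        elif line.startswith('Answer:'):
            answer = line[len('Answer:'):].strip()
    return sentences, answer
```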
### Recommended Generation Parameters
```python
max_new_tokens=1024, # or 2048
do_sample=True,
temperature=0.8,
top_p=0.9,
```
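For reference, here is a minimal end-to-end sketch using these parameters together with the prompt helpers defined above; the checkpoint name `DISLab/Ext2Gen-8B-R2` and the example query/chunks are our assumptions.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DISLab/Ext2Gen-8B-R2"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "How many people died at Chelmno?"  # example inputs (ours)
noisy_chunks = [
    "The estimated number of deaths is 150-300,000, mainly Jews.",
    "An unrelated chunk that the model should ignore.",
]

# Reuses format_prompt_template / prepare_sample_text from the prompt section.
prompt = prepare_sample_text(format_prompt_template(query, noisy_chunks))
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```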
### Performance Benchmark
Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems:
* We run a QA task with RAG systems on the NQ, MS-MARCO, and HotpotQA datasets.
* The only difference between the two setups is the generation backbone: Llama3.1-8B-Instruct vs. Ext2Gen-8B-R2.
See the results in the figure below:
[Figure: QA performance with Llama3.1-8B-Instruct vs. Ext2Gen-8B-R2 as the generation backbone.]