Update README.md
README.md (CHANGED)
````diff
@@ -42,6 +42,11 @@ Standard RAG models often struggle due to:
 - chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]
 
 ```python
+
+def prepare_sample_text(prompt):
+    row_json = [{"role": "user", "content": prompt}]
+    return tokenizer.apply_chat_template(row_json, tokenize=False)
+
 def format_prompt_template(query, chunk_list):
 
     chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text for idx, chunk_text in enumerate(chunk_list)]
````
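The added `prepare_sample_text` wraps a finished prompt in the model's chat template via the tokenizer, while `format_prompt_template` tags each retrieved chunk with a stable ID so the model can refer to chunks by number. A standalone illustration of that labeling step (illustrative only; the example chunks are the placeholders from the `chunk_list` description above):

```python
# Label each retrieved chunk with its 1-based ID, as in the snippet above.
chunk_list = ["chunk 1", "chunk 2", "chunk 3"]
labeled = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text
           for idx, chunk_text in enumerate(chunk_list)]
print(labeled)
# ['[Chunk ID: 1] chunk 1', '[Chunk ID: 2] chunk 2', '[Chunk ID: 3] chunk 3']
```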
````diff
@@ -87,6 +92,9 @@ Answer: Your Answer
 
     return prompt.strip()
 
+
+prompt = format_prompt_template(query, noisy_chunks)
+prompt = prepare_sample_text(prompt)
 ```
 
````
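Note that `prepare_sample_text` relies on a `tokenizer` being in scope, so the tokenizer has to be loaded before the two calls added above. A minimal sketch of that wiring, assuming the model is pulled from the Hugging Face Hub (the repo id below is a placeholder, not taken from this diff):

```python
from transformers import AutoTokenizer

# Placeholder repo id -- substitute the actual Ext2Gen-8B-R2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("your-org/Ext2Gen-8B-R2")

query = "What is the estimated number of deaths at Chelmno?"  # example QA from the README
noisy_chunks = ["chunk 1", "chunk 2", "chunk 3"]  # retrieved chunks, relevant or not

prompt = format_prompt_template(query, noisy_chunks)  # defined in the README above
prompt = prepare_sample_text(prompt)                  # wrap in the chat template
```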
````diff
@@ -103,6 +111,14 @@ Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
 
 The number of extracted sentences varies depending on the QA.
 
+### Generation Parameters
+
+```python
+max_new_tokens=1024,  # or 2048
+do_sample=True,
+temperature=0.8,
+top_p=0.9,
+```
 
 ### Performance Benchmark
 Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems:
````
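For completeness, a hedged sketch of running generation with the parameters listed above, continuing from the tokenizer sketch earlier. Loading the model this way is an assumption, and splitting on `Answer:` is just one plausible way to pull out the final answer given the `Answer: Your Answer` line in the template:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id, matching the tokenizer sketch above.
model = AutoModelForCausalLM.from_pretrained(
    "your-org/Ext2Gen-8B-R2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
answer = completion.split("Answer:")[-1].strip()  # assumes the template's output layout
print(answer)
```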