Hwanjun committed on
Commit ee34ee3 (verified)
1 Parent(s): 1e3c04d

Update README.md

Files changed (1)
README.md +16 -0
README.md CHANGED
@@ -42,6 +42,11 @@ Standard RAG models often struggle due to:
- chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]

```python
+
+def prepare_sample_text(prompt):
+    row_json = [{"role": "user", "content": prompt}]
+    return tokenizer.apply_chat_template(row_json, tokenize=False)
+
def format_prompt_template(query, chunk_list):

    chunk_list = ['[Chunk ID: '+ str(idx+1) + '] ' + chunk_text for idx, chunk_text in enumerate(chunk_list)]
@@ -87,6 +92,9 @@ Answer: Your Answer

    return prompt.strip()

+
+prompt = format_prompt_template(query, noisy_chunks)
+prompt = prepare_sample_text(prompt)
```
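For orientation, here is a minimal sketch of how the two helpers in this commit might be wired together end to end. The checkpoint path, the query, and the chunk texts are placeholders assumed for illustration, not part of the commit; `tokenizer` is the object that `prepare_sample_text` expects to find in scope:

```python
from transformers import AutoTokenizer

# Placeholder path: substitute the actual Ext2Gen-8B-R2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("<ext2gen-8b-r2-checkpoint>")

# Example inputs in the format the README describes.
query = "What is the estimated number of deaths at Chelmno?"
noisy_chunks = ["chunk 1", "chunk 2", "chunk 3"]

# Both helpers are defined in the README snippet above.
prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)  # wraps the prompt in the model's chat template
```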
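The next hunk notes that the number of extracted sentences varies per QA pair, so downstream code should split the reply rather than assume a fixed shape. A rough sketch: the `Answer:` marker comes from the README's examples, but the exact layout of the sentence list before it is an assumption:

```python
def parse_response(response: str):
    """Split a model reply into extracted sentences and the final answer.

    Assumes the reply ends with an 'Answer:' line, as in the README's
    examples; the sentence-list layout before it is a guess.
    """
    before, _, answer = response.partition("Answer:")
    sentences = [line.strip("- ").strip() for line in before.splitlines() if line.strip()]
    return sentences, answer.strip()
```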
@@ -103,6 +111,14 @@ Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.

The number of extracted sentences varies depending on the QA.
 
+### Generation Parameters
+
+```python
+max_new_tokens=1024,  # or 2048
+do_sample=True,
+temperature=0.8,
+top_p=0.9,
+```

### Performance Benchmark
  Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems:
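
As a usage note for the settings added above, this is one way to feed them to a standard transformers generation call; the loading code, dtype, and device handling are illustrative assumptions rather than part of the commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<ext2gen-8b-r2-checkpoint>"  # placeholder; substitute the real repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# `prompt` is the chat-templated string built with the README's helpers; the
# template already inserted special tokens, so skip adding them again here.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048, per the README
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```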