---
base_model:
- meta-llama/Llama-3.2-3B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Gen-8B-R2
Note: We are still working on this.

Are you looking for a more robust and reliable generation model for your RAG system? Gen-8B-R2 effectively mitigates hallucinations caused by retrieval noise and information overload. See the details in our paper: [Link](https://arxiv.org/pdf/2503.04789)

### What is Gen-8B-R2?

This model is a variant of Ext2Gen-8B-R2 in which the step of extracting sentences from the chunk list is disabled. See the details of Ext2Gen-8B-R2 at https://huggingface.co/DISLab/Ext2Gen-8B-R2.

### Recommended Prompt

- query: the query to answer
- chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]

```python
def prepare_sample_text(prompt):
    # Assumes `tokenizer` has already been loaded, e.g. via AutoTokenizer.from_pretrained(...)
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)

def format_prompt_template(query, chunk_list):
    # Prefix each chunk with its ID and join all chunks into a single string.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = ' '.join(chunk_list)

    prompt = '''
You are an expert assistant trained to generate answers based on document chunks.

### Generation Instruction:
- Answer to the Query based on the given Chunk List.

### Query:
%s

### Chunk List:
%s

### Output:
''' % (query, chunk_list)

    return prompt.strip()

prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)
```

Note that, unlike Ext2Gen-8B-R2, this prompt outputs only the answer to the query, without the extracted relevant sentences. The output follows a consistent format, as shown in the example below.

```
The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
```

### Recommended Generation Parameters

```python
max_new_tokens=1024,  # or 2048
do_sample=True,
temperature=0.8,
top_p=0.9,
```
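For completeness, below is a minimal end-to-end sketch showing how the prompt helpers above and the recommended generation parameters might be wired together with `transformers`. The repository ID `DISLab/Gen-8B-R2`, the dtype/device settings, and the example query and chunks are assumptions for illustration, not part of the official usage.

```python
# Minimal usage sketch. Assumptions: the model is hosted at DISLab/Gen-8B-R2,
# and the example query/chunks are purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DISLab/Gen-8B-R2"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "How many people were killed at Chelmno?"
noisy_chunks = [
    "Chelmno was the first extermination camp established by Nazi Germany.",  # relevant chunk
    "Poland joined the European Union in 2004.",                              # distractor chunk
]

# Build the prompt with the helpers defined in the Recommended Prompt section.
prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)

# The chat template already adds special tokens, so skip adding them again here.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)

# Decode only the newly generated tokens (drop the prompt portion).
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```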