---
base_model:
- meta-llama/Llama-3.2-3B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Gen-8B-R2
Note: We are still working on this.

Are you looking for a more robust and reliable generation model for your RAG system? Gen-8B-R2 effectively mitigates hallucinations caused by retrieval noise and information overload. See the details in our paper: [Link](https://arxiv.org/pdf/2503.04789)

### What is Gen-8B-R2?

This model is a variant of Ext2Gen-8B-R2 in which the step of extracting sentences from the chunk list is disabled. See the details of Ext2Gen-8B-R2 at https://huggingface.co/DISLab/Ext2Gen-8B-R2.

### Recommended Prompt

- query: the query to answer
- chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]

```python
def prepare_sample_text(prompt):
    # Assumes `tokenizer` has already been loaded, e.g. via AutoTokenizer.from_pretrained(...)
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)

def format_prompt_template(query, chunk_list):
    # Prefix each chunk with its ID and join all chunks into a single string.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = ' '.join(chunk_list)

    prompt = '''
You are an expert assistant trained to generate answers based on document chunks.

### Generation Instruction:
- Answer to the Query based on the given Chunk List.

### Query:
%s

### Chunk List:
%s

### Output:
''' % (query, chunk_list)

    return prompt.strip()

prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)
```

Note that, unlike Ext2Gen-8B-R2, this prompt outputs only the answer to the query, without the extracted relevant sentences. The output follows a consistent format, as shown in the example below.

```
The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
```

### Recommended Generation Parameters

```python
max_new_tokens=1024,  # or 2048
do_sample=True,
temperature=0.8,
top_p=0.9,
```
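For completeness, below is a minimal end-to-end sketch showing how the prompt helpers above and the recommended generation parameters might be wired together with `transformers`. The repository ID `DISLab/Gen-8B-R2`, the dtype/device settings, and the example query and chunks are assumptions for illustration, not part of the official usage.

```python
# Minimal usage sketch. Assumptions: the model is hosted at DISLab/Gen-8B-R2,
# and the example query/chunks are purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DISLab/Gen-8B-R2"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "How many people were killed at Chelmno?"
noisy_chunks = [
    "Chelmno was the first extermination camp established by Nazi Germany.",  # relevant chunk
    "Poland joined the European Union in 2004.",                              # distractor chunk
]

# Build the prompt with the helpers defined in the Recommended Prompt section.
prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)

# The chat template already adds special tokens, so skip adding them again here.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)

# Decode only the newly generated tokens (drop the prompt portion).
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```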