---
language: 
- en
inference: false
license: apache-2.0
---

# YuyuanQA-3.5B model (Medical), one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM).

**YuyuanQA-3.5B** is fine-tuned with 10,000 medical QA pairs based on the **Yuyuan-3.5B** model.

**Question answering (QA)** is an important topic in natural language processing and information retrieval, with many application scenarios in industry. **Traditional methods are often complex**, and their core algorithms involve **machine learning**, **deep learning** and **knowledge graph** techniques. We hope to explore a **simpler** and more **effective** way: using the powerful memory and understanding ability of a large model to answer questions directly. YuyuanQA-3.5B is such an attempt and **performs well in subjective tests**. We also evaluated 100 QA pairs with **BLEU** (a minimal sketch of this evaluation is given at the end of this card):

| n-gram | 1-gram | 2-gram | 3-gram | 4-gram |
| ----------- | ----------- |------|------|------|
| **BLEU score** | 0.357727 | 0.2713 | 0.22304 | 0.19099 |

## Usage

### Load the model

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

hf_model_path = 'model_path'

tokenizer = GPT2Tokenizer.from_pretrained(hf_model_path)
model = GPT2LMHeadModel.from_pretrained(hf_model_path)
```

### Generation

```python
question = "What should gout patients pay attention to in diet?"
# The model is prompted in the "Question:... answer:" format used for fine-tuning.
inputs = tokenizer(f'Question:{question} answer:', return_tensors='pt')

# Sample 5 candidate answers with nucleus sampling.
generation_output = model.generate(**inputs,
                                   return_dict_in_generate=True,
                                   output_scores=True,
                                   max_length=150,
                                   # max_new_tokens=80,
                                   do_sample=True,
                                   top_p=0.6,
                                   eos_token_id=50256,
                                   pad_token_id=0,
                                   num_return_sequences=5)

for idx, sentence in enumerate(generation_output.sequences):
    print('next sentence %d:\n' % idx,
          tokenizer.decode(sentence).split('<|endoftext|>')[0])
    print('*' * 40)
```

## Example

![avatar](https://huggingface.co/IDEA-CCNL/YuyuanQA-3.5B/resolve/main/QA_DEMO.png)

## Citation

If you find this resource useful, please cite the following website in your paper.

```
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2022},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```
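
## BLEU evaluation sketch

The BLEU numbers above are reported without an accompanying script. Below is a minimal sketch of how per-n-gram BLEU could be computed with NLTK over held-out QA pairs; the `bleu_n` helper, the `qa_pairs` list and the choice of smoothing are assumptions for illustration, not the original evaluation code.

```python
# Minimal sketch, assuming NLTK is installed and each evaluation item is a
# (question, reference_answer, generated_answer) triple. This is NOT the
# authors' original evaluation script.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # assumed smoothing choice


def bleu_n(reference: str, hypothesis: str, n: int) -> float:
    """Individual n-gram BLEU between one reference and one generated answer."""
    # Put all weight on the n-th order n-gram precision.
    weights = tuple(1.0 if i == n - 1 else 0.0 for i in range(4))
    return sentence_bleu([reference.split()], hypothesis.split(),
                         weights=weights, smoothing_function=smooth)


# Hypothetical usage: average BLEU-1..4 over the 100 evaluation QA pairs.
# qa_pairs = [(question, reference_answer, generated_answer), ...]
# for n in range(1, 5):
#     scores = [bleu_n(ref, gen, n) for _, ref, gen in qa_pairs]
#     print(f'{n}-gram BLEU: {sum(scores) / len(scores):.5f}')
```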