---
license: apache-2.0
datasets:
- FrankRin/Insur-QA
language:
- zh
base_model:
- Qwen/Qwen1.5-14B-Chat
pipeline_tag: text-generation
---
This repository contains InsLLM, built on Qwen1.5-14B-Chat as the base model.
<div align="center">
<h1>InsQABench</h1>
</div>
InsQABench is the first large-scale specialized question-answering dataset and evaluation benchmark in the Chinese insurance sector, developed and open-sourced by the VLR Lab (Vision and Learning Representation Group) at Huazhong University of Science and Technology.
## Overview
InsLLM is an insurance-domain system that supports insurance question answering, database querying, and contract parsing. Designed for diverse user groups and application scenarios, it offers the following key features:
* **Insurance Text Processing:** The system can understand and generate content involving the complex terminology and document formats specific to the insurance domain, supporting tasks such as information extraction and document summarization. We constructed fine-tuning datasets from publicly available insurance data and real-world insurance documents.
* **Insurance Reasoning:** Using the SQL-ReAct method, the system refines and corrects SQL queries based on user input, handling complex query tasks over insurance databases efficiently.
* **Insurance Knowledge Compliance:** With the Insur-Know module, the system supports retrieval-augmented contract parsing and fact extraction, so that complex questions about insurance contracts are handled accurately.
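The SQL-ReAct bullet describes refining SQL queries from feedback. A minimal sketch of a ReAct-style generate, execute, and correct loop follows. It is an illustration under stated assumptions, not the authors' SQL-ReAct implementation: the database is an in-memory SQLite table, and `sql_react`, `toy_generate`, and the `products` schema are hypothetical names; `toy_generate` stands in for the model by returning a wrong query first and a corrected one after it sees the error.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: receives the question plus the history of
# (failed_sql, error_message) pairs and returns a new SQL string.
Generator = Callable[[str, List[Tuple[str, str]]], str]

def sql_react(question: str, conn: sqlite3.Connection, generate: Generator,
              max_steps: int = 3) -> Tuple[Optional[str], list]:
    """Propose SQL, run it, and feed any error back to the generator."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        sql = generate(question, history)        # reason: propose a query
        try:
            rows = conn.execute(sql).fetchall()   # act: execute it
            return sql, rows                      # observe: success
        except sqlite3.Error as exc:
            history.append((sql, str(exc)))       # observe: error for next step
    return None, history                          # gave up after max_steps

# Toy insurance-product table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, company TEXT, premium REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("安心保", "A公司", 1200.0), ("健康无忧", "B公司", 800.0)])

def toy_generate(question, history):
    # Stand-in for the LLM: the first attempt uses a non-existent column,
    # the second attempt corrects it after seeing the error message.
    if not history:
        return "SELECT name FROM products WHERE price < 1000"
    return "SELECT name FROM products WHERE premium < 1000"

sql, rows = sql_react("保费低于1000元的产品有哪些？", conn, toy_generate)
print(sql)   # SELECT name FROM products WHERE premium < 1000
print(rows)  # [('健康无忧',)]
```

Returning the error history on failure keeps the loop inspectable; a real system would pass the schema and question to the model as well.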
Additionally, our research offers the following contributions:
* **High-quality insurance question-answering training datasets and effective training paradigms**
* **A comprehensive insurance model evaluation framework and evaluation datasets**
## Insur-QA Dataset
For basic insurance knowledge, we translated the InsuranceQA dataset into Chinese to create the InsuranceQA_zh dataset.
For insurance contract data, we downloaded publicly available PDF insurance policies from various insurance companies and parsed them with the Adobe PDF Extract API. After restructuring the paragraph text in the parsed output, we used Gemini to generate QA pairs, forming <Q, A, E> triples.
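The restructuring and triple-forming step can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline. It assumes the Adobe PDF Extract output has already been flattened into a list of text fragments, that E denotes the source paragraph (evidence) grounding each answer, and it uses a hypothetical `generate_qa` callable as a stand-in for the Gemini call. The fragment-merging rule (join until sentence-ending punctuation) is our own assumption.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Iterable, List, Tuple

# Punctuation treated as the end of a paragraph-level unit (assumption).
SENTENCE_END = ("。", "！", "？", "；", ".", "!", "?")

@dataclass
class QAETriple:
    question: str
    answer: str
    evidence: str  # assumption: E = the source paragraph behind the answer

def merge_fragments(fragments: Iterable[str]) -> List[str]:
    """Join text fragments that PDF extraction split mid-sentence."""
    paragraphs: List[str] = []
    buffer = ""
    for frag in fragments:
        frag = frag.strip()
        if not frag:
            continue
        buffer += frag
        if buffer.endswith(SENTENCE_END):
            paragraphs.append(buffer)
            buffer = ""
    if buffer:  # keep any trailing unterminated text
        paragraphs.append(buffer)
    return paragraphs

def build_triples(paragraphs: List[str],
                  generate_qa: Callable[[str], List[Tuple[str, str]]]) -> List[QAETriple]:
    """Attach each generated (question, answer) pair to its source paragraph."""
    triples: List[QAETriple] = []
    for para in paragraphs:
        for q, a in generate_qa(para):
            triples.append(QAETriple(q, a, para))
    return triples

def to_jsonl(triples: List[QAETriple]) -> str:
    # ensure_ascii=False keeps Chinese characters readable in the output.
    return "\n".join(json.dumps(asdict(t), ensure_ascii=False) for t in triples)

def toy_generate_qa(paragraph: str) -> List[Tuple[str, str]]:
    # Stand-in for the LLM call: one fixed question per paragraph.
    return [("这段条款的内容是什么？", paragraph)]

fragments = ["被保险人在等待期内", "因疾病身故的，", "本公司退还已交保费。", "本合同等待期为90天。"]
paragraphs = merge_fragments(fragments)
triples = build_triples(paragraphs, toy_generate_qa)
print(to_jsonl(triples))
```

Keeping the evidence paragraph alongside each QA pair makes it possible to check later whether an answer is actually supported by the contract text.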
The specific composition of the datasets is as follows:
<table border="1">
<tr>
<th>Task</th>
<th>Split</th>
<th>Source</th>
<th>Size</th>
</tr>
<tr>
<td rowspan="2">Basic Insurance Knowledge Q&A</td>
<td>Training Set</td>
<td>BX_GPT3.5</td>
<td>10k</td>
</tr>
<tr>
<td>Test Set</td>
<td>Insurance_QA_zh</td>
<td>3k</td>
</tr>
<tr>
<td rowspan="2">Insurance Contract Q&A</td>
<td>Training Set</td>
<td>Insurance Contracts</td>
<td>40k</td>
</tr>
<tr>
<td>Test Set</td>
<td>Insurance Contracts</td>
<td>100</td>
</tr>
<tr>
<td rowspan="2">Insurance Database Q&A</td>
<td>Training Set</td>
<td>Insurance Contracts</td>
<td>44k</td>
</tr>
<tr>
<td>Test Set</td>
<td>Insurance Contracts</td>
<td>546</td>
</tr>
</table>
## Citation
If you find our work helpful in your research, please consider citing it as follows:
```
@misc{
}
```
## License
InsQABench is available under the Apache License 2.0.