mingfengxue
commited on
Commit
·
b609220
1
Parent(s):
cc72b35
Update README.md
Browse files
README.md
CHANGED
@@ -1,3 +1,35 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
|
|
|
|
|
|
|
|
3 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
+
datasets:
|
4 |
+
- OFA-Sys/OccuQuest
|
5 |
+
language:
|
6 |
+
- en
|
7 |
---
|
8 |
+
|
9 |
+
This is the OccuLLaMA-7B model in [OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models](https://arxiv.org/abs/2310.16517)
|
10 |
+
|
11 |
+
Abstract:
|
12 |
+
The emergence of large language models (LLMs) has revolutionized natural language processing tasks.
|
13 |
+
However, existing instruction-tuning datasets suffer from occupational bias: the majority of data relates to only a few occupations, which hampers the instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields.
|
14 |
+
To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named OccuQuest, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories.
|
15 |
+
We systematically request ChatGPT, organizing queries hierarchically based on Occupation, Responsibility, Topic, and Question, to ensure a comprehensive coverage of occupational specialty inquiries.
|
16 |
+
By comparing with three commonly used datasets (Dolly, ShareGPT, and WizardLM), we observe that OccuQuest exhibits a more balanced distribution across occupations.
|
17 |
+
Furthermore, we assemble three test sets for comprehensive evaluation, an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora.
|
18 |
+
We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations.
|
19 |
+
Notably, on the occu-quora set, OccuLLaMA reaches a high win rate of 86.4\% against WizardLM.
|
20 |
+
Furthermore, we demonstrate the potential of combining OccuQuest with other instruction-tuning datasets to enhance the overall performance of LLMs.
|
21 |
+
By fine-tuning LLaMA on a mixture of OccuQuest and Tulu datasets, we introduce ProLLaMA, which excels in addressing occupational questions and exhibits superior performance in comprehensive evaluations such as MMLU, GSM8K, BBH, and HumanEval.
|
22 |
+
Among the different LLaMA variants, the 7B and 13B ProLLaMA models achieve the highest performance on MMLU and GSM8K, with the 7B ProLLaMA model demonstrating an improvement of more than 4 points over the other 7B variants on GSM8K.
|
23 |
+
We open release the dataset and models.
|
24 |
+
|
25 |
+
Please cite if you use this dataset:
|
26 |
+
```
|
27 |
+
@misc{xue2023occuquest,
|
28 |
+
title={OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models},
|
29 |
+
author={Mingfeng Xue and Dayiheng Liu and Kexin Yang and Guanting Dong and Wenqiang Lei and Zheng Yuan and Chang Zhou and Jingren Zhou},
|
30 |
+
year={2023},
|
31 |
+
eprint={2310.16517},
|
32 |
+
archivePrefix={arXiv},
|
33 |
+
primaryClass={cs.CL}
|
34 |
+
}
|
35 |
+
```
|