---
license: apache-2.0
datasets:
- raidium/ECNQA_generated_questions
library_name: transformers
tags:
- medical
base_model: stanford-crfm/BioMedLM
---
# Model Card for Raidium MQG model
The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".
Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with
[BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.
The questions were generated from prompts containing medical data from the textbooks.
They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).
MQG is designed to be fine-tuned for Medical Question Answering tasks.
## Model Details
### Model Description
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)
In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain.
Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind.
In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach.
We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model.
We show the benefits of our training strategy on a medical question answering dataset.
The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned.
- **Developed by:** Raidium
- **Model type:** Transformer
- **License:** Apache 2.0
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)
### Model Sources
- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
## Uses
### Direct Use
MQG is trained using next-token-prediction on generated questions.
Therefore, it can be used out-of-the-box to generate potential answers for medical question answering tasks.
However, the generated questions may contain errors, so we advise fine-tuning the model on your own dataset and using it to rank the candidate answers.
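
A minimal sketch of out-of-the-box generation with the `transformers` library is shown below. It rests on several assumptions that this card does not confirm: the Hub identifier `raidium/MQG`, that the checkpoint loads with `AutoModelForCausalLM`, and the prompt layout produced by the hypothetical `build_prompt` helper.

```python
from typing import Sequence


def build_prompt(question: str, options: Sequence[str]) -> str:
    """Format a question and its candidate answers as plain text.

    The layout is a hypothetical choice for illustration; the card does not
    specify the prompt format used during training.
    """
    lines = [f"Question: {question}"]
    for letter, option in zip("ABCDE", options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt: str, model_id: str = "raidium/MQG", max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt`.

    `model_id` is an assumed Hub identifier, not one stated in this card.
    """
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer(build_prompt(question, options))` downloads the weights on first use. The output is free text and is not guaranteed to be correct.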
### Downstream Use
MQG can be fine-tuned for Medical Question Answering tasks.
For multiple-choice questions, a classification head should be appended to the model to rank the proposed answers.
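
One possible way to rank proposed answers with a classification head is sketched below. This is an illustration under assumptions, not the authors' implementation: the head is a single linear layer applied to the last token's hidden state, each option is scored independently, and the question/option pairing format is hypothetical. The helper names are ours.

```python
from typing import List, Sequence


def build_scoring_head(hidden_size: int):
    """Return a linear head mapping one hidden state to a single score."""
    import torch  # lazy import: only needed when the head is actually built

    return torch.nn.Linear(hidden_size, 1)


def score_options(backbone, head, tokenizer, question: str, options: Sequence[str]) -> List[float]:
    """Score each (question, option) pair at inference time.

    `backbone` is assumed to be a transformers model whose output exposes
    `last_hidden_state` (for example one loaded with AutoModel).
    """
    import torch

    scores = []
    with torch.no_grad():
        for option in options:
            inputs = tokenizer(f"{question}\n{option}", return_tensors="pt")
            hidden = backbone(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
            # In a causal model the last token has attended to the whole input.
            scores.append(head(hidden[:, -1, :]).squeeze().item())
    return scores


def rank_options(scores: Sequence[float]) -> List[int]:
    """Return option indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For fine-tuning, the `torch.no_grad()` context and the `.item()` call would be dropped so that gradients flow through the head and the backbone.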
### Out-of-Scope Use
This model should not be used for tasks or datasets outside the medical domain.
## Bias, Risks, and Limitations
There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.
## Training Details
### Training Data
The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).
### Training Procedure
MQG is trained using next-token-prediction on both datasets.
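
For reference, next-token prediction is commonly formulated as minimizing the negative log-likelihood of each token given the tokens before it. The notation below is ours and is not taken from the paper: $x_1, \dots, x_T$ are the tokens of one training sequence and $\theta$ the model parameters.

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
$$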
#### Training Hyperparameters
- **Training regime:** fp16 mixed-precision training.
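
As an illustration of what fp16 mixed-precision training can look like with DeepSpeed, the sketch below builds a small configuration dictionary. The keys are standard DeepSpeed configuration fields, but every value (batch size, accumulation steps, scale power) is a placeholder chosen for the example and not the setting used to train MQG.

```python
import json


def make_deepspeed_fp16_config(micro_batch_size: int = 4, grad_accum_steps: int = 8) -> dict:
    """Build a DeepSpeed config dict enabling fp16 mixed precision.

    All numeric values are placeholders, not the authors' hyperparameters.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {
            "enabled": True,
            "loss_scale": 0,  # 0 selects dynamic loss scaling
            "initial_scale_power": 16,  # initial loss scale is 2**16
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, the format DeepSpeed reads from disk.
    print(json.dumps(make_deepspeed_fp16_config(), indent=2))
```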
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
We tested the model on a medical question answering dataset, ECN-QA, based on the French medical residency examination.
It is composed of "single" and "progressive" questions (i.e., a series of multiple related questions).
It is a multiple-choice question dataset, containing 5 propositions for each question.
#### Metrics
We use accuracy to evaluate the model on Medical Question Answering.
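
A minimal sketch of an accuracy computation follows. It assumes each question has exactly one reference answer encoded as an index into its propositions; this card does not state whether ECN-QA questions can have several correct propositions, so refer to the paper for the exact scoring protocol.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of questions whose predicted index equals the reference index."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

For example, `accuracy([0, 2, 4, 1], [0, 2, 3, 1])` counts three matches out of four questions.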
### Results
See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
### Model Architecture and Objective
The model is based on BioMedLM's architecture, which is a modified version of the GPT-2 architecture.
### Compute Infrastructure
#### Hardware
The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs.
#### Software
PyTorch, DeepSpeed
## Citation
**BibTeX:**
```
@article{khlaut2024efficient,
title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
journal={Clinical NLP Workshop, NAACL 2024},
year={2024}
}
```
## Model Card Contact
julien.khlaut at raidium.fr