|
--- |
|
title: README |
|
emoji: 🩺
|
colorFrom: indigo |
|
colorTo: blue |
|
sdk: static |
|
pinned: true |
|
license: cc-by-nc-sa-4.0 |
|
--- |
|
|
|
# BiMediX: Bilingual Medical Mixture of Experts LLM |
|
|
|
Welcome to the official HuggingFace repository for BiMediX, a bilingual medical Large Language Model (LLM) designed for English and Arabic. BiMediX supports a broad range of **medical interactions**, including multi-turn chats, multiple-choice question answering, and open-ended question answering.
|
|
|
## Key Features |
|
|
|
- **Bilingual Support**: Seamless interaction in both English and Arabic across a wide range of medical tasks.
|
- **BiMed1.3M Dataset**: Unique dataset with 1.3 million bilingual medical interactions across English and Arabic, including 250k synthesized multi-turn doctor-patient chats for instruction tuning. |
|
- **High-Quality Translation**: Utilizes a semi-automated English-to-Arabic translation pipeline with human refinement to ensure accurate, high-quality translations.
|
- **Evaluation Benchmark for Arabic Medical LLMs**: Comprehensive benchmark for evaluating Arabic medical language models, setting a new standard in the field. |
|
- **State-of-the-Art Performance**: Outperforms existing models on medical benchmarks while being 8 times faster than comparable existing models.
|
|
|
For full details of this model, please read our [paper (pre-print)](https://arxiv.org/abs/2402.13253) and check our [GitHub](https://github.com/mbzuai-oryx/BiMediX).
|
|
|
Check out our video preview on [YouTube](https://youtu.be/kqfEdAcazIg)!
|
|
|
|
|
## Getting Started |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BiMediX/BiMediX-Bi"

# Load the tokenizer and model; device_map="auto" places the weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate up to 500 new tokens and decode the response.
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
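
Since BiMediX is bilingual, the same pipeline can be prompted in Arabic. A minimal sketch, reusing the model and tokenizer loaded above; the Arabic prompt is an illustrative example, not taken from the dataset.

```python
# Illustrative Arabic prompt: "I have a persistent headache and mild fever. What do you advise?"
text_ar = "أعاني من صداع مستمر وحمى خفيفة، بماذا تنصحني؟"
inputs = tokenizer(text_ar, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```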
|
|
|
|
|
## Model Details |
|
|
|
|
|
The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) base model. |
|
It features a router network that allocates each input to the most relevant experts, each expert being a specialized feedforward block within the model.

This sparse approach lets the model scale efficiently: fewer than 13 billion parameters are active during inference.
|
The training utilized the BiMed1.3M dataset, focusing on bilingual medical interactions in both English and Arabic, with a substantial corpus of over 632 million healthcare-specialized tokens. |
|
Fine-tuning uses QLoRA, a quantized low-rank adaptation technique, to adapt the model to specific tasks efficiently while keeping computational demands manageable.
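
For intuition, the sketch below shows how a Mixtral-style router picks the top-2 experts for each token and mixes their outputs. This is a simplified illustration with placeholder dimensions, not BiMediX's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative Mixtral-style sparse MoE layer: 8 experts, top-2 routing per token."""
    def __init__(self, hidden_size=512, ffn_size=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        gate_logits = self.router(x)                         # score every expert for every token
        weights, chosen = gate_logits.topk(self.top_k, -1)   # keep only the top-2 experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only two experts run for each token, the per-token compute corresponds to a fraction of the model's total parameters, which is why fewer than 13 billion parameters are active at inference time.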
|
|
|
|
|
<div style="width: 50%"> |
|
<table> |
|
<tr> |
|
<th>Model Name</th> |
|
<th>Download Link</th>
|
</tr> |
|
<tr> |
|
<td>BiMediX-Bi</td> |
|
<td><a href="https://huggingface.co/BiMediX/BiMediX-Bi">HuggingFace</a></td> |
|
</tr> |
|
<tr> |
|
<td>BiMediX-Ara</td> |
|
<td><a href="https://huggingface.co/BiMediX/BiMediX-Ara">HuggingFace</a></td> |
|
</tr> |
|
<tr> |
|
<td>BiMediX-Eng</td> |
|
<td><a href="https://huggingface.co/BiMediX/BiMediX-Eng">HuggingFace</a></td> |
|
</tr> |
|
</table> |
|
</div> |
|
|
|
|
|
## Data |
|
|
|
1. **Compiling the English Instruction Set**: Dataset creation began with an English instruction set covering three types of medical interactions:
|
|
|
- **Multiple-choice question answering (MCQA)**, focusing on specialized medical knowledge. |
|
- **Open question answering (QA)**, including real-world consumer questions. |
|
- **MCQA-grounded multi-turn chat conversations** for dynamic exchanges.
|
|
|
2. **Semi-Automated Iterative Translation**: To create high-quality Arabic versions, a semi-automated translation pipeline with human alignment was used. |
|
3. **Bilingual Benchmark & Instruction Set Creation**: The English medical evaluation benchmarks were translated into Arabic.

This produced a high-quality Arabic medical benchmark that, combined with the original English benchmarks, forms a bilingual evaluation suite.
|
The BiMed1.3M dataset, resulting from translating 444,995 English samples into Arabic and mixing Arabic and English in a 1:2 ratio, was then used for instruction tuning. |
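
For illustration, the 1:2 Arabic-to-English mixing described above can be sketched as follows; the function name and sampling strategy are assumptions for exposition, not the released data-processing code.

```python
import random

def mix_bilingual(arabic_samples, english_samples, ratio=(1, 2), seed=0):
    """Combine Arabic and English instruction samples at a 1:2 ratio and shuffle (illustrative sketch)."""
    n_ar = len(arabic_samples)
    # Take roughly two English samples for every Arabic sample, capped by availability.
    n_en = min(len(english_samples), n_ar * ratio[1] // ratio[0])
    mixed = list(arabic_samples[:n_ar]) + list(english_samples[:n_en])
    random.Random(seed).shuffle(mixed)
    return mixed
```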
|
|
|
## Benchmarks and Performance |
|
|
|
The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic. |
|
|
|
1. **Medical Benchmarks Used for Evaluation:** |
|
- *PubMedQA*: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts. |
|
- *MedMCQA*: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects. |
|
- *MedQA*: Questions from US and other medical board exams, testing specific knowledge and patient case understanding. |
|
- *Medical MMLU*: A compilation of questions from various medical subjects, requiring broad medical knowledge. |
|
|
|
2. **Results and Comparisons:** |
|
- **Bilingual Evaluation**: In bilingual (Arabic-English) evaluation, BiMediX outperformed both the Mixtral-8x7B base model and Jais-30B, with average accuracy more than 10 and 15 points higher, respectively.
|
- **Arabic Benchmark**: In Arabic-specific evaluations, BiMediX outperformed Jais-30B in all categories, highlighting the effectiveness of the BiMed1.3M dataset and bilingual training. |
|
- **English Benchmark**: BiMediX also excelled in English medical benchmarks, surpassing other state-of-the-art models like Med42-70B and Meditron-70B in terms of average performance and efficiency. |
|
|
|
These results underscore BiMediX's advanced capability in handling medical queries and its significant improvement over existing models in both languages, leveraging its unique bilingual dataset and training approach. |
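
For reference, multiple-choice benchmarks such as MedMCQA and MedQA are commonly scored by comparing the likelihood the model assigns to each answer option. The sketch below illustrates that idea only; it is not the paper's exact evaluation harness, and the prompt format and scoring details may differ.

```python
import torch

@torch.no_grad()
def option_log_likelihood(model, tokenizer, question, option):
    """Total log-likelihood of an answer option given the question (rough illustrative scoring)."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + " " + option, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    option_ids = full_ids[0, prompt_len:]
    positions = range(prompt_len - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, tok].item() for pos, tok in zip(positions, option_ids))

def pick_answer(model, tokenizer, question, options):
    """Return the option the model scores highest (illustrative)."""
    scores = [option_log_likelihood(model, tokenizer, question, opt) for opt in options]
    return options[scores.index(max(scores))]
```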
|
|
|
## Limitations and Ethical Considerations |
|
|
|
**This release, intended for research, is not ready for clinical or commercial use.** Users are urged to employ BiMediX responsibly, especially when applying its outputs in real-world medical scenarios. |
|
It is imperative to verify the model's advice with qualified healthcare professionals and not to rely on AI for medical diagnoses or treatment decisions. |
|
Despite the overall advancements BiMediX brings to the field of medical NLP, it shares common challenges with other language models, |
|
including hallucinations, toxicity, and stereotypes. BiMediX's medical diagnoses and recommendations are not infallible. |
|
|
|
## License and Citation |
|
|
|
BiMediX is released under the CC-BY-NC-SA 4.0 License. |
|
For more details, please refer to the [LICENSE](https://huggingface.co/BiMediX/BiMediX-Bi/blob/main/LICENSE.txt) file included in this repository. |
|
|
|
If you use BiMediX in your research, please cite our work as follows: |
|
|
|
```bibtex |
|
@misc{pieri2024bimedix, |
|
title={BiMediX: Bilingual Medical Mixture of Experts LLM}, |
|
author={Sara Pieri and Sahal Shaji Mullappilly and Fahad Shahbaz Khan and Rao Muhammad Anwer and Salman Khan and Timothy Baldwin and Hisham Cholakkal}, |
|
year={2024}, |
|
eprint={2402.13253}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
|
|
``` |
|
|
|
Visit our [GitHub](https://github.com/mbzuai-oryx/BiMediX) for more information and resources. |