---
title: README
emoji: 🏢
colorFrom: indigo
colorTo: blue
sdk: static
pinned: true
license: cc-by-nc-sa-4.0
---
# BiMediX: Bilingual Medical Mixture of Experts LLM
Welcome to the official HuggingFace repository for BiMediX, a bilingual medical Large Language Model (LLM) designed for English and Arabic. BiMediX supports a broad range of **medical interactions**, including multi-turn chats, multiple-choice question answering, and open-ended question answering.
## Key Features
- **Bilingual Support**: Seamless interaction in both English and Arabic across a wide range of medical tasks.
- **BiMed1.3M Dataset**: Unique dataset with 1.3 million bilingual medical interactions across English and Arabic, including 250k synthesized multi-turn doctor-patient chats for instruction tuning.
- **High-Quality Translation**: Utilizes a semi-automated English-to-Arabic translation pipeline with human refinement to ensure accurate, high-quality translations.
- **Evaluation Benchmark for Arabic Medical LLMs**: Comprehensive benchmark for evaluating Arabic medical language models, setting a new standard in the field.
- **State-of-the-Art Performance**: Outperforms existing models on medical benchmarks while being eight times faster than comparable existing models.
For full details of this model, please read our [paper (pre-print)](https://arxiv.org/abs/2402.13253) and check our [GitHub](https://github.com/mbzuai-oryx/BiMediX).
Watch our [video preview](https://youtu.be/kqfEdAcazIg)!
## Getting Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BiMediX/BiMediX-Bi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
# Place the inputs on the same device the model was dispatched to.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
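Since the model is bilingual, the same pipeline accepts Arabic prompts. Below is a minimal sketch; the Arabic prompt is our own illustrative example (a translation of the English prompt above), not an official sample:

```python
# Illustrative Arabic prompt: "Hello BiMediX! I've been feeling increasingly
# tired over the past week."
arabic_text = "مرحباً BiMediX! أشعر بتعب متزايد خلال الأسبوع الماضي."
inputs = tokenizer(arabic_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```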
## Model Details
The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) base model.
It features a router network that allocates each input to the most relevant experts, each a specialized feed-forward block within the model.
This sparse computation lets the model scale significantly while activating fewer than 13 billion parameters during inference, enhancing efficiency.
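To make the routing concrete, here is a toy top-2 sparse MoE layer in plain PyTorch, in the spirit of the Mixtral architecture; the dimensions and expert count are illustrative, not BiMediX's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy top-2 sparse MoE layer; sizes are illustrative, not BiMediX's."""
    def __init__(self, dim=512, hidden=2048, num_experts=8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, dim)
        weights, idx = self.router(x).topk(2, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize the two gates
        out = torch.zeros_like(x)
        for slot in range(2):                          # each token's 1st/2nd expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Because each token only passes through its top two experts, most expert parameters sit idle on any given forward pass, which is what keeps the active parameter count low.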
The training utilized the BiMed1.3M dataset, focusing on bilingual medical interactions in both English and Arabic, with a substantial corpus of over 632 million healthcare-specialized tokens.
The fine-tuning process uses quantized low-rank adaptation (QLoRA) to efficiently adapt the model to specific tasks while keeping computational demands manageable.
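As a rough sketch of how such a QLoRA setup is typically configured with the `peft` and `bitsandbytes` libraries, see below; the rank, alpha, and target modules are illustrative assumptions, not the paper's reported configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters on the attention projections; rank, alpha, and
# target modules are illustrative, not the paper's reported values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only adapter weights require gradients
```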
| Model Name | Download Link |
|:-----------|:--------------|
| BiMediX-Bi | [HuggingFace](https://huggingface.co/BiMediX/BiMediX-Bi) |
| BiMediX-Ara | [HuggingFace](https://huggingface.co/BiMediX/BiMediX-Ara) |
| BiMediX-Eng | [HuggingFace](https://huggingface.co/BiMediX/BiMediX-Eng) |
## Data
1. **Compiling English Instruction Set**: The dataset creation began with compiling a dataset in English, covering three types of medical interactions:
- **Multiple-choice question answering (MCQA)**, focusing on specialized medical knowledge.
- **Open question answering (QA)**, including real-world consumer questions.
- **MCQA-Grounded multi-turn chat conversations** for dynamic exchanges.
2. **Semi-Automated Iterative Translation**: To create high-quality Arabic versions, a semi-automated translation pipeline with human alignment was used.
3. **Bilingual Benchmark & Instruction Set Creation**: The English medical evaluation benchmarks were translated into Arabic, producing a high-quality Arabic medical benchmark that, combined with the original English benchmarks, forms a bilingual benchmark.
The resulting BiMed1.3M dataset, built by translating 444,995 English samples into Arabic and mixing Arabic and English in a 1:2 ratio, was then used for instruction tuning.
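For intuition, here is a minimal sketch of the 1:2 Arabic-to-English mixing described above. The file names, record format, and helper are hypothetical; the actual dataset construction may differ:

```python
import json
import random

def load_jsonl(path):
    """Read one JSON record per line (file names below are hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

arabic = load_jsonl("bimed_arabic.jsonl")    # translated Arabic instructions
english = load_jsonl("bimed_english.jsonl")  # original English instructions

# One Arabic sample for every two English samples (a 1:2 ratio), shuffled.
mixed = arabic + random.sample(english, min(len(english), 2 * len(arabic)))
random.shuffle(mixed)
print(f"{len(mixed)} mixed instruction-tuning samples")
```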
## Benchmarks and Performance
The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic.
1. **Medical Benchmarks Used for Evaluation:**
- *PubMedQA*: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
- *MedMCQA*: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
- *MedQA*: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
- *Medical MMLU*: A compilation of questions from various medical subjects, requiring broad medical knowledge.
2. **Results and Comparisons:**
- **Bilingual Evaluation**: BiMediX showed superior performance in bilingual (Arabic-English) evaluations, outperforming the Mixtral-8x7B base model by more than 10 points and Jais-30B by more than 15 points in average accuracy.
- **Arabic Benchmark**: In Arabic-specific evaluations, BiMediX outperformed Jais-30B in all categories, highlighting the effectiveness of the BiMed1.3M dataset and bilingual training.
- **English Benchmark**: BiMediX also excelled in English medical benchmarks, surpassing other state-of-the-art models like Med42-70B and Meditron-70B in terms of average performance and efficiency.
These results underscore BiMediX's advanced capability in handling medical queries and its significant improvement over existing models in both languages, leveraging its unique bilingual dataset and training approach.
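As a rough illustration of how multiple-choice benchmarks like these are commonly scored, the sketch below compares the log-likelihood the model assigns to each answer option and picks the highest. This is a generic scoring recipe, not the paper's exact evaluation harness, and it glosses over tokenizer edge cases (the prompt must tokenize as an exact prefix of prompt + option):

```python
import torch

@torch.no_grad()
def option_logprob(model, tokenizer, question, option):
    """Sum of log-probabilities the model assigns to the option's tokens."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(question + " " + option, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits                        # (1, T, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # predicts tokens 1..T-1
    n_prompt = prompt_ids.shape[1]
    answer_ids = full_ids[:, n_prompt:]                    # the option's tokens
    answer_log_probs = log_probs[:, n_prompt - 1:]         # positions predicting them
    return answer_log_probs.gather(-1, answer_ids.unsqueeze(-1)).sum().item()

def predict(model, tokenizer, question, options):
    """Return the index of the highest-scoring answer option."""
    scores = [option_logprob(model, tokenizer, question, o) for o in options]
    return max(range(len(options)), key=scores.__getitem__)
```

Accuracy is then simply the fraction of questions where `predict` returns the gold option's index.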
## Limitations and Ethical Considerations
**This release, intended for research, is not ready for clinical or commercial use.** Users are urged to employ BiMediX responsibly, especially when applying its outputs in real-world medical scenarios.
It is imperative to verify the model's advice with qualified healthcare professionals and not to rely on AI for medical diagnoses or treatment decisions.
Despite the advancements BiMediX brings to medical NLP, it shares challenges common to other language models, including hallucinations, toxicity, and stereotypes. Its medical diagnoses and recommendations are not infallible.
## License and Citation
BiMediX is released under the CC-BY-NC-SA 4.0 License.
For more details, please refer to the [LICENSE](https://huggingface.co/BiMediX/BiMediX-Bi/blob/main/LICENSE.txt) file included in this repository.
If you use BiMediX in your research, please cite our work as follows:
```bibtex
@misc{pieri2024bimedix,
      title={BiMediX: Bilingual Medical Mixture of Experts LLM},
      author={Sara Pieri and Sahal Shaji Mullappilly and Fahad Shahbaz Khan and Rao Muhammad Anwer and Salman Khan and Timothy Baldwin and Hisham Cholakkal},
      year={2024},
      eprint={2402.13253},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Visit our [GitHub](https://github.com/mbzuai-oryx/BiMediX) for more information and resources. |