Update README.md
README.md CHANGED
@@ -73,17 +73,26 @@ The model's fine-tuning process includes a low-rank adaptation technique (QLoRA)
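For context on the QLoRA technique mentioned in the hunk header above, here is a minimal sketch of a 4-bit quantization plus low-rank adapter setup using bitsandbytes and peft. The base-model id, rank, and target modules are illustrative assumptions, not the repository's actual training configuration.

```python
# Minimal QLoRA-style sketch (assumed, not BiMediX's training script):
# load the base model in 4-bit and attach low-rank adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",          # assumed base-model id
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the adapters are trained
```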
## Dataset

1. **Compiling the English Instruction Set**: Dataset creation began with an English instruction set covering three types of medical interactions (example records follow this list):
   - **Multiple-choice question answering (MCQA)**, focusing on specialized medical knowledge.
   - **Open question answering (QA)**, including real-world consumer questions.
   - **MCQA-grounded multi-turn chat conversations** for dynamic exchanges.
2. **Semi-Automated Iterative Translation**: To produce high-quality Arabic versions, a semi-automated translation pipeline with human alignment was used (a workflow sketch follows this list).
3. **Bilingual Benchmark & Instruction Set Creation**: The English medical evaluation benchmarks were translated into Arabic, creating a high-quality Arabic medical benchmark that, combined with the original English benchmarks, forms a bilingual benchmark.
   The BiMed1.3M instruction set, obtained by translating 444,995 English samples into Arabic and mixing Arabic and English at a 1:2 ratio, was then used for instruction tuning (see the mixing sketch after this list).
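
To make the three interaction types in item 1 concrete, here is a minimal sketch of what records of each kind could look like. The field names and sample contents are illustrative assumptions, not the actual BiMed1.3M schema.

```python
# Illustrative (assumed) record shapes for the three interaction types
# in the English instruction set; the real BiMed1.3M schema may differ.
mcqa_example = {
    "type": "mcqa",
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
    "answer": "B",
}

open_qa_example = {
    "type": "qa",
    "question": "What lifestyle changes help manage mild hypertension?",
    "answer": "Reducing salt intake, regular exercise, and weight management, among others.",
}

chat_example = {
    "type": "mcqa_grounded_chat",
    "turns": [
        {"role": "user", "content": "Why is option B the correct answer here?"},
        {"role": "assistant", "content": "Vitamin C is required for collagen synthesis, so its deficiency causes scurvy."},
    ],
}
```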
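
The semi-automated iterative translation in item 2 can be pictured as a translate, score, and review loop. The sketch below is an assumed workflow, not the authors' pipeline; `machine_translate`, `quality_score`, and `human_review` are hypothetical helpers standing in for the MT system, the quality metric, and the human alignment step.

```python
# A minimal sketch (assumed, not the authors' code) of semi-automated
# iterative translation with a human-alignment step for low-quality outputs.
from typing import Callable

def translate_with_alignment(
    english_samples: list[str],
    machine_translate: Callable[[str], str],     # hypothetical MT helper
    quality_score: Callable[[str, str], float],  # hypothetical EN/AR quality metric
    human_review: Callable[[str, str], str],     # hypothetical human correction step
    threshold: float = 0.8,
) -> list[str]:
    arabic_samples = []
    for en in english_samples:
        ar = machine_translate(en)
        # Translations scoring below the threshold are routed to human reviewers;
        # their corrections keep the Arabic set aligned with the English source.
        if quality_score(en, ar) < threshold:
            ar = human_review(en, ar)
        arabic_samples.append(ar)
    return arabic_samples
```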
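
The 1:2 Arabic-to-English mix in item 3 also explains the dataset's name. The arithmetic below uses the 444,995 translated samples stated above and assumes the ratio applies to sample counts.

```python
# Back-of-the-envelope check (assumed interpretation) of the 1:2 Arabic:English mix.
arabic_samples = 444_995               # English samples translated into Arabic
english_samples = 2 * arabic_samples   # 1:2 Arabic-to-English ratio
total = arabic_samples + english_samples
print(f"Arabic: {arabic_samples:,}, English: {english_samples:,}, total: {total:,}")
# total = 1,334,985, i.e. the ~1.3M samples the BiMed1.3M name refers to
```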
## Benchmarks and Performance

The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic.

1. **Medical Benchmarks Used for Evaluation** (a minimal scoring sketch follows this list):
   - *PubMedQA*: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
   - *MedMCQA*: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
   - *MedQA*: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
   - *Medical MMLU*: A compilation of questions from various medical subjects, requiring broad medical knowledge.

2. **Results and Comparisons:**
   - **Bilingual Evaluation**: BiMediX showed superior performance in bilingual (Arabic-English) evaluation, outperforming both the Mixtral-8x7B base model and Jais-30B, a model designed for Arabic, by more than 10 and 15 points of average accuracy, respectively.
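
A minimal sketch of how accuracy on the multiple-choice benchmarks above could be scored. `model_answer` is a hypothetical stand-in for querying BiMediX, and the item format is an assumed, simplified schema rather than the benchmarks' actual fields.

```python
# A minimal, assumed scoring loop for multiple-choice medical benchmarks
# (MedMCQA, MedQA, Medical MMLU); not the authors' evaluation harness.
from typing import Callable

def mcqa_accuracy(items: list[dict], model_answer: Callable[[str], str]) -> float:
    """items: [{'question': str, 'options': {'A': ..., 'B': ...}, 'answer': 'A'}, ...]"""
    correct = 0
    for item in items:
        prompt = item["question"] + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in sorted(item["options"].items())
        )
        prediction = model_answer(prompt).strip().upper()[:1]  # keep only the option letter
        correct += prediction == item["answer"]
    return correct / len(items)
```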