HuggingSara committed on
Commit 14fa7ab · verified · 1 Parent(s): 2616669

Update README.md

Files changed (1): README.md (+14 -5)
README.md CHANGED
@@ -73,17 +73,26 @@ The model's fine-tuning process includes a low-rank adaptation technique (QLoRA)
 
 ## Dataset
 
-(Details about the BiMed1.3M dataset, including composition and access.)
+1. **Compiling English Instruction Set**: Dataset creation began with compiling an English instruction set covering three types of medical interactions:
+
+   - **Multiple-choice question answering (MCQA)**, focusing on specialized medical knowledge.
+   - **Open question answering (QA)**, including real-world consumer questions.
+   - **MCQA-grounded multi-turn chat conversations** for dynamic exchanges.
+
+2. **Semi-Automated Iterative Translation**: To create high-quality Arabic versions, a semi-automated translation pipeline with human alignment was used.
+3. **Bilingual Benchmark & Instruction Set Creation**: The English medical evaluation benchmarks were translated into Arabic, yielding a high-quality Arabic medical benchmark that, combined with the original English benchmarks, forms a bilingual benchmark. The BiMed1.3M dataset, created by translating 444,995 English samples into Arabic and mixing Arabic and English in a 1:2 ratio, was then used for instruction tuning.
 
 ## Benchmarks and Performance
 
 The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic.
 
 1. **Medical Benchmarks Used for Evaluation:**
-   - **PubMedQA**: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
-   - **MedMCQA**: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
-   - **MedQA**: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
-   - **Medical MMLU**: A compilation of questions from various medical subjects, requiring broad medical knowledge.
+   - *PubMedQA*: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
+   - *MedMCQA*: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
+   - *MedQA*: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
+   - *Medical MMLU*: A compilation of questions from various medical subjects, requiring broad medical knowledge.
 
 2. **Results and Comparisons:**
    - **Bilingual Evaluation**: BiMediX showed superior performance in bilingual (Arabic-English) evaluations, outperforming both the Mixtral-8x7B base model and Jais-30B, a model designed for Arabic. It demonstrated more than 10 and 15 points higher average accuracy, respectively.
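The dataset steps this commit documents (iterative translation with human alignment, then 1:2 Arabic-to-English mixing) can be sketched roughly as below. This is a minimal illustration only: `translate`, `accept`, and the sampling scheme are stand-ins for the MT system and human reviewers, not BiMediX's published tooling.

```python
import random

def semi_automated_translate(samples, translate, accept, max_rounds=3):
    """Sketch of the semi-automated iterative translation step: machine-
    translate each English sample, keep outputs the reviewer accepts, and
    send rejected ones back for another round, up to `max_rounds` passes.
    `translate` and `accept` are assumed interfaces, not actual code."""
    accepted, pending = {}, list(samples)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for src in pending:
            arabic = translate(src)
            if accept(src, arabic):
                accepted[src] = arabic
            else:
                still_pending.append(src)  # re-translated next round
        pending = still_pending
    return accepted

def mix_bilingual(arabic, english, seed=0):
    """Combine translated Arabic samples with English ones in the 1:2
    Arabic-to-English ratio stated in the README, then shuffle.
    The exact sampling scheme is an assumption."""
    rng = random.Random(seed)
    english_subset = rng.sample(english, min(len(english), 2 * len(arabic)))
    mixed = list(arabic) + english_subset
    rng.shuffle(mixed)
    return mixed
```

With 444,995 Arabic translations, a 1:2 mix gives roughly 1.33M instruction samples, consistent with the BiMed1.3M name.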
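The comparison in the Results section is in terms of average accuracy over the evaluation sets. A minimal sketch of that metric follows; whether BiMediX's reported average is unweighted across benchmarks is an assumption.

```python
def accuracy(predictions, answers):
    """Fraction of items answered correctly (standard MCQA accuracy)."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def average_accuracy(per_benchmark):
    """Unweighted mean accuracy over benchmarks such as PubMedQA, MedMCQA,
    MedQA, and Medical MMLU. The unweighted mean is an assumption here."""
    return sum(per_benchmark.values()) / len(per_benchmark)
```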