HuggingSara committed on
Commit 14fa7ab · verified · 1 Parent(s): 2616669

Update README.md

Files changed (1): README.md (+14 -5)
README.md CHANGED
@@ -73,17 +73,26 @@ The model's fine-tuning process includes a low-rank adaptation technique (QLoRA)
 
 ## Dataset
 
-(Details about the BiMed1.3M dataset, including composition and access.)
+1. **Compiling English Instruction Set**: Dataset creation began with compiling an English instruction set covering three types of medical interactions:
+
+   - **Multiple-choice question answering (MCQA)**, focusing on specialized medical knowledge.
+   - **Open question answering (QA)**, including real-world consumer questions.
+   - **MCQA-grounded multi-turn chat conversations** for dynamic exchanges.
+
+2. **Semi-Automated Iterative Translation**: To create high-quality Arabic versions, a semi-automated translation pipeline with human alignment was used.
+3. **Bilingual Benchmark & Instruction Set Creation**: The English medical evaluation benchmarks were translated into Arabic, yielding a high-quality Arabic medical benchmark that, combined with the original English benchmarks, forms a bilingual benchmark. The BiMed1.3M dataset, created by translating 444,995 English samples into Arabic and mixing Arabic and English in a 1:2 ratio, was then used for instruction tuning.
 
 ## Benchmarks and Performance
 
 The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic.
 
 1. **Medical Benchmarks Used for Evaluation:**
-   - **PubMedQA**: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
-   - **MedMCQA**: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
-   - **MedQA**: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
-   - **Medical MMLU**: A compilation of questions from various medical subjects, requiring broad medical knowledge.
+   - *PubMedQA*: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
+   - *MedMCQA*: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
+   - *MedQA*: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
+   - *Medical MMLU*: A compilation of questions from various medical subjects, requiring broad medical knowledge.
 
 2. **Results and Comparisons:**
    - **Bilingual Evaluation**: BiMediX showed superior performance in bilingual (Arabic-English) evaluations, outperforming both the Mixtral-8x7B base model and Jais-30B, a model designed for Arabic. It demonstrated more than 10 and 15 points higher average accuracy, respectively.
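The dataset steps this commit documents (iterative translation with human alignment, then 1:2 Arabic-to-English mixing) can be sketched roughly as below. This is a minimal illustration only: `translate`, `accept`, and the sampling scheme are stand-ins for the MT system and human reviewers, not BiMediX's published tooling.

```python
import random

def semi_automated_translate(samples, translate, accept, max_rounds=3):
    """Sketch of the semi-automated iterative translation step: machine-
    translate each English sample, keep outputs the reviewer accepts, and
    send rejected ones back for another round, up to `max_rounds` passes.
    `translate` and `accept` are assumed interfaces, not actual code."""
    accepted, pending = {}, list(samples)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for src in pending:
            arabic = translate(src)
            if accept(src, arabic):
                accepted[src] = arabic
            else:
                still_pending.append(src)  # re-translated next round
        pending = still_pending
    return accepted

def mix_bilingual(arabic, english, seed=0):
    """Combine translated Arabic samples with English ones in the 1:2
    Arabic-to-English ratio stated in the README, then shuffle.
    The exact sampling scheme is an assumption."""
    rng = random.Random(seed)
    english_subset = rng.sample(english, min(len(english), 2 * len(arabic)))
    mixed = list(arabic) + english_subset
    rng.shuffle(mixed)
    return mixed
```

With 444,995 Arabic translations, a 1:2 mix gives roughly 1.33M instruction samples, consistent with the BiMed1.3M name.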
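The comparison in the Results section is in terms of average accuracy over the evaluation sets. A minimal sketch of that metric follows; whether BiMediX's reported average is unweighted across benchmarks is an assumption.

```python
def accuracy(predictions, answers):
    """Fraction of items answered correctly (standard MCQA accuracy)."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def average_accuracy(per_benchmark):
    """Unweighted mean accuracy over benchmarks such as PubMedQA, MedMCQA,
    MedQA, and Medical MMLU. The unweighted mean is an assumption here."""
    return sum(per_benchmark.values()) / len(per_benchmark)
```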