Update README.md
README.md CHANGED
@@ -43,7 +43,8 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Model Details
 
 
 
 The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) base model.
+It features a router network that allocates tasks to the most relevant experts, each a specialized feedforward block within the model.
 This approach enables the model to scale significantly by utilizing a sparse operation method, where less than 13 billion parameters are active during inference, enhancing efficiency.
 The training utilized the BiMed1.3M dataset, focusing on bilingual medical interactions in both English and Arabic, with a substantial corpus of over 632 million healthcare-specialized tokens.
 The model's fine-tuning process includes a low-rank adaptation technique (QLoRA) to efficiently adapt the model to specific tasks while keeping computational demands manageable.
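The added sentence describes the sparse routing at the heart of a Mixtral-style MoE layer. As a point of reference, the snippet below is a minimal sketch of top-2 routing over eight expert feedforward blocks; the class name, dimensions, and loop structure are illustrative assumptions based on the public Mixtral-8x7B configuration, not BiMediX's actual implementation.

```python
# Illustrative sketch of Mixtral-style top-2 routing over 8 experts.
# Names and dimensions are assumptions for illustration, not BiMediX internals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a linear layer that scores each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is an independent feedforward block.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_size)
        logits = self.router(x)                           # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1) # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so most parameters stay inactive.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out
```

Because only two of the eight feedforward blocks run per token, the active parameter count stays well below the total parameter count, which is the efficiency point made in the paragraph above.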
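The last sentence of the section mentions QLoRA. A minimal sketch of such a setup with the `transformers` and `peft` libraries is shown below: the 4-bit quantized base model stays frozen while small low-rank adapters are trained. The rank, alpha, dropout, and target modules are placeholder assumptions, not the recipe used to train BiMediX.

```python
# Illustrative QLoRA setup: 4-bit base weights plus trainable low-rank adapters.
# Hyperparameters and target modules are assumptions, not BiMediX's training recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",          # base model referenced above
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                   # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the low-rank adapters are trainable
```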