HuggingSara committed
Commit da756bd · verified · 1 Parent(s): 66c7e95

Update README.md

Files changed (1):
  README.md +2 -1
README.md CHANGED
@@ -43,7 +43,8 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Model Details
 
 
-The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) base model. It features a sophisticated router network to allocate tasks to the most relevant experts, each being a specialized feedforward blocks within the model.
+The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) base model.
+It features a router network that allocates tasks to the most relevant experts, each being a specialized feedforward block within the model.
 This approach enables the model to scale significantly through sparse computation, in which fewer than 13 billion parameters are active during inference, enhancing efficiency.
 The training utilized the BiMed1.3M dataset, which focuses on bilingual medical interactions in English and Arabic and comprises over 632 million healthcare-specialized tokens.
 The model's fine-tuning process uses a quantized low-rank adaptation technique (QLoRA) to efficiently adapt the model to specific tasks while keeping computational demands manageable.
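
For readers of the updated section, the router-and-experts mechanism can be made concrete with a minimal sketch. The sparse MoE layer below is an illustration only, with hypothetical sizes (`d_model`, `d_ff`, `n_experts`, `top_k`) and not BiMediX's or Mixtral's actual implementation; it shows how a router activates only a few expert feedforward blocks per token, which is why only a fraction of the total parameters is active at inference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse MoE layer (hypothetical sizes): a router scores
    all experts per token and only the top-k expert feedforward blocks run."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feedforward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The router assigns every token a score for every expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                # x: (batch, seq, d_model)
        scores = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the k best experts
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts
        out = torch.zeros_like(x)
        # Plain loops for clarity; real implementations batch-dispatch tokens.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```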
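The QLoRA step can likewise be sketched with the Hugging Face `peft` and `bitsandbytes` libraries. The snippet below is an illustration under stated assumptions: the rank, alpha, dropout, and target modules are placeholders, not BiMediX's published training configuration, and loading the Mixtral base even in 4-bit still requires substantial GPU memory.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable low-rank adapters; hyperparameters are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```

Only the low-rank adapter weights are updated during fine-tuning while the quantized base stays frozen, which is what keeps the computational demands manageable.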