HuggingSara committed · Commit c78e0e3 (verified) · 1 Parent(s): 9279f02

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -20,18 +20,19 @@ Welcome to the official HuggingFace repository for BiMediX, the bilingual medical
- **Evaluation Benchmark for Arabic Medical LLMs**: Comprehensive benchmark for evaluating Arabic medical language models, setting a new standard in the field.
- **State-of-the-Art Performance**: Outperforms existing models on medical benchmarks while being 8 times faster than comparable models.

+ For full details of this model, please read our [paper (pre-print)](#).

## Getting Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

- model_id = "TODO"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model_id = "BiMediX/BiMediX-Bi"

+ tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

- text = "TODO"
+ text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=500)
@@ -41,7 +42,8 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

## Model Details

- (Describe the model's architecture, focusing on its mixture of experts design.)
+
+ The BiMediX model, built on a Mixture of Experts (MoE) architecture, leverages the Mixtral-8x7B base network. This lets the model scale significantly through sparse computation: only a subset of its 47 billion parameters is active during inference, which keeps it efficient. A router network dispatches each input token to the most relevant experts, each of which is a specialized feedforward block within the model. Training used the BiMed1.3M dataset of bilingual English-Arabic medical interactions, a corpus of over 632 million healthcare-specialized tokens. Fine-tuning uses a quantized low-rank adaptation technique (QLoRA) to adapt the model to specific tasks while keeping computational demands manageable.

## Dataset
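
The Model Details paragraph added in this commit describes a router that sends each token to a small number of expert feedforward blocks. The PyTorch snippet below is a minimal sketch of that routing idea, written for this page rather than taken from the BiMediX code; the class name `TopKMoE`, the layer sizes, the expert count, and the top-2 selection are all illustrative assumptions.

```python
# Illustrative sketch only (not from the BiMediX repository): a toy top-k
# Mixture-of-Experts block. A linear "router" scores the experts for each
# token, and only the top-k experts actually run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # router network
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.SiLU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(n_experts)  # each expert is a small feedforward block
            ]
        )

    def forward(self, x):
        # x: (num_tokens, d_model)
        scores = self.router(x)                               # (num_tokens, n_experts)
        weights, chosen = torch.topk(scores, self.k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)                  # normalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(5, 64)      # 5 toy token embeddings
print(TopKMoE()(tokens).shape)   # torch.Size([5, 64])
```

Because only the selected experts run for each token, per-token compute stays close to that of a much smaller dense model, which is the sparsity/efficiency trade-off the paragraph refers to.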
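The same paragraph mentions QLoRA, i.e. training low-rank adapters on top of a quantized base model. As a hedged sketch (not an official recipe from this commit), loading the checkpoint in 4-bit precision for memory-constrained inference with `transformers` and `bitsandbytes` might look like the following; the quantization settings and the reuse of the repository id and prompt from the diff above are assumptions.

```python
# Illustrative sketch only: 4-bit loading in the spirit of the QLoRA setup
# described in Model Details. Requires a CUDA GPU plus the `bitsandbytes`
# and `accelerate` packages in addition to `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "BiMediX/BiMediX-Bi"  # repository id taken from the diff above

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bfloat16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                       # let accelerate place the layers
)

text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Whether 4-bit inference preserves the model's medical accuracy is not established by this commit; the full-precision snippet in Getting Started remains the reference usage.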