Text Generation · Transformers · Safetensors · English · falcon_mamba · Eval Results · Inference Endpoints
Gkunsch committed · Commit 00003eb · verified · 1 parent: 39afd30

update readme

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -6,7 +6,7 @@ datasets:
 - tiiuae/falcon-refinedweb
 ---
 
-# Model Card for Sindibad-7B
+# Model Card for Falcon-Mamba-7B
 
 
 
@@ -29,7 +29,7 @@ datasets:
 - **Model type:** Causal decoder-only
 - **Architecture:** Mamba
 - **Language(s) (NLP):** Mainly English
-- **License:** TII Sindibad License 2.0
+- **License:** TII Falcon-Mamba License 2.0
 
 ### Model Source
 
@@ -49,8 +49,8 @@ Find below some example scripts on how to use the model in `transformers` (Make
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/sindibad-7b")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/sindibad-7b")
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")
 
 input_text = "Question: How many hours in one day? Answer: "
 input_ids = tokenizer(input_text, return_tensors="pt").input_ids
@@ -70,8 +70,8 @@ print(tokenizer.decode(outputs[0]))
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/sindibad-7b")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/sindibad-7b", device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", device_map="auto")
 
 input_text = "Question: How many hours in one day? Answer: "
 input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
@@ -94,8 +94,8 @@ print(tokenizer.decode(outputs[0]))
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/sindibad-7b")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/sindibad-7b", device_map="auto", torch_dtype=torch.float16)
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", device_map="auto", torch_dtype=torch.float16)
 
 input_text = "Question: How many hours in one day? Answer: "
 input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
@@ -115,8 +115,8 @@ print(tokenizer.decode(outputs[0]))
 # pip install bitsandbytes accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
 
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/sindibad-7b")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/sindibad-7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True))
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True))
 
 input_text = "Question: How many hours in one day? Answer: "
 input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
@@ -143,7 +143,7 @@ Overall, the data sources included RefinedWeb-English, Refined-Multilingual (lat
 The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7B)/[11B](https://huggingface.co/tiiuae/falcon-11B) tokenizer.
 
 ## Training Procedure
-Sindibad-7B was trained on 256 H100 80GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=1, PP=1, DP=256) combined with ZeRO.
+Falcon-Mamba-7B was trained on 256 H100 80GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=1, PP=1, DP=256) combined with ZeRO.
 
 #### Training Hyperparameters
 
@@ -193,7 +193,7 @@ Refer to our technical report for more details about performance evaluation.
 
 ## Model Architecture and Objective
 
-Sindibad-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
+Falcon-Mamba-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 
 The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.org/abs/2312.00752)).
 
@@ -209,11 +209,11 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 
 ### Hardware
 
-Sindibad-7B was trained on AWS SageMaker, using on average 256 H100 80GB GPUs in 32 p5 instances.
+Falcon-Mamba-7B was trained on AWS SageMaker, using on average 256 H100 80GB GPUs in 32 p5 instances.
 
 ### Software
 
-Sindibad-7B was trained an internal distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels.
+Falcon-Mamba-7B was trained on an internal distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels.
 
 # Citation
 
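For convenience, the snippet below stitches the updated lines into one runnable script; it is a minimal sketch and not part of the commit. The repo id `tiiuae/falcon-mamba-7b` comes from the new lines above, while the `model.generate(...)` call and `max_new_tokens=30` are illustrative assumptions, and a `transformers` release that includes the `falcon_mamba` architecture is assumed.

```python
# Minimal end-to-end sketch of the basic usage pattern after the rename.
# Assumes a transformers version with falcon_mamba support; generation
# settings (max_new_tokens=30) are illustrative, not taken from the README.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a short continuation and decode it back to text.
outputs = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```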