mav23 committed
Commit fe34722
1 Parent(s): 1feee81

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +651 -0
  3. silma-9b-instruct-v1.0.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+silma-9b-instruct-v1.0.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,651 @@
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_button_content: Acknowledge license
tags:
- conversational
language:
- ar
- en
model-index:
- name: SILMA-9B-Instruct-v1.0
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU (Arabic)
      type: OALL/Arabic_MMLU
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 52.55
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: AlGhafa
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Native
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 71.85
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: ARC Challenge (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 78.19
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: ACVA
      type: OALL/ACVA
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 78.89
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: Arabic_EXAMS
      type: OALL/Arabic_EXAMS
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 51.4
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: ARC Easy
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 86
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: BOOLQ (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 64.05
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: COPA (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 78.89
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: HELLASWAG (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 47.64
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: OPENBOOK QA (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 72.93
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: PIQA (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 71.96
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: RACE (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 75.55
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: SCIQ (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 91.26
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
  - task:
      type: text-generation
    dataset:
      name: TOXIGEN (Arabic)
      type: OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
    metrics:
    - name: acc_norm
      type: loglikelihood_acc_norm
      value: 67.59
    source:
      name: Open Arabic LLM Leaderboard
      url: https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
---

# SILMA AI

SILMA.AI is a leading Generative AI startup dedicated to empowering Arabic speakers with state-of-the-art AI solutions.

## 🚀 Our Flagship Model: SILMA 1.0 🚀

* **SILMA 1.0** is the **TOP-RANKED** open-weights Arabic LLM at only **9 billion parameters**, surpassing models more than seven times its size 🏆

## What makes SILMA exceptional?

* SILMA is a small language model that outperforms 72B models on most Arabic language tasks, making it more practical for business use cases
* SILMA is built on Google's robust Gemma foundation models, combining the strengths of both to deliver unparalleled performance
* SILMA is an open-weights model, free to use under our open license

## 👥 Our Team

We are a team of seasoned **Arabic AI experts** who understand the nuances of the language and its cultural considerations, enabling us to build solutions that truly resonate with Arabic users.

**Authors**: [silma.ai](https://silma.ai)

### Usage

Below are some code snippets to help you get started with the model quickly. First, install the Transformers library:

```sh
pip install -U transformers sentencepiece
```

Then, copy the snippet from the section that is relevant to your use case.

#### Running with the `pipeline` API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-9B-Instruct-v1.0",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

# "Write a message apologizing to my manager for missing work today for health reasons."
messages = [
    {"role": "user", "content": "اكتب رسالة تعتذر فيها لمديري في العمل عن الحضور اليوم لأسباب مرضية."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
- Response:

```text
السلام عليكم ورحمة الله وبركاته

أودّ أن أعتذر عن عدم الحضور إلى العمل اليوم بسبب مرضي. أشعر بالسوء الشديد وأحتاج إلى الراحة. سأعود إلى العمل فور تعافيي.
شكراً لتفهمكم.

مع تحياتي،
[اسمك]
```

(Translation: "Peace be upon you. I would like to apologize for not coming to work today due to illness. I feel quite unwell and need to rest. I will return to work as soon as I recover. Thank you for your understanding. Best regards, [Your name]")

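If you would rather see tokens printed as they are generated instead of waiting for the full response, recent Transformers versions let you attach a `TextStreamer` to the pipeline call. A minimal optional sketch, not part of the original snippet:

```python
from transformers import TextStreamer

# Stream generated tokens to stdout as they arrive, without echoing the prompt.
streamer = TextStreamer(pipe.tokenizer, skip_prompt=True)
outputs = pipe(messages, max_new_tokens=256, streamer=streamer)
```
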
#### Running the model on a single / multi GPU

```sh
pip install accelerate
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# System: "You are a smart assistant that answers users' questions."
# User:   "Which is farther from Earth: the Sun or the Moon?"
messages = [
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    {"role": "user", "content": "أيهما أبعد عن الأرض, الشمس أم القمر؟"},
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)

print(tokenizer.decode(outputs[0]))
```

- Response:

```text
الشمس
```

(Translation: "the Sun")

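Note that `tokenizer.decode(outputs[0])` also echoes the prompt tokens. If you only want the newly generated text, you can slice the prompt off first; a small optional refinement:

```python
# outputs[0] contains the prompt followed by the generated tokens;
# slice off the prompt before decoding.
prompt_length = input_ids["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
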
You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template`, as follows:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# System: "You are a smart assistant that answers users' questions."
# User:   "Write Python code to generate a sequence of even numbers."
messages = [
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    {"role": "user", "content": "اكتب كود بايثون لتوليد متسلسلة أرقام زوجية."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
```

- Response:

```python
def generate_even_numbers(n):
    """
    This function generates a list of even numbers from 1 to n.

    Args:
        n: The upper limit of the range.

    Returns:
        A list of even numbers.
    """
    return [i for i in range(1, n + 1) if i % 2 == 0]

# Example usage
n = 10
even_numbers = generate_even_numbers(n)
print(f"The first {n} even numbers are: {even_numbers}")
```

#### Quantized Versions through `bitsandbytes`

<details>
<summary>
  Using 8-bit precision (int8)
</summary>

```sh
pip install bitsandbytes accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)

# System: "You are a smart assistant that answers users' questions."
# User:   "Name five fruits that are high in vitamin C."
messages = [
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    {"role": "user", "content": "اذكر خمس انواع فواكه بها نسب عالية من فيتامين ج."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
```

- Response:

```text
الليمون، البرتقال، الموز، الكيوي، الفراولة
```

(Translation: "lemon, orange, banana, kiwi, strawberry")

</details>

<details>
<summary>
  Using 4-bit precision
</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)

# System: "You are a smart assistant that answers users' questions."
# User:   "In what year did Saladin (Salah ad-Din al-Ayyubi) die?"
messages = [
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    {"role": "user", "content": "في أي عام توفى صلاح الدين الأيوبي؟"},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
```

- Response:

```text
1193
```

</details>

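`bitsandbytes` also exposes NF4 4-bit quantization with a separate compute dtype, which often preserves more quality than plain 4-bit loading. A hedged sketch (the parameter choices below are illustrative, not recommendations from the model authors):

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute; pass this config to
# from_pretrained exactly as in the snippets above. Settings are illustrative.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```
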
409
+ #### Advanced Usage
410
+
411
+ <details>
412
+ <summary>
413
+ Torch compile
414
+ </summary>
415
+
416
+ [Torch compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) is a method for speeding-up the
417
+ inference of PyTorch modules. The Silma model can be run up to 6x faster by leveraging torch compile.
418
+
419
+ Note that two warm-up steps are required before the full inference speed is realised:
420
+
```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from transformers import AutoTokenizer, Gemma2ForCausalLM
from transformers.cache_utils import HybridCache

torch.set_float32_matmul_precision("high")

# load the model + tokenizer
model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Gemma2ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda")

# apply the torch compile transformation
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# pre-process inputs
# System: "You are a smart assistant that answers users' questions."
# User:   "Who took office as president of the United States after Donald Trump?"
messages = [
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    {"role": "user", "content": "من الرئيس الذي تولى المنصب في أمريكا بعد دونالد ترامب؟"},
]
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
prompt_length = model_inputs.input_ids.shape[1]

# set up the k/v cache
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=model.config.max_position_embeddings,
    device=model.device,
    dtype=model.dtype
)

# enable passing the kv cache to generate
model._supports_cache_class = True
model.generation_config.cache_implementation = None

# two warm-up steps
for idx in range(2):
    outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
    past_key_values.reset()

# fast run
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

- Response:

```text
جو بايدن
```

(Translation: "Joe Biden")

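On recent Transformers versions, a simpler route to a similar speed-up is the built-in static KV cache, which avoids constructing `HybridCache` by hand. A sketch, assuming your installed version supports it:

```python
# Assumes a recent transformers release with static KV cache support.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
outputs = model.generate(**model_inputs, max_new_tokens=128)  # warm-up runs still apply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
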
For more details, refer to the [Transformers documentation](https://huggingface.co/docs/transformers/main/en/llm_optims?static-kv=basic+usage%3A+generation_config).

</details>

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

# "What are the most popular Python frameworks for building AI models?"
chat = [
    {"role": "user", "content": "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter followed by the role of the entity (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with the `<end_of_turn>` token.

You can follow this format to build the prompt manually if you need to work without the tokenizer's chat template, as in the sketch below.

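For instance, the prompt shown above could be assembled by hand like this (a minimal sketch; in practice the tokenizer's template is less error-prone):

```python
# Manually reproduce the chat format shown above for a single user turn.
user_message = "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"
prompt = (
    "<bos><start_of_turn>user\n"
    f"{user_message}<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```
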
After the prompt is ready, generation can be performed like this:

```python
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be summarized.
* **Output:** Generated Arabic or English text in response to the input, such as an answer to a question or a summary of a document.

### GPU Requirements

The following are the minimum and recommended GPU requirements for running inference (see the rough memory estimate below):

* Recommended
  * At least one GPU with a minimum of 48 GB of GPU memory
  * Examples: NVIDIA A40, L40, RTX A6000
* Minimum
  * At least one GPU with 16-24 GB of GPU memory
  * Examples: NVIDIA RTX 4090, RTX 4000, L4
  * Assumes the model is loaded in either 8-bit or 4-bit [quantization mode](https://huggingface.co/silma-ai/SILMA-9B-Instruct-v1.0#quantized-versions-through-bitsandbytes)

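These figures follow from simple arithmetic on the parameter count; a rough estimate of weight memory only (real usage adds the KV cache and activation overhead):

```python
# Approximate weight memory for a 9B-parameter model at different precisions.
params = 9e9
for dtype, bytes_per_param in {"bf16": 2, "int8": 1, "int4": 0.5}.items():
    print(f"{dtype}: ~{params * bytes_per_param / 1e9:.0f} GB")
# bf16: ~18 GB, int8: ~9 GB, int4: ~5 GB
```
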

### Citation

```none
@article{silma_01_2024,
    title={Silma},
    url={https://www.silma.ai},
    publisher={Silma},
    author={Silma Team},
    year={2024}
}
```

## Usage and Limitations

These models have certain limitations that users should be aware of.

### Intended Usage

Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development.

* Content Creation and Communication
  * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
  * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
  * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports.
* Research and Education
  * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field.
  * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
  * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.

### Limitations

* Training Data
  * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
  * The scope of the training dataset determines the subject areas the model can handle effectively.
* Context and Task Complexity
  * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
  * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
* Language Ambiguity and Nuance
  * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language.
* Factual Accuracy
  * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
* Common Sense
  * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

### Ethical Considerations and Risks

The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

* Bias and Fairness
  * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material.
* Misinformation and Misuse
  * LLMs can be misused to generate text that is false, misleading, or harmful.
  * Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
* Transparency and Accountability
  * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
  * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

* Perpetuation of biases: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged.
* Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
* Privacy violations: Models were trained on data filtered to remove PII (Personally Identifiable Information). Developers are encouraged to comply with privacy regulations using privacy-preserving techniques.
silma-9b-instruct-v1.0.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5750ec31039a87a3c8f5013b79dfd6f67362d6a1f164df3a7f441e460911fd8
size 5443142592
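
The file above is a Git LFS pointer to the Q4_0 GGUF quantization (about 5.4 GB). A minimal sketch of running it locally with `llama-cpp-python`; the context size and GPU-offload settings are assumptions to adjust for your hardware, not values published by the model authors:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Assumes the GGUF file from this repo has been downloaded to the working directory.
llm = Llama(
    model_path="silma-9b-instruct-v1.0.Q4_0.gguf",
    n_ctx=4096,       # context window; lower it to reduce memory use
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "من أنت؟"}],  # "Who are you?"
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```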