ssmits committed on
Commit 7a4233f
1 Parent(s): 2c4a5c0

Update README.md

Files changed (1)
  1. README.md +33 -29
README.md CHANGED
@@ -1,55 +1,59 @@
  ---
  tags:
- - merge
  - mergekit
  - lazymergekit
  - tiiuae/falcon-11B
- base_model:
- - tiiuae/falcon-11B
- - tiiuae/falcon-11B
  ---

- # Falcon2-5.5B-multilingual

- Falcon2-5.5B-multilingual is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
- * [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)
  * [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)

- ## 🧩 Configuration

  ```yaml
  slices:
  - sources:
    - model: tiiuae/falcon-11B
-     layer_range: [0, 24]
  - sources:
    - model: tiiuae/falcon-11B
-     layer_range: [55, 59]
  merge_method: passthrough
  dtype: bfloat16
  ```

- ## 💻 Usage

- ```python
- !pip install -qU transformers accelerate

- from transformers import AutoTokenizer
- import transformers
- import torch

- model = "ssmits/Falcon2-5.5B-multilingual"
- messages = [{"role": "user", "content": "What is a large language model?"}]

- tokenizer = AutoTokenizer.from_pretrained(model)
- prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- pipeline = transformers.pipeline(
-     "text-generation",
-     model=model,
-     torch_dtype=torch.float16,
-     device_map="auto",
- )

- outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
- print(outputs[0]["generated_text"])
- ```
 
  ---
+ base_model:
+ - tiiuae/falcon-11B
+ library_name: transformers
  tags:
  - mergekit
+ - merge
  - lazymergekit
  - tiiuae/falcon-11B
+ license: apache-2.0
+ language:
+ - cs
  ---
+ # sliced

+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

+ ## Merge Details
+ ### Merge Method
+
+ This model was pruned using the passthrough merge method.
+
+ ### Models Merged
+
+ The following models were included in the merge:
  * [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)

+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
  ```yaml
  slices:
  - sources:
    - model: tiiuae/falcon-11B
+     layer_range: [0, 25]
  - sources:
    - model: tiiuae/falcon-11B
+     layer_range: [56, 59]
  merge_method: passthrough
  dtype: bfloat16
  ```
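As a quick sanity check on the slicing above, here is a minimal sketch (not part of the original card) that parses the configuration with PyYAML and counts the retained layers, assuming mergekit's end-exclusive `layer_range` convention, i.e. `[0, 25]` keeps layers 0-24 and `[56, 59]` keeps layers 56-58:

```python
# Minimal sketch: count the layers this passthrough config retains.
# Assumes mergekit's end-exclusive layer_range convention ([start, end)).
import yaml

CONFIG = """
slices:
- sources:
  - model: tiiuae/falcon-11B
    layer_range: [0, 25]
- sources:
  - model: tiiuae/falcon-11B
    layer_range: [56, 59]
merge_method: passthrough
dtype: bfloat16
"""

config = yaml.safe_load(CONFIG)
kept = sum(
    end - start
    for slc in config["slices"]
    for start, end in (src["layer_range"] for src in slc["sources"])
)
print(f"Layers kept: {kept}")  # 25 + 3 = 28
```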

+ [PruneMe](https://github.com/arcee-ai/PruneMe) was used to investigate layer similarity on the wikimedia/wikipedia subsets of 10 languages, with 2,000 samples per language. The layer ranges to prune were chosen from the average of the per-language analyses, to maintain performance while reducing model size. A sketch of this kind of layer-similarity analysis appears below the plot.

+ ![Layer Similarity Plot](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/A-5Wz8FyUKa0mvqbWdDeu.png)
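For readers who want to reproduce this kind of analysis, below is a minimal sketch of the underlying idea, not PruneMe's actual implementation: compare the hidden states entering and leaving each candidate block of consecutive layers; blocks whose input and output are nearly identical are the safest to prune. The sample text and block width here are illustrative.

```python
# Illustrative layer-similarity sketch (not the PruneMe implementation):
# high cosine similarity between the hidden states entering and leaving a
# block of layers suggests the block contributes little and can be pruned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One sample for brevity; PruneMe averaged over 2,000 wikipedia samples per language.
text = "Large language models are trained on web-scale corpora."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states

block = 4  # illustrative width of the candidate block to prune
for start in range(len(hidden) - block):
    h_in = hidden[start].flatten().float()
    h_out = hidden[start + block].flatten().float()
    sim = torch.nn.functional.cosine_similarity(h_in, h_out, dim=0).item()
    print(f"layers [{start}, {start + block}): cosine similarity {sim:.4f}")
```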
 
+ ## Direct Use
+ Research on large language models; use as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbots).
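For reference, here is a minimal text-generation sketch along the lines of the usage example this commit removes; the repo id below is taken from that earlier example and should be replaced with this model's actual Hugging Face id.

```python
# Minimal usage sketch, adapted from the usage example removed in this commit.
import torch
import transformers

model_id = "ssmits/Falcon2-5.5B-multilingual"  # placeholder: use this model's actual repo id

pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # the merge was produced in bfloat16
    device_map="auto",
)

outputs = pipe(
    "What is a large language model?",
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```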
 
+ ## Out-of-Scope Use
+ Production use without adequate assessment of risks and mitigations; any use case that may be considered irresponsible or harmful.

+ ## Bias, Risks, and Limitations
+ Falcon2-5.5B was trained mostly on English, but also on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

+ ## Recommendations
+ We recommend that users of Falcon2-5.5B consider finetuning it for their specific set of tasks, and that guardrails and appropriate precautions be taken for any production use.