ssmits/Falcon2-5.5B-Swedish

@@ -1,55 +1,54 @@
 ---
-tags:
-- merge
-- mergekit
-- lazymergekit
-- tiiuae/falcon-11B
 base_model:
 - tiiuae/falcon-11B
-- tiiuae/falcon-11B
 ---
-# tiiuae/falcon-11B
-tiiuae/falcon-11B is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
-* [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)
 * [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)
-## 🧩 Configuration
 ```yaml
 slices:
   - sources:
       - model: tiiuae/falcon-11B
-        layer_range: [0, 25]
   - sources:
       - model: tiiuae/falcon-11B
-        layer_range: [56, 59]
 merge_method: passthrough
 dtype: bfloat16
 ```
-## 💻 Usage
-```python
-!pip install -qU transformers accelerate
-from transformers import AutoTokenizer
-import transformers
-import torch
-model = "ssmits/tiiuae/falcon-11B"
-messages = [{"role": "user", "content": "What is a large language model?"}]
-tokenizer = AutoTokenizer.from_pretrained(model)
-prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    torch_dtype=torch.float16,
-    device_map="auto",
-)
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-```

 ---
 base_model:
 - tiiuae/falcon-11B
+library_name: transformers
+tags:
+- mergekit
+- merge
 ---
+# sliced
+This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+## Merge Details
+### Merge Method
+This model was merged using the passthrough merge method.
+### Models Merged
+The following models were included in the merge:
 * [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)
+### Configuration
+The following YAML configuration was used to produce this model:
 ```yaml
 slices:
   - sources:
       - model: tiiuae/falcon-11B
+        layer_range: [0, 26]
   - sources:
       - model: tiiuae/falcon-11B
+        layer_range: [57, 59]
 merge_method: passthrough
 dtype: bfloat16
 ```
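
As a rough check on the configuration above, the number of retained decoder layers can be counted directly from the two slices. This is a minimal sketch that assumes `layer_range` is half-open (`[start, end)`, like Python's `range`), which matches how mergekit configurations are usually written but is not stated in this card; the variable names are illustrative only.

```python
# Slices copied from the YAML configuration above.
slices = [(0, 26), (57, 59)]

# Assumption: each layer_range is half-open, i.e. [start, end), like range().
kept_layers = [layer for start, end in slices for layer in range(start, end)]

print(f"kept {len(kept_layers)} layers")
print(f"first kept: {kept_layers[0]}, last kept: {kept_layers[-1]}")
```

If the ranges were instead read as inclusive, each slice would contribute one more layer, so the count is worth confirming against `num_hidden_layers` in the merged model's `config.json`.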
+[PruneMe](https://github.com/arcee-ai/PruneMe) was used to analyze layer similarity on 2000 samples from the Swedish (sv) subset of wikimedia/wikipedia. The layer ranges to prune were chosen from this analysis, with the aim of preserving performance while reducing model size.
+![Layer Similarity Plot](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/aS5jGo6KLv6BsmW_aO4PB.png)
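
The idea behind a layer-similarity analysis can be sketched as follows: compare the hidden states entering a block of `n` consecutive layers with the hidden states leaving it, and treat the block that changes them least as the best pruning candidate. The code below is an illustrative sketch, not PruneMe's implementation. The angular-distance metric, the function names, and the synthetic hidden states are all assumptions made for the example.

```python
import numpy as np

def angular_distance(a, b):
    """Mean angular distance, scaled to [0, 1], between matching rows of a and b."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    )
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi))

def most_similar_block(hidden_states, n):
    """Find the block of n layers whose input and output differ least.

    hidden_states has shape (num_layers + 1, num_samples, hidden_dim):
    index l holds the input to layer l, index l + n the output of the block.
    """
    distances = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(len(hidden_states) - n)
    ]
    start = int(np.argmin(distances))
    return start, distances[start]

# Synthetic stand-in for real activations: 8 "layers", where layers 3..5
# barely change their input and are therefore the redundant ones.
rng = np.random.default_rng(0)
steps = [1.0, 1.0, 1.0, 0.01, 0.01, 0.01, 1.0, 1.0]
states = [rng.normal(size=(16, 64))]
for step in steps:
    states.append(states[-1] + step * rng.normal(size=(16, 64)))
hidden_states = np.stack(states)

start, distance = most_similar_block(hidden_states, n=3)
print(f"most redundant block starts at layer {start}")
```

With real models, the hidden states would come from running the evaluation samples through the network and recording each layer's activations; the synthetic random walk here only stands in for that data.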
+## Direct Use
+Research on large language models; as a foundation for further specialization and fine-tuning for specific use cases (e.g., summarization, text generation, chatbots).
+## Out-of-Scope Use
+Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
+## Bias, Risks, and Limitations
+Falcon2-5.5B is trained mostly on English, but also on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
+## Recommendations
+We recommend that users of Falcon2-5.5B consider fine-tuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.