AlekseiPravdin
committed on
Upload folder using huggingface_hub
- README.md +57 -0
- ties_config.yaml +16 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
---

# Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-ties-merge

Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-ties-merge is a merge of two models: [mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib](https://huggingface.co/mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib) and [VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct). The merge was made with [mergekit](https://github.com/cg123/mergekit), a toolkit for combining the weights of pretrained language models, using the TIES method.
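
For readers unfamiliar with the method named in the model title, the following is a minimal sketch of the three TIES steps (trim small deltas, elect a sign per parameter, average the agreeing deltas). It is an illustration only, not mergekit's implementation: the function names `trim` and `ties_merge`, the `density` default, the use of flat Python lists instead of tensors, and the fixed scaling factor of 1 are all assumptions made for this example.

```python
def trim(delta, density):
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    keep = max(1, round(len(delta) * density))
    threshold = sorted((abs(x) for x in delta), reverse=True)[keep - 1]
    return [x if abs(x) >= threshold else 0.0 for x in delta]


def ties_merge(base, finetuned_models, density=0.5):
    """Merge several fine-tuned weight vectors into `base` with a TIES-style procedure."""
    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned_models]
    trimmed = [trim(delta, density) for delta in deltas]

    merged = []
    for i, base_weight in enumerate(base):
        column = [delta[i] for delta in trimmed]
        # Elect the sign with the larger total magnitude across models.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Average only the deltas that agree with the elected sign.
        agreeing = [x for x in column if x * sign > 0]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base_weight + update)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, -2.0, 0.1, 3.0]
    model_b = [2.0, 1.0, -0.2, -0.1]
    print(ties_merge(base, [model_a, model_b], density=0.5))
```

In this sketch, a model that is identical to the base contributes an all-zero task vector, so it adds nothing beyond the base weights themselves.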

## 🧩 Merge Configuration

```yaml
slices:
  - sources:
      - model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
        layer_range: [0, 31]
      - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
        layer_range: [0, 31]
merge_method: ties
base_model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```
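
The list-valued entries under `parameters` are what mergekit calls gradients: a short list of anchor values that is interpolated across the layers in the slice. The sketch below shows one plausible reading of that, assuming evenly spaced anchors and linear interpolation; `gradient_value` is a hypothetical helper written for this example, and mergekit's exact interpolation may differ. Note also that mergekit documents `t` as the interpolation factor for `slerp`, while `ties` is documented with `density` and `weight`, so whether `t` has any effect under `merge_method: ties` is unclear.

```python
def gradient_value(anchors, layer_index, num_layers):
    """Linearly interpolate a list of anchor values at a given layer position."""
    if num_layers == 1 or len(anchors) == 1:
        return float(anchors[0])
    # Map the layer index onto the anchor axis, from 0 to len(anchors) - 1.
    position = layer_index / (num_layers - 1) * (len(anchors) - 1)
    lower = int(position)
    upper = min(lower + 1, len(anchors) - 1)
    fraction = position - lower
    return anchors[lower] * (1 - fraction) + anchors[upper] * fraction


if __name__ == "__main__":
    self_attn = [0, 0.5, 0.3, 0.7, 1]
    mlp = [1, 0.5, 0.7, 0.3, 0]
    num_layers = 31  # layer_range [0, 31] in the config covers 31 layers
    for layer in (0, 8, 15, 23, 30):
        attn_value = gradient_value(self_attn, layer, num_layers)
        mlp_value = gradient_value(mlp, layer, num_layers)
        print(f"layer {layer:2d}: self_attn={attn_value:.3f} mlp={mlp_value:.3f}")
```

Under this reading, the `self_attn` value rises from 0 at the first layer to 1 at the last, the `mlp` value runs the other way, and all remaining tensors use the scalar 0.5.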

## Model Features

This merge combines the HQQ-quantized version of Llama-3.1-8B-Instruct with the SauerkrautLM fine-tune. It is intended for text generation and for understanding multilingual input, particularly German and English. SauerkrautLM was fine-tuned with Spectrum Fine-Tuning, an approach aimed at efficient training that preserves the base model's existing knowledge.

## Evaluation Results

No evaluation results are reported for the merged model itself. The figures below come from the parent models:

### mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
- **ARC (25-shot)**: 60.49
- **HellaSwag (10-shot)**: 80.16
- **MMLU (5-shot)**: 68.98
- **Average Performance**: 69.51

### VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
- Fine-tuned on German-English data, with reported improvements in multilingual capabilities.

These results suggest that the merged model may inherit the strengths of both parents in text generation and comprehension, but this has not been measured here.

## Limitations

The merged model may carry over limitations from both parent models. In particular, it may produce uncensored content, as noted in the SauerkrautLM documentation. Performance may also vary by task and language, especially for less common languages or dialects. Users should take these factors into account before deploying the model in real-world applications.
ties_config.yaml
ADDED
@@ -0,0 +1,16 @@
slices:
  - sources:
      - model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
        layer_range: [0, 31]
      - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
        layer_range: [0, 31]
merge_method: ties
base_model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16