---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---

# Qwen1.5-120B-Chat-Merge

**--This is a 120B frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving layers of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) with itself using [mergekit](https://github.com/arcee-ai/mergekit).--**

*Inspired by other frankenmerge models like [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b)*

I have adopted a new recipe for merging this 120B model (I tried to extend the recipe to 124B, but saw a performance decline). Compared to the original 124B version, it has 4B fewer parameters but seems to perform better (at least that is my subjective impression). It exhibits fewer hallucinations, better comprehension, and clearer logic than the old 124B model (although I am not sure by how much, since my judgement is based on limited, subjective use). Most of the time it still cannot solve some of the high-difficulty reasoning questions I use for testing, but it seems less likely to get confused and makes only slight mistakes on the same questions.
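
**-Usage**

This merge should behave like any other Qwen1.5 chat model in Hugging Face Transformers. The snippet below is a minimal sketch: `model_id` is a placeholder (point it at wherever the merged weights actually live), and sharding a ~120B float16 model needs on the order of 240 GB of GPU memory in total.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id, not an official path: replace with the actual location of this merge.
model_id = "path/to/Qwen1.5-120B-Chat-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the float16 weights produced by the merge
    device_map="auto",    # shard the ~120B parameters across available GPUs
)

# Qwen1.5-Chat uses a ChatML-style prompt; the tokenizer's chat template applies it.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

`device_map="auto"` relies on the accelerate package to spread the layers across whatever GPUs (and, if necessary, CPU memory) are available.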

**-Quantize**

Coming soon...
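
Until quantized files are released, one stopgap is on-the-fly 4-bit loading through bitsandbytes. This is only a sketch, reusing the placeholder repo id from the usage example above; quality and speed will differ from dedicated quantized releases.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/Qwen1.5-120B-Chat-Merge"  # placeholder, not an official repo id

# NF4 4-bit quantization: weights are stored in 4 bits and dequantized to float16
# for compute, cutting memory use to roughly a quarter of a float16 load.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```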

**-Merge Configuration**

The merge was produced with the following mergekit YAML:

```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [5, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 35]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [55, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
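
For intuition about the size, the slices above stack 135 decoder layers drawn (with overlap) from the 80-layer base model. The quick sketch below just sums the layer ranges; the parameter figure is a back-of-the-envelope estimate, not a measured count.

```python
# Layer ranges copied from the mergekit YAML above (end index is exclusive).
slices = [(0, 20), (5, 30), (10, 35), (30, 50), (40, 60), (55, 80)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 135 layers, vs. 80 layers in Qwen1.5-72B-Chat

# Scaling the ~72B base by the layer ratio gives an upper bound of roughly
# 72 * 135 / 80 ≈ 122B; embeddings and the LM head are not duplicated,
# so the actual merge lands a bit below that, in the ~120B class.
print(f"~{72 * total_layers / 80:.0f}B parameters")
```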

**-Performance**

* Tips: I don't have the capability to run benchmark tests, nor can I use the model extensively, so my impressions might not be accurate. I cannot promise that its performance will be good or bad.

Subjectively, I feel its understanding and logical reasoning are better than the 124B version's, but I'm not sure about other aspects of its performance (for example, writing ability: most well-behaved 120B+ models write decently, which makes it hard to tell them apart). If you are interested in this model's performance, feel free to test it or offer your own evaluations. Everyone's tests and evaluations are welcome.