wenhuach committed
Commit b92fb61 · 1 Parent(s): 51bbb30

add config file

Signed-off-by: wenhuach <[email protected]>

Files changed (2)
  1. .gitattributes +1 -0
  2. config.json +3064 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
config.json ADDED
@@ -0,0 +1,3064 @@
+ {
+   "_name_or_path": "/data1/DeepSeek-R1-bf16",
+   "architectures": [
+     "DeepseekV3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "configuration_deepseek.DeepseekV3Config",
+     "AutoModel": "modeling_deepseek.DeepseekV3Model",
+     "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
+   },
+   "aux_loss_alpha": 0.001,
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "ep_size": 1,
+   "first_k_dense_replace": 3,
+   "hidden_act": "silu",
+   "hidden_size": 7168,
+   "initializer_range": 0.02,
+   "intermediate_size": 18432,
+   "kv_lora_rank": 512,
+   "max_position_embeddings": 163840,
+   "model_type": "deepseek_v3",
+   "moe_intermediate_size": 2048,
+   "moe_layer_freq": 1,
+   "n_group": 8,
+   "n_routed_experts": 256,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 128,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 61,
+   "num_key_value_heads": 128,
+   "num_nextn_predict_layers": 1,
+   "pretraining_tp": 1,
+   "q_lora_rank": 1536,
+   "qk_nope_head_dim": 128,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "amp": true,
+     "autoround_version": "0.4.5",
+     "backend": "auto_round:gptq:exllamav2",
+     "batch_size": 4,
+     "bits": 2,
+     "data_type": "int",
+     "dataset": "NeelNanda/pile-10k",
+     "enable_minmax_tuning": true,
+     "enable_norm_bias_tuning": false,
+     "enable_quanted_input": true,
+     "extra_config": {
+       "model.layers.0.mlp.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.mlp.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.mlp.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.0.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.mlp.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.mlp.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.mlp.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.1.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.10.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.11.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.12.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.13.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.14.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.15.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.16.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.17.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.18.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.19.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.mlp.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.mlp.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.mlp.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.2.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.20.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.21.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.22.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.23.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.24.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.25.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.26.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.27.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.28.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.29.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.3.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.30.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.31.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.32.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.mlp.shared_experts.down_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.mlp.shared_experts.gate_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.mlp.shared_experts.up_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.self_attn.kv_a_proj_with_mqa": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.self_attn.kv_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.self_attn.o_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.self_attn.q_a_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
+       "model.layers.33.self_attn.q_b_proj": {
+         "bits": 4,
+         "group_size": 128
+       },
948
+ "model.layers.34.mlp.shared_experts.down_proj": {
949
+ "bits": 4,
950
+ "group_size": 128
951
+ },
952
+ "model.layers.34.mlp.shared_experts.gate_proj": {
953
+ "bits": 4,
954
+ "group_size": 128
955
+ },
956
+ "model.layers.34.mlp.shared_experts.up_proj": {
957
+ "bits": 4,
958
+ "group_size": 128
959
+ },
960
+ "model.layers.34.self_attn.kv_a_proj_with_mqa": {
961
+ "bits": 4,
962
+ "group_size": 128
963
+ },
964
+ "model.layers.34.self_attn.kv_b_proj": {
965
+ "bits": 4,
966
+ "group_size": 128
967
+ },
968
+ "model.layers.34.self_attn.o_proj": {
969
+ "bits": 4,
970
+ "group_size": 128
971
+ },
972
+ "model.layers.34.self_attn.q_a_proj": {
973
+ "bits": 4,
974
+ "group_size": 128
975
+ },
976
+ "model.layers.34.self_attn.q_b_proj": {
977
+ "bits": 4,
978
+ "group_size": 128
979
+ },
+ "model.layers.35.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.35.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.36.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.37.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.38.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.39.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.4.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.40.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.41.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.42.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.43.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.44.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.45.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.46.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.47.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.48.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.49.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.5.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.50.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.51.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.52.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.53.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.54.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.55.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.56.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.57.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.58.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.59.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.6.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.mlp.experts.0.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.1.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.10.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.100.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.101.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.102.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.103.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.104.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.105.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.106.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.107.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.108.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.109.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.11.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.110.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.111.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.112.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.113.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.114.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.115.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.116.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.117.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.118.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.119.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.12.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.120.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.121.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.122.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.123.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.124.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.125.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.126.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.127.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.128.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.129.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.13.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.130.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.131.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.132.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.133.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.134.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.135.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.136.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.137.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.138.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.139.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.14.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.140.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.141.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.142.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.143.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.144.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.145.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.146.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.147.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.148.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.149.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.15.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.150.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.151.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.152.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.153.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.154.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.155.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.156.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.157.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.158.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.159.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.16.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.160.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.161.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.162.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.163.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.164.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.165.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.166.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.167.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.168.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.169.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.17.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.170.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.171.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.172.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.173.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.174.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.175.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.176.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.177.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.178.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.179.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
2236
+ "model.layers.60.mlp.experts.18.down_proj": {
2237
+ "bits": 16,
2238
+ "data_type": "bfloat"
2239
+ },
2240
+ "model.layers.60.mlp.experts.180.down_proj": {
2241
+ "bits": 16,
2242
+ "data_type": "bfloat"
2243
+ },
2244
+ "model.layers.60.mlp.experts.181.down_proj": {
2245
+ "bits": 16,
2246
+ "data_type": "bfloat"
2247
+ },
2248
+ "model.layers.60.mlp.experts.182.down_proj": {
2249
+ "bits": 16,
2250
+ "data_type": "bfloat"
2251
+ },
2252
+ "model.layers.60.mlp.experts.183.down_proj": {
2253
+ "bits": 16,
2254
+ "data_type": "bfloat"
2255
+ },
2256
+ "model.layers.60.mlp.experts.184.down_proj": {
2257
+ "bits": 16,
2258
+ "data_type": "bfloat"
2259
+ },
2260
+ "model.layers.60.mlp.experts.185.down_proj": {
2261
+ "bits": 16,
2262
+ "data_type": "bfloat"
2263
+ },
2264
+ "model.layers.60.mlp.experts.186.down_proj": {
2265
+ "bits": 16,
2266
+ "data_type": "bfloat"
2267
+ },
2268
+ "model.layers.60.mlp.experts.187.down_proj": {
2269
+ "bits": 16,
2270
+ "data_type": "bfloat"
2271
+ },
2272
+ "model.layers.60.mlp.experts.188.down_proj": {
2273
+ "bits": 16,
2274
+ "data_type": "bfloat"
2275
+ },
2276
+ "model.layers.60.mlp.experts.189.down_proj": {
2277
+ "bits": 16,
2278
+ "data_type": "bfloat"
2279
+ },
+ "model.layers.60.mlp.experts.19.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.190.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.191.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.192.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.193.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.194.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.195.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.196.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.197.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.198.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.199.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.2.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.20.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.200.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.201.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.202.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.203.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.204.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.205.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.206.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.207.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.208.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.209.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.21.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.210.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.211.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.212.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.213.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.214.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.215.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.216.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.217.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.218.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.219.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.22.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.220.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.221.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.222.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.223.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.224.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.225.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.226.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.227.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.228.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.229.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.23.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.230.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.231.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.232.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.233.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.234.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.235.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.236.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.237.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.238.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.239.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.24.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.240.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.241.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.242.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.243.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.244.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.245.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.246.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.247.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.248.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.249.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.25.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.250.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.251.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.252.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.253.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.254.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.255.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.26.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.27.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.28.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.29.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.3.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.30.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.31.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.32.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.33.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.34.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.35.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.36.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.37.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.38.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.39.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.4.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.40.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.41.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.42.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.43.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.44.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.45.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.46.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.47.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.48.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.49.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.5.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.50.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.51.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.52.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.53.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.54.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.55.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.56.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.57.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.58.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.59.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.6.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.60.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.61.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.62.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.63.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.64.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.65.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.66.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.67.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.68.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.69.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.7.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.70.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.71.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.72.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.73.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.74.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.75.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.76.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.77.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.78.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.79.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.8.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.80.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.81.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.82.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.83.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.84.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.85.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.86.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.87.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.88.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.89.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.9.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.90.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.91.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.92.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.93.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.94.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.95.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.96.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.97.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.98.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.experts.99.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.shared_experts.down_proj": {
+ "bits": 16,
+ "data_type": "bfloat"
+ },
+ "model.layers.60.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.60.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.7.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.8.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.mlp.shared_experts.down_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.mlp.shared_experts.gate_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.mlp.shared_experts.up_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.self_attn.kv_a_proj_with_mqa": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.self_attn.kv_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.self_attn.q_a_proj": {
+ "bits": 4,
+ "group_size": 128
+ },
+ "model.layers.9.self_attn.q_b_proj": {
+ "bits": 4,
+ "group_size": 128
+ }
+ },
+ "gradient_accumulate_steps": 1,
+ "group_size": 64,
+ "iters": 400,
+ "low_gpu_mem_usage": false,
+ "lr": 0.0025,
+ "minmax_lr": 0.0025,
+ "nsamples": 512,
+ "quant_method": "intel/auto-round",
+ "scale_dtype": "torch.float16",
+ "seqlen": 512,
+ "sym": true,
+ "to_quant_block_names": null
+ },
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "beta_fast": 32,
+ "beta_slow": 1,
+ "factor": 40,
+ "mscale": 1.0,
+ "mscale_all_dim": 1.0,
+ "original_max_position_embeddings": 4096,
+ "type": "yarn"
+ },
+ "rope_theta": 10000,
+ "routed_scaling_factor": 2.5,
+ "scoring_func": "sigmoid",
+ "seq_aux": true,
+ "tie_word_embeddings": false,
+ "topk_group": 4,
+ "topk_method": "noaux_tc",
+ "torch_dtype": "float16",
+ "transformers_version": "4.47.0",
+ "use_cache": true,
+ "v_head_dim": 128,
+ "vocab_size": 129280
+ }
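
The per-layer overrides in `extra_config` can be sanity-checked by tallying them by bit width. The sketch below is not part of this commit: it runs on a three-entry excerpt copied from the diff, `summarize_overrides` is a hypothetical helper name, and the `group_size` comparison only reports entries that differ from the top-level value (64 here versus 128 in the overrides) without claiming how auto-round resolves the difference.

```python
import json
from collections import Counter

# Three-entry excerpt of the quantization block, copied from the diff.
# This is an illustration, not the full 3064-line config.
SAMPLE = json.loads("""
{
  "extra_config": {
    "model.layers.60.mlp.experts.156.down_proj": {"bits": 16, "data_type": "bfloat"},
    "model.layers.60.mlp.shared_experts.gate_proj": {"bits": 4, "group_size": 128},
    "model.layers.60.self_attn.q_b_proj": {"bits": 4, "group_size": 128}
  },
  "group_size": 64,
  "quant_method": "intel/auto-round",
  "sym": true
}
""")


def summarize_overrides(quant_cfg):
    """Hypothetical helper: count overrides by bit width and list layers
    whose group_size differs from the top-level group_size."""
    by_bits = Counter()
    mismatched = []
    default_group = quant_cfg.get("group_size")
    for name, override in quant_cfg.get("extra_config", {}).items():
        by_bits[override.get("bits")] += 1
        group = override.get("group_size")
        if group is not None and group != default_group:
            mismatched.append(name)
    return dict(by_bits), sorted(mismatched)


if __name__ == "__main__":
    counts, mismatched = summarize_overrides(SAMPLE)
    print(counts)
    print(mismatched)
```

To run this against the real file, load `config.json` with `json.load` and pass in the quantization block; the name of its enclosing key is not visible in this part of the diff, so look it up in the file rather than assuming it.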