---
base_model:
- AlSamCur123/Mistral-Small3-24B-InstructContinuedFine
- trashpanda-org/MS-24B-Instruct-Mullein-v0
- huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated
library_name: transformers
tags:
- mergekit
- merge
---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged with the [DELLA](https://arxiv.org/abs/2406.11617) merge method, using [huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated](https://huggingface.co/huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated) as the base model.

### Models Merged

The following models were included in the merge:
* [AlSamCur123/Mistral-Small3-24B-InstructContinuedFine](https://huggingface.co/AlSamCur123/Mistral-Small3-24B-InstructContinuedFine)
* [trashpanda-org/MS-24B-Instruct-Mullein-v0](https://huggingface.co/trashpanda-org/MS-24B-Instruct-Mullein-v0)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
# Filename: pcb_della_merge.yaml
merge_method: della  # DELLA-based adaptive pruning
base_model: huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated

models:
  - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
    parameters:
      weight: 1.0  # Add default weight
      # PCB strategy: restrict the layer range of influence + dynamic competition balancing
      layers:
        - layers: "8-16"
          parameter_name: density
          value: 0.4
        - layers: "8-16"
          parameter_name: epsilon
          value: 0.15
        - layers: "8-16"
          parameter_name: lambda
          value: 1.5
        - layers: "17-24"
          parameter_name: density
          value: 0.2
      variance_threshold: 0.3

  - model: AlSamCur123/Mistral-Small3-24B-InstructContinuedFine
    parameters:
      weight: 1.0  # Add default weight
      # Strengthen the instruction-understanding layers
      layers:
        - layers: "0-12"
          parameter_name: density
          value: 0.7
        - layers: "0-12"
          parameter_name: epsilon
          value: 0.05
        - layers: "0-12"
          parameter_name: lambda
          value: 2.0
      variance_threshold: 0.25

  - model: huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated
    parameters:
      weight: 1.0  # Add default weight
      # Base-model parameter protection strategy
      density: 0.9  # Global density (may need separate handling)
      layers:
        - layers: "12-24"
          parameter_name: density
          value: 1.0

parameters:
  global_density: 0.55  # Global pruning density (PCB balance point)
  intra_balance: true
  variance_threshold: 0.2
  epsilon_range: [0.1, 0.2]

tokenizer:
  source: base

# Parameter compression settings (targeting 12-13B)
architecture:
  hidden_size: 4096
  intermediate_size: 11008
  num_attention_heads: 32
  num_hidden_layers: 30
```
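To illustrate what the `density`, `epsilon`, `lambda`, and `weight` values in the configuration control, here is a minimal NumPy sketch of DELLA's per-tensor procedure as described in the paper: drop (magnitude-rank-based stochastic pruning with rescaling), elect (per-parameter sign consensus), fuse (average of the agreeing deltas), then scale by `lambda`. It is a simplified illustration, not mergekit's implementation. `magprune` and `della_merge` are hypothetical helper names. Several things are assumptions: keep probabilities are spread linearly by magnitude rank over `density ± epsilon` (the paper and mergekit may differ in detail), `lambda` is treated as a single scalar, and each model gets one `density`/`epsilon` pair instead of the per-layer-range values above.

```python
import numpy as np


def magprune(delta, density, epsilon, rng):
    """Hypothetical DELLA drop step: rank-based stochastic pruning of one delta.

    Entries with larger |delta| get a higher keep probability, spread linearly
    over [density - epsilon, density + epsilon]. Survivors are divided by their
    keep probability so each entry's expected value is unchanged.
    """
    delta = np.asarray(delta, dtype=float)
    if density >= 1.0:
        return delta.copy()  # nothing to drop
    if not (density - epsilon > 0.0 and density + epsilon <= 1.0):
        raise ValueError("density +/- epsilon must stay inside (0, 1]")
    flat = delta.ravel()
    # rank 0 = smallest magnitude, rank n-1 = largest magnitude
    ranks = np.empty(flat.size)
    ranks[np.argsort(np.abs(flat), kind="stable")] = np.arange(flat.size)
    norm = ranks / max(flat.size - 1, 1)  # ranks mapped onto [0, 1]
    keep_prob = (density - epsilon) + norm * 2.0 * epsilon
    mask = rng.random(flat.size) < keep_prob
    return np.where(mask, flat / keep_prob, 0.0).reshape(delta.shape)


def della_merge(base, finetuned, weights, densities, epsilons, lam=1.0, seed=0):
    """Hypothetical single-tensor merge: drop, elect signs, fuse, scale by lambda."""
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    # 1. drop: task vectors relative to the base, pruned and rescaled per model
    pruned = np.stack([
        magprune(np.asarray(ft, dtype=float) - base, d, e, rng)
        for ft, d, e in zip(finetuned, densities, epsilons)
    ])
    w = np.asarray(weights, dtype=float).reshape((-1,) + (1,) * base.ndim)
    weighted = pruned * w
    # 2. elect: per-parameter sign of the summed weighted deltas
    elected = np.sign(weighted.sum(axis=0))
    # 3. fuse: weighted average over the deltas that agree with the elected sign
    agree = (np.sign(weighted) == elected) & (weighted != 0)
    denom = (w * agree).sum(axis=0)
    fused = (weighted * agree).sum(axis=0) / np.where(denom > 0, denom, 1.0)
    # 4. scale the fused delta by lambda and add it back to the base
    return base + lam * fused


if __name__ == "__main__":
    base = np.zeros(4)
    ft_a = [1.0, -2.0, 3.0, 0.0]
    ft_b = [1.0, 2.0, -1.0, 0.0]
    # density 1.0 disables dropping, so only sign election and fusion act here
    merged = della_merge(base, [ft_a, ft_b], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
    print(merged.tolist())  # -> [1.0, 0.0, 3.0, 0.0]
```

In the second position the two deltas cancel, so no sign is elected and the base value is kept; in the third, only the positive delta agrees with the elected sign and is kept at full size. With `density` below 1.0 the result is random, and `seed` makes it reproducible. For example, `density: 0.4` with `epsilon: 0.15` spreads keep probabilities from 0.25 (smallest-magnitude entry) to 0.55 (largest).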