# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details

### Merge Method
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [CultriX/SeQwence-14Bv1](https://huggingface.co/CultriX/SeQwence-14Bv1) as the base model. DARE randomly prunes each fine-tuned model's task vector (its delta from the base) and rescales the surviving entries; TIES then resolves sign conflicts between models before the deltas are summed, as sketched below.
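A minimal single-tensor sketch of the idea, assuming same-shape tensors; this is not mergekit's implementation, and the helper `dare_ties_merge` is a made-up name for illustration:

```python
# Illustrative DARE-TIES merge on one weight tensor (not mergekit's code).
# DARE: drop a random fraction of each task vector and rescale the survivors.
# TIES: elect a per-parameter sign by weighted majority, keep agreeing deltas.
import torch

def dare_ties_merge(base, finetuned, weights, densities):
    deltas = []
    for ft, w, d in zip(finetuned, weights, densities):
        delta = ft - base                          # task vector
        keep = (torch.rand_like(delta) < d).float()
        delta = delta * keep / d                   # DARE: drop and rescale
        deltas.append(w * delta)
    stacked = torch.stack(deltas)
    sign = torch.sign(stacked.sum(dim=0))          # TIES: majority sign election
    agree = (torch.sign(stacked) == sign).float()
    return base + (stacked * agree).sum(dim=0)     # merge only agreeing deltas

# Toy example with random 2x2 "weights":
base = torch.zeros(2, 2)
fts = [torch.randn(2, 2) for _ in range(3)]
merged = dare_ties_merge(base, fts, weights=[0.5, 0.3, 0.2], densities=[0.6, 0.6, 0.5])
```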
### Models Merged

The following models were included in the merge:
- [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO)
- [qingy2019/Qwen2.5-Math-14B-Instruct](https://huggingface.co/qingy2019/Qwen2.5-Math-14B-Instruct)
- [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3)
- [CultriX/Qwen2.5-14B-Emergedv3](https://huggingface.co/CultriX/Qwen2.5-14B-Emergedv3)
- [CultriX/Qwen2.5-14B-Unity](https://huggingface.co/CultriX/Qwen2.5-14B-Unity)
- [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: CultriX/SeQwence-14Bv1
    parameters:
      weight: 0.22    # Boosted slightly to improve general task performance
      density: 0.62   # Prioritize generalist adaptability
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.18
      density: 0.59   # Slight increase to enhance contextual reasoning (tinyHellaswag)
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.16
      density: 0.56   # Minor increase to stabilize GPQA and MUSR performance
  - model: CultriX/Qwen2.5-14B-Emergedv3
    parameters:
      weight: 0.15    # Increased weight for domain-specific expertise
      density: 0.55
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters:
      weight: 0.12
      density: 0.56   # Enhance factual reasoning and IFEval contributions
  - model: CultriX/Qwen2.5-14B-Unity
    parameters:
      weight: 0.10
      density: 0.53
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.10
      density: 0.51   # Retain focus on MATH and advanced reasoning tasks
merge_method: dare_ties
base_model: CultriX/SeQwence-14Bv1
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: Qwen/Qwen2.5-14B-Instruct
adaptive_merge_parameters:
  task_weights:
    IFEval: 1.5     # Strengthened for better instruction-following
    BBH: 1.3
    MATH: 1.6       # Emphasize advanced reasoning and problem-solving
    GPQA: 1.4       # Improve factual recall and logical QA tasks
    MUSR: 1.5       # Strengthened multi-step reasoning capabilities
    MMLU-PRO: 1.3   # Slight boost for domain-specific multitask knowledge
  smoothing_factor: 0.19    # Refined for smoother blending of task strengths
  gradient_clipping: 0.88   # Tightened slightly for precise parameter contribution
```
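Note that the per-model weights sum to 1.03; with `normalize: true`, mergekit rescales them to sum to 1 at merge time. A config like this is consumed by mergekit's `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./merged`). The resulting checkpoint loads like any other Qwen2.5 model; a minimal usage sketch (the prompt is illustrative):

```python
# Minimal usage sketch: load the merged checkpoint and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-FinalMerge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's dtype
    device_map="auto",
)

prompt = "What is the derivative of x**3 + 2*x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```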