merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged with the della_linear merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
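
As a rough illustration of what a della_linear-style merge does to each tensor, the sketch below works on plain Python lists. It is a simplified approximation under stated assumptions, not mergekit's implementation: each model's difference from the base is pruned at random (larger magnitudes are kept more often, with keep probabilities spread over density ± epsilon), survivors are rescaled, the pruned differences are combined linearly by weight, and the result is scaled by lambda and added back to the base. The function names `magprune` and `della_linear_sketch` are hypothetical.

```python
import random


def magprune(delta, density, epsilon, rng):
    """Randomly drop entries of a delta; small magnitudes are dropped more often.

    Assumption: keep probability rises linearly with magnitude rank,
    from density - epsilon (smallest) to density + epsilon (largest).
    """
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # rank 0 = smallest
    pruned = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        keep_p = density - epsilon + 2 * epsilon * frac
        if rng.random() < keep_p:
            # Rescale survivors so the expected value of each entry is unchanged.
            pruned[i] = delta[i] / keep_p
    return pruned


def della_linear_sketch(base, models, weights, densities,
                        epsilon, lam, normalize=True, seed=0):
    """Merge several parameter lists into `base` (toy, list-based version)."""
    rng = random.Random(seed)
    total = sum(weights) if normalize else 1.0
    merged_delta = [0.0] * len(base)
    for params, w, d in zip(models, weights, densities):
        delta = [p - b for p, b in zip(params, base)]  # difference from base
        pruned = magprune(delta, d, epsilon, rng)
        for i, v in enumerate(pruned):
            merged_delta[i] += (w / total) * v          # linear combination
    # lambda scales the merged difference before it is added back to the base.
    return [b + lam * m for b, m in zip(base, merged_delta)]


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [0.5, -0.1, 0.02, 1.0]
    model_b = [0.4, 0.3, -0.01, 0.8]
    print(della_linear_sketch(base, [model_a, model_b],
                              weights=[0.6, 0.4], densities=[0.7, 0.65],
                              epsilon=0.012, lam=1.4))
```

With density 1.0 and epsilon 0 nothing is dropped, and the sketch reduces to a weighted average of the differences scaled by lambda.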

Models Merged

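The following models were included in the merge (as listed under models in the configuration below):

- CultriX/Qwen2.5-14B-Wernickev3 (also the base model)
- CultriX/Qwenfinity-2.5-14B
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwen2.5-14B-Broca
- qingy2019/Qwen2.5-Math-14B-Instruct
- CultriX/SeQwence-14Bv1
- sometimesanotion/Qwen2.5-14B-Vimarckoso
- allknowingroger/QwenSlerp6-14B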

Configuration

The following YAML configuration was used to produce this model:

merge_method: della_linear
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16
parameters:
  epsilon: 0.012        # Further reduced to ensure ultra-fine parameter scaling for precision.
  lambda: 1.4           # Stronger emphasis on significant model contributions.
  normalize: true       # Balances the parameter integration for stability.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.6           # Prioritizes logical reasoning improvements.
    tinyHellaswag: 1.5     # Strengthened contextual understanding and consistency.
    tinyMMLU: 1.65         # Enhanced domain knowledge for multitask benchmarks.
    tinyTruthfulQA: 1.9    # Maximized for accurate factual reasoning and QA.
    tinyTruthfulQA_mc1: 1.7 # Balanced focus for multiple-choice reasoning.
    tinyWinogrande: 1.75   # Advanced reasoning and contextual prediction improvement.
    IFEval: 1.9            # Instruction-following tasks boosted by multitask contributors.
    BBH: 1.7               # Complex reasoning is supported by logical base models.
    MATH: 2.1              # Highest priority, focusing on mathematical excellence.
    GPQA: 1.8              # Boosted graduate-level QA capabilities.
    MUSR: 1.9              # Nuanced multi-step reasoning strengthened further.
    MMLU-PRO: 1.8          # Domain multitask performance maximized.
  smoothing_factor: 0.1    # Precisely tuned for smooth task-specific blending.

gradient_clipping:
  CultriX/Qwen2.5-14B-Wernickev3: 0.86  # Backbone stability with slightly reduced clipping.
  CultriX/Qwenfinity-2.5-14B: 0.83      # Consistent multitask integration.
  djuna/Q2.5-Veltha-14B-0.5: 0.91       # Strengthened advanced reasoning contributions.
  CultriX/Qwen2.5-14B-Broca: 0.85       # Logical reasoning enhancements stabilized.
  qingy2019/Qwen2.5-Math-14B-Instruct: 0.93 # Mathematically focused tasks maximized.
  CultriX/SeQwence-14Bv1: 0.88          # Generalist multitask support.
  sometimesanotion/Qwen2.5-14B-Vimarckoso: 0.89 # Balanced multi-step reasoning contributions.
  allknowingroger/QwenSlerp6-14B: 0.87  # Contextual and logical reasoning integration refined.

models:
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.26       # Core backbone for multitask reasoning.
      density: 0.7       # Slight increase to preserve critical reasoning parameters.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.23       # Comprehensive multitask performer.
      density: 0.65
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.22       # Advanced reasoning support for GPQA and MUSR.
      density: 0.72
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.15       # Logical reasoning and factual QA enhancements.
      density: 0.65
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.18       # Mathematical reasoning priority.
      density: 0.73
  - model: CultriX/SeQwence-14Bv1
    parameters:
      weight: 0.14       # Generalist multitask backbone.
      density: 0.63
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso
    parameters:
      weight: 0.12       # Multi-step reasoning tasks contributor.
      density: 0.6
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.1        # Contextual reasoning improvements.
      density: 0.62

tokenizer_source: CultriX/Qwen2.5-14B-Wernickev3
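
The per-model weights in the configuration do not sum to 1, which is where normalize: true matters. The short sketch below shows the arithmetic only. It assumes normalization simply divides each weight by the sum of the weights; whether the base model's own entry counts toward that sum depends on mergekit's implementation, so both totals are computed.

```python
# Weights copied from the `models:` list in the configuration above.
BASE = "CultriX/Qwen2.5-14B-Wernickev3"
weights = {
    "CultriX/Qwen2.5-14B-Wernickev3": 0.26,  # also the base model
    "CultriX/Qwenfinity-2.5-14B": 0.23,
    "djuna/Q2.5-Veltha-14B-0.5": 0.22,
    "CultriX/Qwen2.5-14B-Broca": 0.15,
    "qingy2019/Qwen2.5-Math-14B-Instruct": 0.18,
    "CultriX/SeQwence-14Bv1": 0.14,
    "sometimesanotion/Qwen2.5-14B-Vimarckoso": 0.12,
    "allknowingroger/QwenSlerp6-14B": 0.1,
}


def normalized(w):
    """Divide every weight by the total so the shares sum to 1."""
    total = sum(w.values())
    return {name: value / total for name, value in w.items()}


# Case 1: every listed model counts toward the total.
total_all = sum(weights.values())
shares_all = normalized(weights)

# Case 2 (assumption): the base model contributes no difference and is left out.
without_base = {k: v for k, v in weights.items() if k != BASE}
total_without_base = sum(without_base.values())
shares_without_base = normalized(without_base)

print(f"sum of all weights: {total_all:.2f}")
print(f"sum without base:   {total_without_base:.2f}")
for name, share in shares_all.items():
    print(f"{name}: {share:.3f}")
```

Either way, the relative ordering of the contributors is unchanged by normalization; only the overall scale of the merged difference changes.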

Model size: 14.8B params (Safetensors, tensor type BF16)