
Fusion vs. SLERP?


You need to keep testing models in PyTorch, not just GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the run will abort.
For those who need a bit of Python to test their merged models:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def main(checkpoint: str) -> None:
    """Load the tokenizer and model for the specified checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")
    # device_map="auto" already places the weights, so no extra .to() call is
    # needed; loading is where a malformed merge raises its ValueError
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")


def cli():
    """CLI entry point."""
    import argparse

    parser = argparse.ArgumentParser(description="Load a tokenizer and model from a given checkpoint.")
    parser.add_argument("checkpoint", type=str, help="The pre-trained checkpoint name or path")
    args = parser.parse_args()
    main(args.checkpoint)


if __name__ == "__main__":
    cli()
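
Save it as, say, test_load.py (the filename is just a placeholder) and point it at your merge output:

python test_load.py ./merged-model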

If you use slices in della_linear merges that draw from multiple models - as you'd expect of a merge! - an attempt to load the output model in PyTorch will get you:
ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.
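
If you want to pinpoint which tensors came out malformed, scanning the output's safetensors shards works. A minimal sketch, with the shard filename as a placeholder:

from safetensors import safe_open

# flag 2-D [1, N] weights that should be 1-D [N], e.g. layernorm weights
with safe_open("model-00001-of-00005.safetensors", framework="pt") as f:
    for key in f.keys():
        shape = f.get_slice(key).get_shape()
        if len(shape) == 2 and shape[0] == 1 and "norm" in key:
            print(f"{key}: {shape}")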
Sliced della_linear merges were key to Lamarck v0.6 and v0.7's success, but their merge recipes haven't been working with newer releases of mergekit.
These work:

models:
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model: sthenno-com/miscii-14b-0218

slices:
  - sources:
      - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
      - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
This does not:

slices:
  - sources:
      - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
      - { layer_range: [ 0, 2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
      - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
      - { layer_range: [ 2, 6 ], model: sthenno-com/miscii-14b-0218 }
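
To reproduce, run the merge and then the loader script from above (the config and output paths are placeholders):

mergekit-yaml della-config.yaml ./merged-model
python test_load.py ./merged-model

Conversion to GGUF can still succeed on the same output that fails this load, which is why testing only GGUF misses the bug.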
@Crystalcareai , do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work; I hope to get this fixed!

TimeLordRaps/DS-R1-Lamarckvergence-14B-1M-test3

microsoft/Phi-4-mini-instruct
YOYO-AI/Qwen2.5-14B-YOYO-V4-p2

Lunzima/NQLSG-Qwen2.5-14B-OriginalFusion

Lunzima/NQLSG-Qwen2.5-14B-MegaFusion-v8.7

The numbers are in! The results are fascinating.
Though IFEval skewed low compared to the ancestor models' average, and Lamarckvergence's improved MATH score didn't come through, this model is strong in several ways; the GPQA score suggests as much. These are scores I'm pretty sure I can improve without giving up much of the interesting gains.
What's more, my subjective impression is that its prose and consistency get a boost from Chocolatine. @jpacifico , I think arcee_fusion is a merge method that has a lot to offer for your future base models! This also bodes very well for the next several merges to come.
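
For anyone curious about the recipe shape, an arcee_fusion config is minimal. This is a sketch; the model pairing and dtype are illustrative, not the exact recipe used here:

merge_method: arcee_fusion
base_model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
models:
  - model: sthenno-com/miscii-14b-0218
dtype: bfloat16

Swap in your own base and donor model; fusion takes a single model plus the base.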
I think what you're doing here is really helpful.


I've been doing all my LoRA work on AMD hardware with Linux; I'm looking forward to your notes! I sometimes still do it on CPU because it's easy to renice the task priority so the foreground tasks stay snappy.
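
For example, dropping a running job's priority looks like this (the script name in the pgrep pattern is illustrative):

renice -n 19 -p $(pgrep -f lora_train.py)

Or launch it already niced: nice -n 19 python lora_train.py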
The main challenge I have is keeping a solid ROCm bitsandbytes install when other packages want updates.
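
A quick smoke test right after dependency updates catches a broken install early. This is a minimal sketch assuming a working ROCm build of torch; bitsandbytes also ships its own diagnostic via python -m bitsandbytes:

import torch
import bitsandbytes as bnb

# an 8-bit linear forward pass fails loudly if the native library is broken
layer = bnb.nn.Linear8bitLt(128, 128, has_fp16_weights=False).to("cuda")
x = torch.randn(4, 128, dtype=torch.float16, device="cuda")
print("bitsandbytes OK:", layer(x).shape)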