v000000/L3.1-Niitorm-8B-t0.0001 · Is there a public repo for this mergekit?

djuna

Sep 9, 2024

I see that nearswap isn't available in the main arcee-ai repo. Is there a public fork for it?

v000000

Owner Sep 9, 2024

edited Sep 9, 2024

I'm sorry, I did not use mergekit for this, so there is no repo. I just copied their README for convenience and removed the mergekit mentions. I tried to implement it in mergekit, but they have 3000 scripts and custom loaders and whatnot, so I gave up and wrote a merger from scratch.

It's alchemonaut's algorithm, which is documented on their page.
I used numpy for this model like they did. (I'd recommend torch tensors instead since they're much faster, though the results may differ slightly; I haven't checked.)

Exact algorithm used to create this model:

import numpy as np
import torch

def lerp(a, b, t):
    # Linear interpolation: returns a at t=0 and b at t=1.
    return a * (1 - t) + b * t

def nearswap(v0, v1, t):
    # Per-element weight t / |v0 - v1|, clipped to [0, 1].
    # Elements that differ by t or less are swapped to v1;
    # the rest move from v0 toward v1 by exactly t.
    lweight = np.abs(v0 - v1)
    with np.errstate(divide='ignore', invalid='ignore'):
        lweight = np.where(lweight != 0, t / lweight, 1.0)
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(v0, v1, lweight)

def merge_models(model1, model2, t, layer_only=False, no_layers=False):
    # Merges model2 into model1 in place; model1 holds the result.
    with torch.no_grad():
        state_dict1 = model1.state_dict()
        state_dict2 = model2.state_dict()

        if layer_only:
            keys = [key for key in state_dict1.keys() if "layer" in key]
        elif no_layers:
            keys = [key for key in state_dict1.keys() if "layer" not in key]
        else:
            keys = list(state_dict1.keys())

        for key in keys:
            if state_dict1[key].shape == state_dict2[key].shape:
                # NumPy has no bfloat16 dtype, so .numpy() fails on bf16
                # tensors; cast first (e.g. .float()) if the weights are bf16.
                state_dict1[key] = torch.tensor(nearswap(state_dict1[key].cpu().numpy(), state_dict2[key].cpu().numpy(), t))
                print(f"Iterated over tensor: {key}")
            else:
                # Shapes differ: keep model1's tensor unchanged.
                pass

        model1.load_state_dict(state_dict1)
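
To make the effect of the weight easier to see, here is a scalar restatement of the nearswap function above in plain Python. It is only an illustrative sketch: `nearswap_scalar` is a hypothetical name, not part of the original script. Because the weight is t / |v0 - v1| clipped to [0, 1], each value moves from v0 toward v1 by at most t, and values already within t of each other are simply replaced by v1.

```python
def nearswap_scalar(v0: float, v1: float, t: float) -> float:
    """Scalar paraphrase of nearswap (hypothetical helper, for illustration)."""
    diff = abs(v0 - v1)
    if diff == 0:
        # Identical values: the weight does not matter, use 1.0 like the numpy code.
        weight = 1.0
    else:
        # Clip t / |v0 - v1| to the range [0, 1].
        weight = min(max(t / diff, 0.0), 1.0)
    # Same linear interpolation as lerp(v0, v1, weight).
    return v0 * (1 - weight) + v1 * weight


if __name__ == "__main__":
    t = 0.0001
    # Values closer than t: the result is v1 itself.
    print(nearswap_scalar(0.5, 0.50003, t))
    # Values far apart: the result is v0 nudged toward v1 by about t.
    print(nearswap_scalar(0.5, 0.9, t))
```

With t = 0.0001, this means most weights of the first model change only in the fourth decimal place, while weights that were nearly identical in both models take the second model's value.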

# ...Load the models with transformers (bf16 dtype), run merge script (t0.0001 layer_only), save the output model, copy tokenizer, push to hub, gradio webui, read yaml etc
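
For anyone who wants to try the torch route mentioned earlier, below is a minimal sketch of the same weighting done directly on torch tensors. It was not used for this model, and `nearswap_torch` is a hypothetical name. It assumes computing in float32 and casting back to the input dtype is acceptable, which also sidesteps numpy's lack of bfloat16; results may differ slightly from the numpy version.

```python
import torch


def nearswap_torch(v0: torch.Tensor, v1: torch.Tensor, t: float) -> torch.Tensor:
    """Torch sketch of nearswap (assumption: float32 math, cast back afterwards)."""
    a = v0.float()
    b = v1.float()
    diff = (a - b).abs()
    # t / 0 gives inf here; torch.where replaces those entries with 1.0.
    weight = torch.where(diff != 0, t / diff, torch.ones_like(diff))
    weight = torch.nan_to_num(weight, nan=1.0, posinf=1.0, neginf=1.0)
    weight = weight.clamp(0.0, 1.0)
    # Same formula as lerp(v0, v1, weight) in the numpy code.
    merged = a * (1 - weight) + b * weight
    return merged.to(v0.dtype)


if __name__ == "__main__":
    v0 = torch.tensor([1.0, 1.0, 1.0, 2.0])
    v1 = torch.tensor([1.00005, 1.5, 1.0, 1.0])
    # Close values snap to v1; distant values move toward v1 by about t.
    print(nearswap_torch(v0, v1, 0.0001))
```

Inside a loop like the one in merge_models, this could replace the `.cpu().numpy()` round trip, but compare a few tensors against the numpy output before relying on it.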

djuna

Sep 10, 2024

Ahh, I see. Thanks tho.

v000000 changed discussion status to closed Sep 10, 2024