Is there a public repo for this mergekit?
#1 opened by djuna
I see that nearswap isn't available in the main arcee- repo. Is there a public fork for it?
Sorry, I didn't use mergekit for this, so there is no repo; I just copied their README for convenience and removed the mergekit mentions. I tried to implement it in mergekit, but they have 3000 scripts, custom loaders and whatnot, so I gave up and wrote a merger from scratch.
It's alchemonaut's algorithm, which is documented on their page.
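Reading `nearswap` in the code below element by element (this is my own derivation from that code, not copied from alchemonaut's page), with $d = v_1 - v_0$ and a threshold $t \ge 0$:

$$
w = \begin{cases} \min\left(1, \dfrac{t}{|d|}\right), & d \neq 0 \\ 1, & d = 0 \end{cases}
\qquad
\operatorname{nearswap}(v_0, v_1, t) = (1 - w)\,v_0 + w\,v_1 = v_0 + w\,d
$$

$$
v_0 + w\,d = \begin{cases} v_0 + d = v_1, & |d| \le t \\ v_0 + t\,\operatorname{sign}(d), & |d| > t \end{cases} = v_0 + \operatorname{clip}(d, -t, t)
$$

So each element of $v_0$ moves toward $v_1$ by at most $t$, and is replaced by $v_1$ outright when the two are already within $t$ of each other.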
I used NumPy for this model, as they did (though I'd recommend torch tensors, since they're much faster; the results may differ slightly, I'm not sure).
Exact algorithm used to create this model:
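```python
import numpy as np
import torch


def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return a * (1 - t) + b * t


def nearswap(v0, v1, t):
    # Per-element weight t / |v0 - v1|; elements that differ by at most t
    # get weight 1 (swapped to v1), identical elements also get weight 1.
    lweight = np.abs(v0 - v1)
    with np.errstate(divide='ignore', invalid='ignore'):
        lweight = np.where(lweight != 0, t / lweight, 1.0)
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(v0, v1, lweight)


def merge_models(model1, model2, t, layer_only=False, no_layers=False):
    """Merge model2 into model1 in place, tensor by tensor, with nearswap."""
    with torch.no_grad():
        state_dict1 = model1.state_dict()
        state_dict2 = model2.state_dict()
        if layer_only:
            keys = [key for key in state_dict1.keys() if "layer" in key]
        elif no_layers:
            keys = [key for key in state_dict1.keys() if "layer" not in key]
        else:
            keys = list(state_dict1.keys())
        for key in keys:
            w1 = state_dict1[key]
            w2 = state_dict2.get(key)
            # Tensors missing from model2 or with mismatched shapes keep model1's values.
            if w2 is None or w1.shape != w2.shape:
                continue
            # NumPy has no bfloat16, so bf16 weights are upcast to float32
            # before .numpy() and the result is cast back to the original dtype.
            merged = nearswap(w1.float().cpu().numpy(), w2.float().cpu().numpy(), t)
            state_dict1[key] = torch.from_numpy(merged).to(dtype=w1.dtype)
            print(f"Iterated over tensor: {key}")
        model1.load_state_dict(state_dict1)

# ...Load the models with transformers (bf16 dtype), run the merge script (t=0.0001, layer_only), save the output model, copy the tokenizer, push to hub, gradio webui, read yaml, etc.
```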
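Because each element only moves by at most `t`, the same result can be written without the division and the `errstate`/`nan_to_num` handling. The sketch below is a hypothetical rewrite of mine (`nearswap_clip` is not from alchemonaut or mergekit); it checks itself against the original formulation on random float32 data and should agree up to float32 rounding. The `v0 + clip(v1 - v0, -t, t)` form would also port directly to torch via `torch.clamp` if you go the torch route.

```python
import numpy as np


def nearswap(v0, v1, t):
    """Original formulation: lerp with weight clip(t / |v0 - v1|, 0, 1)."""
    lweight = np.abs(v0 - v1)
    with np.errstate(divide="ignore", invalid="ignore"):
        lweight = np.where(lweight != 0, t / lweight, 1.0)
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, 0.0, 1.0, out=lweight)
    return v0 * (1 - lweight) + v1 * lweight


def nearswap_clip(v0, v1, t):
    """Hypothetical division-free rewrite: step each element of v0 toward v1 by at most t."""
    return v0 + np.clip(v1 - v0, -t, t)


# Synthetic stand-ins for two weight tensors (not real model data).
rng = np.random.default_rng(0)
a = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
b = a + rng.normal(scale=1e-4, size=a.shape).astype(np.float32)
t = 1e-4

# Differences should be at float32 rounding level.
print(np.max(np.abs(nearswap(a, b, t) - nearswap_clip(a, b, t))))
```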
Ahh, I see. Thanks tho.
v000000 changed discussion status to closed