license: other
Llama-3 chat vector
- Update 0426: There is a small problem with the deployment of the model 'Llama-3-Seagull-Evo-8B', but we hope to have it back soon!
- Update 0526: Check out our newest EMM model, Alpha-Ko-8B-Instruct
This is a 'modelified' version of the chat vector from the paper Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages. So this is not a model; it's just a weight diff, published to make it easier for me (or you) to use!
As I understand it, the 'chat vector method' is a merging method that uses the weight differences among the base model, the continually pre-trained (usually language-transferred) model, and the chat model; so the recipe is
model(base) + weight_diff(continuous pretrained) + weight_diff(instruct)
or
model(base) + weight_diff(continuous pretrained + fine-tuned) + weight_diff(instruct)
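The recipe above can be sketched as plain per-parameter arithmetic. This is only a minimal illustration, not the code used to build this repository: the function names `chat_vector` and `apply_chat_vector`, the `ratio` and `skip` parameters, and the toy parameter names are all hypothetical. Weights are represented as dicts mapping parameter names to values that support `+`, `-`, and `*` (in practice these would be tensors from each model's state dict; plain floats are used here so the sketch runs without any dependencies). Skipping embedding-like parameters is an assumption that is commonly made when the continually pre-trained model has a different vocabulary.

```python
def chat_vector(base, instruct):
    """Per-parameter weight diff: instruct - base."""
    return {name: instruct[name] - base[name] for name in base}


def apply_chat_vector(target, vector, ratio=1.0, skip=()):
    """Return target + ratio * vector, leaving skipped parameters unchanged.

    `skip` is a tuple of substrings; any parameter whose name contains
    one of them is copied from `target` as-is (hypothetical convention).
    """
    merged = {}
    for name, weight in target.items():
        if name in vector and not any(s in name for s in skip):
            merged[name] = weight + ratio * vector[name]
        else:
            merged[name] = weight
    return merged


# Toy "state dicts" with made-up parameter names and values.
base = {"layer.weight": 1.0, "embed_tokens.weight": 0.5}
instruct = {"layer.weight": 1.5, "embed_tokens.weight": 0.75}
continually_pretrained = {"layer.weight": 2.0, "embed_tokens.weight": 3.0}

vec = chat_vector(base, instruct)
merged = apply_chat_vector(continually_pretrained, vec, skip=("embed_tokens",))
```

Since `weight_diff(continuous pretrained)` is itself the continually pre-trained model minus the base, `model(base) + weight_diff(continuous pretrained)` is just the continually pre-trained model, which is why the sketch adds the chat vector directly to it.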
So before getting to my initial purpose of comparing which ordering is better, llama3 → CP + chat vector → FT vs. llama3 → CP → FT + chat vector, it seems reasonable to first compare the chat vector with the other merge methods in Mergekit.
Model | Method | Kobest(f1) | Haerae(acc) |
---|---|---|---|
beomi/Llama-3-Open-Ko-8B-Instruct-preview | chat vector | 0.4368 | 0.439 |
kuotient/Llama-3-Ko-8B-ties | Ties | 0.4821 | 0.5160 |
kuotient/Llama-3-Ko-8B-dare-ties | Dare-ties | 0.4950 | 0.5399 |
kuotient/Llama-3-Ko-8B-TA | Task Arithmetic (maybe? not sure about this) | - | |
WIP | Model Stock (I haven't read this paper yet, but still) | - | |
kuotient/Llama-3-Seagull-Evo-8B | Evolutionary Model Merging | 0.6139 | 0.5344 |
--- | --- | --- | --- |
meta-llama/Meta-Llama-3-8B | Base | - | - |
meta-llama/Meta-Llama-3-8B-Instruct | - | 0.4239 | 0.4931 |
beomi/Llama-3-Open-Ko-8B | Korean Base | 0.4374 | 0.3813 |
All that aside, I'd like to thank @beomi for creating such an awesome Korean-based model.