Any methodology description?

#1
by teknium - opened

Interested in the process used to do this, if you're willing to share.

Also, it says GPT4xAlpaca is included, but we never made a 30b model. Did you mean something else by chance, or is it possible to merge a 13b model into a 30b one?

They are merging datasets, not models.
Any dataset can be used to train any base model: your GPT4xAlpaca dataset can train a 65b model just as well as a 7b model.
They assembled these datasets and then trained a 33b model with them.
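For example (a minimal sketch; the dataset path and model names are illustrative, not anyone's actual training setup), the same data can be paired with a base checkpoint of any size:

```python
# Sketch: one dataset, any base model size. Only the checkpoint
# name changes; the data pipeline stays identical.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local copy of the GPT4xAlpaca instruction data.
data = load_dataset("json", data_files="gpt4_alpaca.json")

for base in ("huggyllama/llama-7b", "huggyllama/llama-65b"):
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)
    # ...the same fine-tuning loop runs regardless of parameter count...
```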

Caldera AI org

@teknium I suspect he used chansung/gpt4-alpaca-lora-30b; it would not be possible to merge models of different sizes.
@ehartford This is merging LoRAs into models and merging models directly; it is not a merging of datasets, and no training is involved.

Part of the tools can be found here : https://github.com/ontocord/MDEL/tree/main/Model%20Merge%20And%20Analysis%20Tools
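As a rough sketch of the LoRA half of that (assuming the chansung adapter above and a stock LLaMA-30b base; not necessarily the exact recipe used for this model), folding an adapter into its base with the peft library looks like:

```python
# Hedged sketch: fold a LoRA adapter into its base model with peft.
# The base checkpoint name is an assumption, not a confirmed detail.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model the adapter was trained against.
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b", torch_dtype=torch.float16
)

# Attach the low-rank adapter weights.
model = PeftModel.from_pretrained(base, "chansung/gpt4-alpaca-lora-30b")

# Merge the LoRA deltas into the base weights, yielding a plain
# checkpoint that no longer needs peft at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("./gpt4-alpaca-30b-merged")
```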

@Henk717 Why does the method (merging models directly) work? It was quite rare until recent months. Is there any paper/blog describing this technique?
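To be concrete, I assume "merging directly" means something like naive parameter-space interpolation (the paths and the 50/50 ratio below are just illustrative):

```python
# Naive weight-space merge of two checkpoints that share one
# architecture; illustrative only, not the CalderaAI recipe.
import torch

sd_a = torch.load("model_a/pytorch_model.bin", map_location="cpu")
sd_b = torch.load("model_b/pytorch_model.bin", map_location="cpu")

alpha = 0.5  # blend ratio for model A
merged = {}
for name, tensor_a in sd_a.items():
    tensor_b = sd_b[name]
    # Tensors must line up exactly, which is why a 13b model
    # cannot be merged into a 30b one this way.
    assert tensor_a.shape == tensor_b.shape, name
    merged[name] = alpha * tensor_a.float() + (1 - alpha) * tensor_b.float()

torch.save(merged, "merged/pytorch_model.bin")
```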

Caldera AI org

We're working on a paper; at the moment it's mostly research and experimentation by the team.
