just curious
#2
opened by 010O11
"The intuition being finetuning 8x1b should give better performance than finetuning 1b by itself." >> are you sure? how so? my intuition telling me the opposite, sorry for that...
Well, you are finetuning 8x1b (roughly 6.5b parameters in total, since only the MLP experts are duplicated while the attention layers and embeddings stay shared) versus finetuning a single 1b model.
In the LLM space, bigger is almost always better. If not, why isn't a 7b model as good as a 70b?
Hey, so I've been messing around with the mixtral branch of mergekit and I'm just curious how you got your config to work. I'm trying to replicate it with the base model for educational purposes and it throws a tremendous number of errors. Did you edit the mixtral branch further to fit your particular use case?
It worked out of the box for me, no changes. But it only works with Llama and Mistral architectures.
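For anyone else landing here: a minimal mergekit-moe config on the mixtral branch looks roughly like the sketch below. The model paths and prompts are placeholders, and key names may differ slightly between branch revisions, so treat this as a starting point rather than the exact config used for this model.

```yaml
# Sketch of a mergekit-moe config (mixtral branch).
# Model paths and prompts are placeholders -- substitute your own.
base_model: path/to/base-1b-model        # donor for attention, embeddings, norms
gate_mode: hidden                        # how the router weights are initialized
dtype: bfloat16
experts:
  - source_model: path/to/expert-1b-a    # each expert contributes its MLP weights
    positive_prompts:
      - "example prompt that should route to this expert"
  - source_model: path/to/expert-1b-b
    positive_prompts:
      - "example prompt for the second expert"
  # ...repeat for the remaining experts (8 total for an 8x1b merge)
```

Then run it with something like `mergekit-moe config.yml ./output-model`. As noted above, this only works when every source model shares a Llama/Mistral-style architecture, so mixing architectures is a common source of the errors described.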
srinivasbilla changed discussion status to closed