just curious
#2
opened by 010O11
"The intuition being finetuning 8x1b should give better performance than finetuning 1b by itself." >> are you sure? how so? my intuition telling me the opposite, sorry for that...
Well, you are finetuning 8x1b (roughly 6.5b parameters in total, since only the MLP experts are duplicated while the attention layers and embeddings stay shared) versus finetuning a single 1b model.
In the LLM space, bigger is almost always better. If not, why isn't a 7b model as good as a 70b?
Hey, so I've been messing around with the mixtral branch of mergekit and I'm just curious how you got your config to work. I'm trying to replicate it with the base model for educational purposes and it throws a tremendous number of errors. Did you edit the mixtral branch further to fit your particular use case?
It worked out of the box for me, no changes. But it only works with Llama and Mistral architectures.
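For anyone else landing here: a minimal mergekit-moe config on the mixtral branch looks roughly like the sketch below. The model paths and prompts are placeholders, and key names may differ slightly between branch revisions, so treat this as a starting point rather than the exact config used for this model.

```yaml
# Sketch of a mergekit-moe config (mixtral branch).
# Model paths and prompts are placeholders -- substitute your own.
base_model: path/to/base-1b-model        # donor for attention, embeddings, norms
gate_mode: hidden                        # how the router weights are initialized
dtype: bfloat16
experts:
  - source_model: path/to/expert-1b-a    # each expert contributes its MLP weights
    positive_prompts:
      - "example prompt that should route to this expert"
  - source_model: path/to/expert-1b-b
    positive_prompts:
      - "example prompt for the second expert"
  # ...repeat for the remaining experts (8 total for an 8x1b merge)
```

Then run it with something like `mergekit-moe config.yml ./output-model`. As noted above, this only works when every source model shares a Llama/Mistral-style architecture, so mixing architectures is a common source of the errors described.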
srinivasbilla changed discussion status to closed