What is the difference between this and Mistral-Small-Gutenberg-Doppel-22B?

#1
by Animus777 - opened

You described both with the same phrase: "mistralai/Mistral-Small-Instruct-2409 finetuned on jondurbin/gutenberg-dpo-v0.1 and nbeerbower/gutenberg2-dpo."

Halved learning rate, only 1 epoch, and data was formatted for Mistral Instruct instead of ChatML.
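
For reference, roughly how the same pair renders under the two templates (special tokens written from memory, so double-check against the actual tokenizer config):

```python
# Illustrative only -- exact special tokens and whitespace depend on the
# tokenizer version, so treat this as an assumption, not the real preprocessing.

def to_chatml(prompt: str, response: str) -> str:
    return (
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )

def to_mistral_instruct(prompt: str, response: str) -> str:
    return f"<s>[INST] {prompt} [/INST] {response}</s>"

print(to_chatml("Continue the scene.", "The rain kept falling..."))
print(to_mistral_instruct("Continue the scene.", "The rain kept falling..."))
```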

> data was formatted for Mistral Instruct instead of ChatML.

Ah, that's probably why this one felt better in practice. I was using the wrong template on the other model, heh. ChatML is usually a good fit for keeping User and Characters separate when chatting, but idk if mixing it with 22B's Mistral format helped much, since it might dilute the instruction following. I have a test scenario where Nemo models with the Mistral format will struggle and speak for User a lot, but 22B consistently broke free from it when using the recommended settings on their page. The instruction following on 22B seems to be good enough that the format doesn't matter too much anymore, tbh.

Your DPO here has been good, and did help with giving scenes more liveliness, even making the characters more willing to use their quirks and unique dialect. But now I'm kinda wondering if 3 epochs with the proper Mistral format would have been better. Any chance of that or something similar for 22B, especially after your recent tweaks to the token length configuration?

Yeah sorry about that; the Mistral template is annoying to work with, but it's fine with this dataset so I should've used it from the get-go.

I'll leave @TheDrummer to decide the next iteration of this particular model. But I think a retrain/v2 of Nemo-Doppel and Small-Doppel might be warranted.

Appreciate your feedback and observations!

I usually stick with 2 epochs but that's with a cosine LR scheduler. I'd rather adjust the LR than train on the same example 3 times.

(And I assume the feedback is that it's a lil undercooked, hmm)
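
Roughly what that looks like with Hugging Face TrainingArguments, with placeholder numbers since this isn't the actual config:

```python
# Hypothetical hyperparameters for illustration; not the real training setup.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=2,          # two passes over the data...
    learning_rate=5e-6,          # ...with the LR adjusted instead of adding a third epoch
    lr_scheduler_type="cosine",  # cosine decay from the peak LR toward zero
    warmup_ratio=0.03,
)
```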

> I usually stick with 2 epochs but that's with a cosine LR scheduler. I'd rather adjust the LR than train on the same example 3 times.
>
> (And I assume the feedback is that it's a lil undercooked, hmm)

idk if it really is or isn't, but it's more just a gut feeling that more epochs might be warranted, at least compared to Nemo.

Nemo Instruct was already pretty good at writing out of the box, but 22B was...kinda dry. Like I said, the ORPO here actually made my cards more willing to do certain things unique to them in cases where the original either struggled, or just flat out did not.

My reasoning says if it already does well with creative stuff, then maybe a lighter touch is better, but if it's the opposite, then crank it up more. Or so goes the thinking. It might even be worth considering DPO and its variants as a better alternative to traditional finetuning when dealing with the Instruct models. @MarinaraSpaghetti and I, when testing models for merging, found that anything built on top of Mistral Instruct, instead of Base, had noticeable problems in some regard, mainly with context handling taking a big hit. Yet the ORPO tunes like Bophades and Gutenberg didn't seem to take a hit in that regard.
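
For anyone who wants to poke at that idea, here's a minimal sketch of what a DPO pass over the Gutenberg pairs might look like with trl (hyperparameters are made up, and the exact trainer arguments shift between trl versions):

```python
# Minimal sketch, not the recipe used for this model. Hyperparameters are
# placeholders, and DPOTrainer's signature varies between trl releases
# (older versions take tokenizer= instead of processing_class=).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "mistralai/Mistral-Small-Instruct-2409"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The Gutenberg DPO set already ships prompt/chosen/rejected columns,
# which is the shape DPOTrainer expects.
dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

config = DPOConfig(
    output_dir="out",
    num_train_epochs=1,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    beta=0.1,  # strength of the KL tether to the reference model
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```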
