All the Llama 3.1/3.2 models apparently have slightly different vocabularies, and llama.cpp rejects them as draft models for speculative decoding. Do you know of any small models (ideally in the 3B-8B range) that have an identical tokenizer configuration?
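
For anyone hitting the same wall, here's a minimal sketch of how you might check a candidate draft model's tokenizer against the target before bothering to download the GGUF. It uses the Hugging Face `transformers` tokenizers as a proxy for what llama.cpp compares in the GGUF vocab metadata, so it's not a guarantee, and the two model names are just placeholders, not recommendations:

```python
# Sketch: compare two models' tokenizer vocabularies token-for-token.
# Assumes `transformers` is installed and you have access to both repos;
# the model IDs below are illustrative examples only.
from transformers import AutoTokenizer

target = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
draft = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

target_vocab = target.get_vocab()  # token string -> token id
draft_vocab = draft.get_vocab()

print("vocab sizes:", len(target_vocab), len(draft_vocab))

# Tokens missing from one vocab, or mapped to a different id, are the kind
# of mismatch that makes a draft/target pair unusable for speculative decoding.
mismatched = {
    tok
    for tok in target_vocab.keys() | draft_vocab.keys()
    if target_vocab.get(tok) != draft_vocab.get(tok)
}
print("mismatched tokens:", len(mismatched))
```

If the mismatch count is zero (and the special tokens line up), the pair at least has a chance of passing llama.cpp's compatibility check; a nonzero count explains the rejection.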