All the Llama 3.1/3.2 models apparently have slightly different vocabularies, and llama.cpp rejects them as draft models for speculative decoding. Do you know of any small models (ideally in the 3B-8B range) that have an identical tokenizer configuration?
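
For anyone hitting the same wall, here's a minimal sketch of how you might check a candidate draft model's tokenizer against the target before bothering to download the GGUF. It uses the Hugging Face `transformers` tokenizers as a proxy for what llama.cpp compares in the GGUF vocab metadata, so it's not a guarantee, and the two model names are just placeholders, not recommendations:

```python
# Sketch: compare two models' tokenizer vocabularies token-for-token.
# Assumes `transformers` is installed and you have access to both repos;
# the model IDs below are illustrative examples only.
from transformers import AutoTokenizer

target = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
draft = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

target_vocab = target.get_vocab()  # token string -> token id
draft_vocab = draft.get_vocab()

print("vocab sizes:", len(target_vocab), len(draft_vocab))

# Tokens missing from one vocab, or mapped to a different id, are the kind
# of mismatch that makes a draft/target pair unusable for speculative decoding.
mismatched = {
    tok
    for tok in target_vocab.keys() | draft_vocab.keys()
    if target_vocab.get(tok) != draft_vocab.get(tok)
}
print("mismatched tokens:", len(mismatched))
```

If the mismatch count is zero (and the special tokens line up), the pair at least has a chance of passing llama.cpp's compatibility check; a nonzero count explains the rejection.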