Why change the configuration of the tokenizer?

#4
by Lingrui - opened

Why change the configuration of the tokenizer instead of continuing to use Qwen2.5's chat template?

From what I have observed, the Distill model's tokenizer reassigns token IDs that were already trained in the Qwen2.5-Instruct model. I believe these token IDs may already have been given certain meanings by the model, and the structure of the Distill chat template could alter those meanings. Could this lead to a decline in performance, or make it more difficult to inject new capabilities?
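For anyone who wants to see the overlap concretely, here is a minimal sketch that lists the added-token IDs the two tokenizers share but map to different strings. The repo names below are assumptions based on this discussion; adjust them to the exact checkpoints you are comparing.

```python
# Rough check of which added-token IDs differ between the two tokenizers
# (repo names assumed from the discussion; swap in the checkpoints you use).
from transformers import AutoTokenizer

qwen = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
distill = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# Map token ID -> token string for the added/special tokens of each tokenizer.
qwen_added = {v: k for k, v in qwen.get_added_vocab().items()}
distill_added = {v: k for k, v in distill.get_added_vocab().items()}

# IDs present in both tokenizers but mapped to different strings are the
# repurposed slots this question is about.
for tok_id in sorted(set(qwen_added) & set(distill_added)):
    if qwen_added[tok_id] != distill_added[tok_id]:
        print(tok_id, qwen_added[tok_id], "->", distill_added[tok_id])
```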

DeepSeek org

These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.

May I ask why you use '<｜' and '｜>' (with the fullwidth vertical bar) instead of '<|' and '|>'? Not a very common pick.
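If the two bracket styles look identical when rendered, a quick sketch like the one below prints the Unicode code points of each special token so you can see which form the tokenizer actually uses. The repo name is an assumption; point it at the checkpoint you have loaded.

```python
# Print each special token with the code points of its characters, to see
# whether the bars/brackets are plain ASCII or fullwidth forms.
import unicodedata
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
for t in tok.all_special_tokens:
    codepoints = ", ".join(f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in t)
    print(t, "->", codepoints)
```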
