Why change the configuration of the tokenizer?

#4
by Lingrui - opened

Why change the configuration of the tokenizer instead of continuing to use Qwen2.5's chat template?

From what I have observed, the Distill model's tokenizer reassigns token IDs that were already trained in the Qwen2.5-Instruct model. I believe these token IDs may already have been given certain meanings by the model, and the structure of the Distill chat template could alter those meanings. Could this lead to a decline in performance, or make it more difficult to inject new capabilities?
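For anyone who wants to see the overlap concretely, here is a minimal sketch that lists the added-token IDs the two tokenizers share but map to different strings. The repo names below are assumptions based on this discussion; adjust them to the exact checkpoints you are comparing.

```python
# Rough check of which added-token IDs differ between the two tokenizers
# (repo names assumed from the discussion; swap in the checkpoints you use).
from transformers import AutoTokenizer

qwen = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
distill = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# Map token ID -> token string for the added/special tokens of each tokenizer.
qwen_added = {v: k for k, v in qwen.get_added_vocab().items()}
distill_added = {v: k for k, v in distill.get_added_vocab().items()}

# IDs present in both tokenizers but mapped to different strings are the
# repurposed slots this question is about.
for tok_id in sorted(set(qwen_added) & set(distill_added)):
    if qwen_added[tok_id] != distill_added[tok_id]:
        print(tok_id, qwen_added[tok_id], "->", distill_added[tok_id])
```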

DeepSeek org

These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.

May I ask why you use '<｜' and '｜>' (with the fullwidth vertical bar) instead of '<|' and '|>'? Not a very common pick.
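If the two bracket styles look identical when rendered, a quick sketch like the one below prints the Unicode code points of each special token so you can see which form the tokenizer actually uses. The repo name is an assumption; point it at the checkpoint you have loaded.

```python
# Print each special token with the code points of its characters, to see
# whether the bars/brackets are plain ASCII or fullwidth forms.
import unicodedata
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
for t in tok.all_special_tokens:
    codepoints = ", ".join(f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in t)
    print(t, "->", codepoints)
```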
