Question about model tokenizer
#2 · by Bachhoang · opened
When using the GGUF model and checking its vocabulary metadata, I noticed a slight difference compared to the base model's vocabulary: the GGUF file does not define a padding token, while the base model's tokenizer does include one, as shown in the check below.
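Here is a minimal sketch of the check I ran (the GGUF path and the base model id are placeholders, not the actual artifacts):

```python
from gguf import GGUFReader          # pip install gguf
from transformers import AutoTokenizer

# Inspect the GGUF metadata fields directly.
reader = GGUFReader("model.gguf")    # placeholder path to the GGUF file
pad_key = "tokenizer.ggml.padding_token_id"
print(pad_key in reader.fields)      # prints False: no padding token stored

# Compare with the base model's tokenizer.
tok = AutoTokenizer.from_pretrained("base-model-id")  # placeholder model id
print(tok.pad_token, tok.pad_token_id)  # the base tokenizer does define one
```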
Could someone explain why this difference exists and how I should handle it?
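For now I am working around it by reusing the EOS token as padding, which I understand is a common convention, but I am not sure it is correct for this model:

```python
# Fallback I'm currently using: map padding to EOS when no pad token exists.
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
print(tok.pad_token_id)  # now set, but is this the right approach here?
```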