Model Information

This is a vocabulary-pruned variant of Llama-3.2-1B-Instruct. The vocabulary size is reduced from 128,256 to 32,256 tokens, bringing the total parameter count to 1,039,214,756, roughly 200M parameters fewer than the original model.
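
As a quick sanity check of the ~200M figure: the savings come from the shrunken embedding matrix. The sketch below assumes the hidden size of 2048 from the Llama-3.2-1B config and tied input/output embeddings (so the embedding rows are stored only once); it is a back-of-the-envelope check, not taken from the model card.

# Rough check of the parameter savings from vocabulary pruning.
# Assumes hidden_size=2048 and tied input/output embeddings (Llama-3.2-1B defaults).
hidden_size = 2048
pruned_rows = 128256 - 32256             # 96,000 embedding rows removed
print(f'{pruned_rows * hidden_size:,}')  # 196,608,000 -> the ~200M quoted above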

How to use

Here is a code example using the transformers pipeline API:

import torch
from transformers import pipeline


pipe = pipeline(
    "text-generation",
    model="k-l-lambda/Llama-3.2-1B-vocab32k",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
# the pipeline returns the full conversation; the last message is the assistant's reply
print(outputs[0]["generated_text"][-1])

Another example, loading the model and tokenizer directly:

from transformers import AutoModelForCausalLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")

input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
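
For chat-style prompts with this second approach, you can format the messages yourself. A minimal sketch, assuming the pruned model keeps the original Llama 3.2 chat template (the apply_chat_template call and generation settings here are illustrative, not part of the model card):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained(
    "k-l-lambda/Llama-3.2-1B-vocab32k",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "user", "content": "Who are you?"},
]
# format the conversation with the chat template and append the assistant header
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=64)
# decode only the newly generated tokens
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))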

Token conversion

You can map a token ID in the 32k vocabulary to the corresponding ID in the original 128k vocabulary, and vice versa, using the index tensors stored in token_indices.pt and inv_token_indices.pt.

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer


tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)

ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[id].item() for id in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')

print(tokenizer128k.decode(ids_128k))


ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[id].item() for id in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}')  # tokens that do not exist in the 32k vocab map to -1

print(tokenizer32k.decode(ids_32k))
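
Because tokens missing from the 32k vocabulary map to -1, decoding the raw ids_32k above can fail or silently pick the wrong token. A minimal sketch that drops unmapped IDs before decoding (this filtering policy is an assumption, not something the mapping files prescribe):

# drop ids with no counterpart in the 32k vocab (mapped to -1)
valid_ids_32k = [id for id in ids_32k if id >= 0]
print(tokenizer32k.decode(valid_ids_32k))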