AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
Abstract
Large Language Models (LLMs) have shown impressive versatility as general-purpose models. However, their broad applicability comes at a high computational cost, particularly in auto-regressive decoding, where each step requires a forward pass. In domain-specific settings, general-purpose capabilities are unnecessary and can be exchanged for efficiency. In this work, we take a novel perspective on domain adaptation, reducing latency and computational costs by adapting the vocabulary to focused domains of interest. We introduce AdaptiVocab, an end-to-end approach for vocabulary adaptation, designed to enhance LLM efficiency in low-resource domains. AdaptiVocab can be applied to any tokenizer and architecture, modifying the vocabulary by replacing tokens with domain-specific n-gram-based tokens, thereby reducing the number of tokens required for both input processing and output generation. AdaptiVocab initializes new n-token embeddings using an exponentially weighted combination of existing embeddings and employs a lightweight fine-tuning phase that can be performed efficiently on a single GPU. We evaluate two 7B LLMs across three niche domains, assessing efficiency, generation quality, and end-task performance. Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance.
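To make the embedding-initialization idea concrete, below is a minimal sketch of how a new n-gram token's embedding could be built as an exponentially weighted combination of the embeddings of its constituent tokens. The abstract does not specify the exact weighting direction or decay factor, so the `decay` parameter, the weighting order, and the function name are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def init_ngram_embedding(token_ids: torch.Tensor,
                         embedding_matrix: torch.Tensor,
                         decay: float = 0.5) -> torch.Tensor:
    """Sketch: build an embedding for a new n-gram token from the embeddings
    of the original tokens it replaces, using exponentially decaying weights.
    The decay value and ordering are assumptions for illustration."""
    parts = embedding_matrix[token_ids]                # (n, hidden_dim)
    weights = torch.tensor([decay ** i for i in range(len(token_ids))])
    weights = weights / weights.sum()                  # normalize to sum to 1
    return (weights.unsqueeze(1) * parts).sum(dim=0)   # (hidden_dim,)

# Hypothetical usage: merge two existing token ids into one n-gram token
# emb = model.get_input_embeddings().weight.data       # (vocab_size, hidden_dim)
# new_vec = init_ngram_embedding(torch.tensor([1032, 5471]), emb)
# emb = torch.cat([emb, new_vec.unsqueeze(0)], dim=0)  # append the new token row
```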
Community
Paper website: https://itay-nakash.github.io/AdaptiVocab/
Twitter: https://x.com/itay__nakash/status/1905193130028142595
AdaptiVocab is a method that makes LLMs faster and cheaper in niche domains by adapting their vocabulary. It replaces general-purpose tokens with domain-specific n-grams, cuts token usage by more than 25%, and keeps performance intact, requiring only minimal fine-tuning on a single GPU.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- LoRACode: LoRA Adapters for Code Embeddings (2025)
- PAFT: Prompt-Agnostic Fine-Tuning (2025)
- UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings (2025)
- Understanding and Improving Information Preservation in Prompt Compression for LLMs (2025)
- Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models (2025)
- Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts (2025)
- Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models (2025)