Any plans for gguf format?

#3
by Xouthos - opened
AI Sweden Model Hub org

I see that the only quantized format available is GPTQ. Any chance we will get GGUF format for those of us who are not using Nvidia hardware?

AI Sweden Model Hub org

Assuming you have the weights for AI-Sweden-Models/gpt-sw3-20b-instruct in a folder with the name gpt-sw3-20b-instruct and you want a high-quality 5-bit model:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
python -m venv venv
. venv/bin/activate
python -m pip install -r requirements/requirements-convert-hf-to-gguf.txt
python convert-hf-to-gguf.py ../gpt-sw3-20b-instruct --outfile gpt-sw3-20b-instruct-f16.gguf
./quantize gpt-sw3-20b-instruct-f16.gguf gpt-sw3-20b-instruct-q5_k_m.gguf q5_k_m
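The last two commands differ only in the file names derived from the model folder, so they can be generated for any folder and quantization type. A minimal sketch, assuming the script name and the quantize binary shown above; the function name build_gguf_commands is made up for illustration, and nothing here runs the commands, it only prints them:

```python
import shlex
from pathlib import Path


def build_gguf_commands(model_dir: str, quant: str = "q5_k_m") -> list[list[str]]:
    """Return the argv lists for the convert step and the quantize step."""
    name = Path(model_dir).name              # e.g. "gpt-sw3-20b-instruct"
    f16_file = f"{name}-f16.gguf"            # intermediate 16-bit GGUF file
    quant_file = f"{name}-{quant}.gguf"      # final quantized GGUF file
    convert = ["python", "convert-hf-to-gguf.py", model_dir, "--outfile", f16_file]
    quantize = ["./quantize", f16_file, quant_file, quant]
    return [convert, quantize]


# Print the commands so they can be reviewed before running them by hand.
for argv in build_gguf_commands("../gpt-sw3-20b-instruct"):
    print(shlex.join(argv))
```

Passing a different quant value changes both the output file name and the type argument given to quantize.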

There you go :-)
Peter

AI Sweden Model Hub org

Thank you! Will try that out!

AI Sweden Model Hub org
edited Jan 31, 2024

I tried it with gpt-sw3-6.7b-v2-instruct first, before trying the larger model, but I get this error:

python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 247, in _set_vocab_gpt2
vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'

AI Sweden Model Hub org

@Xouthos What if you go to line 247 in /Users/[email protected]/scripts/llama.cpp/convert-hf-to-gguf.py and hardcode vocab_size = 64000?

AI Sweden Model Hub org
edited Jan 31, 2024

@timpal0l It did not help, still getting the error:

python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 248, in _set_vocab_gpt2
assert max(tokenizer.vocab.values()) < vocab_size
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'

AI Sweden Model Hub org

@Xouthos
Could you replace:

vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))

with

vocab_size = len(tokenizer.get_vocab())

and

assert max(tokenizer.vocab.values()) < vocab_size

with

assert max(tokenizer.get_vocab().values()) < vocab_size
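The two replacements above can be tried in isolation before touching the convert script. A minimal sketch: StubTokenizer is a made-up stand-in that, like the tokenizer in the traceback, has get_vocab() but no vocab attribute, and vocab_size_from is a hypothetical helper, not a function in llama.cpp:

```python
class StubTokenizer:
    """Hypothetical stand-in: exposes get_vocab() but has no .vocab attribute."""

    def __init__(self, token_to_id):
        self._token_to_id = dict(token_to_id)

    def get_vocab(self):
        # Return a copy of the token -> id mapping.
        return dict(self._token_to_id)


def vocab_size_from(tokenizer):
    """Apply the two suggested replacements, calling get_vocab() only once."""
    vocab = tokenizer.get_vocab()
    vocab_size = len(vocab)
    # Same sanity check as the original line, without touching .vocab.
    assert max(vocab.values()) < vocab_size
    return vocab_size


tok = StubTokenizer({"<s>": 0, "hej": 1, "världen": 2})
print(vocab_size_from(tok))
```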

AI Sweden Model Hub org

I just tried it myself. The issues go further than vocab vs get_vocab(). Once all that is fixed, the script does not know what to do with the self-attention bias.

It might be necessary for someone with intimate knowledge of the gpt-sw3 architecture to amend one of the llama.cpp convert scripts (or create a custom one).
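To see which tensors are involved, one option is to list the attention-related bias names in the checkpoint. A minimal sketch: the helper attention_bias_names is made up, and the sample names follow the GPT-2 naming used in Transformers, which is an assumption about the gpt-sw3 checkpoint rather than something confirmed in this thread:

```python
def attention_bias_names(tensor_names):
    """Return, sorted, the names that look like attention-related bias tensors."""
    return sorted(
        name for name in tensor_names
        if ".attn." in name and name.endswith("bias")
    )


# Illustrative names only (assumed GPT-2 style); replace with the real
# tensor names read from the checkpoint.
sample_names = [
    "transformer.h.0.attn.c_attn.weight",
    "transformer.h.0.attn.c_attn.bias",
    "transformer.h.0.attn.bias",
    "transformer.h.0.mlp.c_fc.bias",
]

for name in attention_bias_names(sample_names):
    print(name)
```

Comparing this list against the names the convert script knows how to map would show where it gives up.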

AI Sweden Model Hub org

@timpal0l I tried that as well, and now I get:

Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/[email protected]/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 250, in _set_vocab_gpt2
reverse_vocab = {id_: encoded_tok for encoded_tok, id_ in tokenizer.vocab.items()}
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'
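This third traceback is the same tokenizer.vocab access one line further down, so the same substitution should apply there too. A minimal sketch of the reverse mapping built from get_vocab(); FakeTokenizer and build_reverse_vocab are made-up names for illustration, and this addresses only the AttributeError, not the later problems mentioned in the thread:

```python
class FakeTokenizer:
    """Hypothetical stand-in with get_vocab() and no .vocab attribute."""

    def __init__(self, token_to_id):
        self._token_to_id = dict(token_to_id)

    def get_vocab(self):
        return dict(self._token_to_id)


def build_reverse_vocab(tokenizer):
    """Map token id -> token string using get_vocab() instead of .vocab."""
    return {id_: encoded_tok for encoded_tok, id_ in tokenizer.get_vocab().items()}


print(build_reverse_vocab(FakeTokenizer({"hej": 0, "då": 1})))
```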
