Any plans for a GGUF version?
I see that the only quantized format available is GPTQ. Any chance we will get a GGUF version for those of us who are not using NVIDIA hardware?
Assuming you have the weights for AI-Sweden-Models/gpt-sw3-20b-instruct in a folder named gpt-sw3-20b-instruct (next to the llama.cpp checkout) and you want a high-quality 5-bit model:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
python -m venv venv
. venv/bin/activate
python -m pip install -r requirements/requirements-convert-hf-to-gguf.txt
python convert-hf-to-gguf.py ../gpt-sw3-20b-instruct --outfile gpt-sw3-20b-instruct-f16.gguf
./quantize gpt-sw3-20b-instruct-f16.gguf gpt-sw3-20b-instruct-q5_k_m.gguf q5_k_m
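To sanity-check the files those steps produce before loading them, here is a minimal sketch that reads only the GGUF header. It assumes GGUF version 2 or later, where the file starts with the 4-byte magic "GGUF", a little-endian uint32 version, and two little-endian uint64 counts (tensors, then metadata key/value pairs). The function name read_gguf_header is my own, not part of llama.cpp.

```python
import struct
import sys


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file header.

    Assumes GGUF v2+, where both counts are little-endian uint64.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: not a GGUF file (magic={magic!r})")
        # '<' = little-endian, matching the converter's "Little Endian only" note
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError(f"{path}: GGUF v{version} uses 32-bit counts; not handled here")
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count


if __name__ == "__main__":
    # Print the header of every file passed on the command line
    for p in sys.argv[1:]:
        print(p, read_gguf_header(p))
```

Saved as, say, check_gguf.py (a hypothetical name), you could run it on both gpt-sw3-20b-instruct-f16.gguf and gpt-sw3-20b-instruct-q5_k_m.gguf; the tensor counts should normally match between the f16 and quantized files.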
There you go :-)
Peter
Thank you! Will try that out!
I tried it with gpt-sw3-6.7b-v2-instruct first, before moving on to the larger model, but I get this error:
python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 247, in _set_vocab_gpt2
vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'
@Xouthos
What if you go to line 247 in /Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py and hardcode vocab_size = 64000?
@timpal0l It did not help, still getting the error:
python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 248, in _set_vocab_gpt2
assert max(tokenizer.vocab.values()) < vocab_size
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'
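The traceback above shows why hardcoding vocab_size was not enough: line 248 reads tokenizer.vocab again (and line 250 does too, as the later traceback shows). As far as I can tell, GPTSw3Tokenizer in transformers is a SentencePiece-based "slow" tokenizer that provides get_vocab() (a token-to-id dict) but not the .vocab property that fast tokenizers expose. A minimal sketch of the change, assuming you edit _set_vocab_gpt2 so that every tokenizer.vocab read goes through a single get_vocab() call. FakeSw3Tokenizer and build_vocab_tables are hypothetical stand-ins so the snippet runs without downloading the model; per the next reply, this fix alone does not get the conversion to finish.

```python
class FakeSw3Tokenizer:
    """Hypothetical stand-in: like GPTSw3Tokenizer, it has get_vocab() but no .vocab."""

    def __init__(self, tokens):
        self._tok_to_id = {tok: i for i, tok in enumerate(tokens)}

    def get_vocab(self):
        return dict(self._tok_to_id)


def build_vocab_tables(tokenizer, hparams):
    """Sketch of the vocab handling around lines 247-250, using get_vocab() instead of .vocab."""
    vocab = tokenizer.get_vocab()  # call once; slow and fast tokenizers both provide this
    vocab_size = hparams.get("vocab_size", len(vocab))
    assert max(vocab.values()) < vocab_size
    reverse_vocab = {id_: tok for tok, id_ in vocab.items()}
    return vocab_size, reverse_vocab


tok = FakeSw3Tokenizer(["<unk>", "<s>", "hej", "▁världen"])
size, rev = build_vocab_tables(tok, {"vocab_size": 64000})
print(size, rev[2])  # prints "64000 hej"
```

Reading the vocabulary once into a local dict also avoids rebuilding it for each of the three places the original code touches it.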
I just tried it myself. The issues go further than vocab vs. get_vocab(). Once all of that is fixed, the script does not know what to do with the self-attention bias.
It might be necessary for someone with intimate knowledge of the gpt-sw3 architecture to amend one of the llama.cpp convert scripts (or create a custom one).
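Before amending a convert script, it may help to see exactly which attention bias tensors the checkpoint contains. A minimal sketch, assuming the model folder contains *.safetensors files (if it only has pytorch_model*.bin files, this does not apply): a safetensors file starts with an 8-byte little-endian header length followed by a JSON header that maps tensor names to dtype, shape and offsets, so the names can be listed without loading any weights. The function names and the "attn"/"bias" substring filter are my own heuristic assumptions, not anything llama.cpp uses.

```python
import glob
import json
import os
import struct
import sys


def safetensors_tensor_names(path):
    """List tensor names in one .safetensors file by reading only its JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor
    return sorted(name for name in header if name != "__metadata__")


def attention_bias_names(model_dir):
    """Collect tensor names that contain both 'attn' and 'bias' (heuristic filter)."""
    names = []
    for path in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
        names.extend(n for n in safetensors_tensor_names(path) if "attn" in n and "bias" in n)
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in attention_bias_names(sys.argv[1]):
        print(name)
```

Run against the model folder (for example python list_attn_bias.py models/gpt-sw3-6.7b-v2-instruct, with a hypothetical script name), the output shows which bias names a convert script would need to map or skip.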
@timpal0l I tried that as well, and now I'm getting:
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
main()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
model_instance.set_vocab()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 250, in _set_vocab_gpt2
reverse_vocab = {id: encoded_tok for encoded_tok, id in tokenizer.vocab.items()}
^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'