How did you convert it?
#1 opened by ZeroWw
I tried to convert from the original model to get the f16 model, but I get an error:
python llama.cpp/convert-hf-to-gguf.py --outtype f16 /content/gemma-1.1-7b-it --outfile /content/gemma-1.1-7b-it.f16.gguf
INFO:hf-to-gguf:Loading model: gemma-1.1-7b-it
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
....
File "/content/llama.cpp/gguf-py/gguf/gguf_writer.py", line 166, in add_key_value
raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'tokenizer.chat_template'
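For reference, the failure is just two writes of the same GGUF key. A minimal sketch that reproduces it (my own illustration, assuming llama.cpp's gguf-py package is importable; the file path is arbitrary):

```python
# Hypothetical reproduction: writing 'tokenizer.chat_template' twice,
# which is what the converter ends up doing for Gemma.
from gguf import GGUFWriter

w = GGUFWriter("/tmp/repro.gguf", "gemma")
w.add_chat_template("{{ messages }}")  # first write is fine
w.add_chat_template("{{ messages }}")  # raises ValueError: Duplicated key name 'tokenizer.chat_template'
```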
I commented out the line in gguf_writer for now... we'll see.
Commenting out those instructions means the second duplicate simply overwrites the previous value.
def add_key_value(self, key: str, val: Any, vtype: GGUFValueType) -> None:
    # duplicate-key check disabled as a workaround; a repeated key
    # (here 'tokenizer.chat_template') now silently overwrites the old value
    #if key in self.kv_data:
    #    raise ValueError(f'Duplicated key name {key!r}')
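If you'd rather not delete the check entirely, a gentler local patch (my own sketch, assuming the current gguf-py where values are stored as GGUFValue) is to warn and overwrite:

```python
# Sketch of a local workaround, not upstream behavior: log duplicate
# keys instead of raising, letting the last write win.
def add_key_value(self, key: str, val: Any, vtype: GGUFValueType) -> None:
    if key in self.kv_data:
        logger.warning('Duplicated key name %r, overwriting', key)
    self.kv_data[key] = GGUFValue(value=val, type=vtype)
```

The real fix is for the converter to stop emitting the key twice, but this keeps conversions running in the meantime.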
Yeah, someone broke the Gemma conversion recently; it needs to be fixed.
Any idea how to run Gemma in llama.cpp? I tried with the above models; the model answers in the llama.cpp UI (server), but after the answer it keeps generating by itself.
You need to specify the proper stop tokens, I would guess.
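For Gemma the end-of-turn marker is <end_of_turn>. A quick sketch against the server's /completion endpoint (the host/port are whatever you launched the server with):

```python
# Sketch: prompt the llama.cpp server with Gemma's chat format and pass
# <end_of_turn> as an explicit stop string so generation halts after the answer.
import requests

prompt = "<start_of_turn>user\nHello!<end_of_turn>\n<start_of_turn>model\n"
resp = requests.post("http://localhost:8080/completion", json={
    "prompt": prompt,
    "stop": ["<end_of_turn>"],
    "n_predict": 256,
})
print(resp.json()["content"])
```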