https://huggingface.co/FreedomIntelligence/Apollo2-9B

#370
by eleius - opened

Sure sure :) Will probably be done in the next 30 minutes or so. You can watch it at http://hf.tst.eu/status.html

mradermacher changed discussion status to closed

Unfortunately:

ValueError: All items in a GGUF array should be of the same type

That's usually a problem with the model config (say, a string where a number is expected), but could also be a bug in llama.cpp. When I find time later I will investigate.

Ah too bad. Thanks anyway!

I added some debug output to see what is going on:

INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:Apollo2-9B.SOURCE.gguf: n_tensors = 464, total_size = 18.5G
general.architecture = gemma2
general.type = model
general.name = Apollo2 9B
general.basename = Apollo2
general.size_label = 9B
general.license = apache-2.0
general.base_model.count = 1
general.base_model.0.name = Gemma 2 9b
general.base_model.0.organization = Google
general.base_model.0.repo_url = https://huggingface.co/google/gemma-2-9b
general.tags = ['biology', 'medical', 'question-answering']
general.languages = ['ar', 'en', 'zh', 'ko', 'ja', 'mn', 'th', 'vi', 'lo', 'mg', 'de', 'pt', 'es', 'fr', 'ru', 'it', 'hr', 'gl', 'cs', 'co', 'la', 'uk', 'bs', 'bg', 'eo', 'sq', 'da', 'sa', False, 'gn', 'sr', 'sk', 'gd', 'lb', 'hi', 'ku', 'mt', 'he', 'ln', 'bm', 'sw', 'ig', 'rw', 'ha']
Traceback (most recent call last):
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 436, in write
    self.gguf_writer.write_kv_data_to_file()
  File "/root/llama.cpp/gguf-py/gguf/gguf_writer.py", line 241, in write_kv_data_to_file
    kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama.cpp/gguf-py/gguf/gguf_writer.py", line 885, in _pack_val
    raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type

Take a close look at this array:

general.languages = ['ar', 'en', 'zh', 'ko', 'ja', 'mn', 'th', 'vi', 'lo', 'mg', 'de', 'pt', 'es', 'fr', 'ru', 'it', 'hr', 'gl', 'cs', 'co', 'la', 'uk', 'bs', 'bg', 'eo', 'sq', 'da', 'sa', False, 'gn', 'sr', 'sk', 'gd', 'lb', 'hi', 'ku', 'mt', 'he', 'ln', 'bm', 'sw', 'ig', 'rw', 'ha']
general.languages = ['ar', ..., 'sa', False, 'gn', ..., 'ha']

All languages are of type string, except one that is of type boolean.
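A quick way to spot the odd element without reading the whole array by eye is sketched below. This is only an illustration of the idea, not code from llama.cpp or gguf-py; the helper name `find_type_outliers` is hypothetical, and the shortened list stands in for the full `general.languages` value.

```python
from collections import Counter

def find_type_outliers(values):
    """Return (index, value) pairs whose type differs from the most common type.

    Hypothetical helper for illustration; it is not part of gguf-py.
    """
    if not values:
        return []
    # Count how often each concrete type occurs in the list.
    counts = Counter(type(v) for v in values)
    majority_type, _ = counts.most_common(1)[0]
    # Anything that is not of the majority type is reported with its position.
    return [(i, v) for i, v in enumerate(values) if type(v) is not majority_type]

# Shortened stand-in for the general.languages array from the log above.
languages = ['ar', 'sa', False, 'gn', 'ha']
print(find_type_outliers(languages))  # [(2, False)]
```

Reporting the index along with the value makes it easier to map the outlier back to the corresponding entry in the model card.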

I just looked at their README.md on HuggingFace and saw this:

(...)
  - da
  - sa
  - 'no'
  - gn
  - sr
(...)

The only reason the model doesn't convert is a faulty README.md. If you delete the README.md it converts.

The strange thing is that when I look at the raw README.md file, I can't see 'no' being in quotes.

Nice. If I had only removed the quotation marks around 'no', which HuggingFace automatically added in their rendered version, I would have realized that "no" is a reserved word in YAML. The fact that it is not quoted in the raw README.md means that it gets interpreted as False according to the YAML specification: https://yaml.org/type/bool.html

(...)
  - da
  - sa
  - no
  - gn
  - sr
(...)
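A check like the following could flag such entries before conversion. It is a minimal sketch using only the standard library: the token list is taken from the YAML 1.1 bool type draft linked above, which tokens a particular parser actually treats as boolean is an assumption, and both the helper name `find_ambiguous_items` and the `language:` key in the sample are illustrative.

```python
import re

# Tokens listed in the YAML 1.1 bool type draft. Whether a given parser
# treats all of them as booleans is an assumption and varies by library.
YAML11_BOOL = re.compile(
    r"^(y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
# A simple block-list item: optional indent, a dash, whitespace, the scalar.
LIST_ITEM = re.compile(r"^\s*-\s+(.*?)\s*$")

def find_ambiguous_items(front_matter):
    """Return (line_number, token) for unquoted list items that look boolean.

    Hypothetical helper; it does not parse YAML, it only scans list lines.
    """
    hits = []
    for lineno, line in enumerate(front_matter.splitlines(), start=1):
        m = LIST_ITEM.match(line)
        # Quoted items such as 'no' keep their quotes and do not match.
        if m and YAML11_BOOL.match(m.group(1)):
            hits.append((lineno, m.group(1)))
    return hits

# Sample modelled on the raw README.md excerpt; the key name is assumed.
sample = "language:\n  - da\n  - sa\n  - no\n  - gn\n"
print(find_ambiguous_items(sample))  # [(4, 'no')]
```

Quoting the entry in the README.md (`- 'no'`) makes the scan come back empty, which matches the fix of forcing the value to be read as a string.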

I just released a llama.cpp fixed version of this model under https://huggingface.co/nicoboss/Apollo2-9B-llamacppfixed

I created a pull request on the original model as well in the hope they might fix this issue: https://huggingface.co/FreedomIntelligence/Apollo2-9B/discussions/1

yaml just sucks. anyway, i modernised my build script and, probably at the same time, also added some debugging code, because this error is not uncommon, and there is no helpful diagnostic at all when it happens (and it happens for many reasons). what it means is that I will likely maintain my own fork from now on, something i tried to avoid really hard :/

gguf serialising key general.languages value GGUFValue(value=['ar', 'en', 'zh', 'ko', 'ja', 'mn', 'th', 'vi', 'lo', 'mg', 'de', 'pt', 'es', 'fr', 'ru', 'it', 'hr', 'gl', 'cs', 'co', 'la', 'uk', 'bs', 'bg', 'eo', 'sq', 'da', 'sa', False, 'gn', 'sr', 'sk', 'gd', 'lb', 'hi', 'ku', 'mt', 'he', 'ln', 'bm', 'sw', 'ig', 'rw', 'ha'], type=<GGUFValueType.ARRAY: 9>)

ah, and fyi, it's not a reserved word in yaml - the interpretation is up to the specific yaml parser, every yaml parser can have its own interpretation. the reference you cited is 15 years out of date (it was superseded in 2009, and never was more than a draft). that's why yaml doubly sucks.

huggingface probably does not document their yaml dialect, so it's up to the yaml library they use. when they switch, they might switch interpretation as well. yaml is not a reliable data format. i deeply regret having helped it on the way decades ago :/

"the reference you cited is 15 years out of date"

on further inspection, it was always up to the parser in every released version of the spec (checked down to 1.1, because I have a vested interest in yaml :).

anyway, it's working now.
