How to merge the file parts?
#1 by AliceThirty
I joined the Q_4_M files into one like this:
import os

# Concatenate the part files, in order, into a single output file.
with open(output_path, "wb") as outfile:
    for part_file in part_files:
        part_path = os.path.join(directory, part_file)
        with open(part_path, "rb") as infile:
            outfile.write(infile.read())
The resulting file is 73,219,623,072
bytes and its SHA256 is ae69ba5e00f9f53731941f1aa2e5d0101ca41581c59bf7d8b76714ae62c7d3b6
But when I load it with the latest version of Koboldcpp_cu12.exe (version 1.78), it crashes immediately and says:
llama_model_load: error loading model: invalid split file: C:\models\Behemoth-12T-'¿ÄT
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "koboldcpp.py", line 4720, in <module>
main(parser.parse_args(),start_server=True)
File "koboldcpp.py", line 4344, in main
loadok = load_model(modelname)
File "koboldcpp.py", line 900, in load_model
ret = handle.load_model(inputs)
OSError: exception: access violation reading 0x00000000000018A4
[18352] Failed to execute script 'koboldcpp' due to unhandled exception!
This exception doesn't happen with other Mistral 123B Q_4_M quantizations, such as Mistral itself, Magnum, Luminum, Lumimaid...
Yeah, don't do it that way: each split part carries its own GGUF header, so raw byte-concatenation produces an invalid file. You need to merge with llama.cpp's gguf-split tool instead, or ideally just don't merge them at all. If you just attempt to load the first part, koboldcpp should then automatically load the rest.
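For reference, a minimal sketch of the merge route, assuming llama.cpp's gguf-split tool is available on PATH (named llama-gguf-split in recent builds, gguf-split in older ones) and using placeholder file names; adjust both to your setup:

import subprocess

# Placeholder paths: point these at your actual first split part and desired output.
first_part = r"C:\models\my-model-Q4_K_M-00001-of-00002.gguf"
merged_out = r"C:\models\my-model-Q4_K_M.gguf"

# gguf-split --merge rewrites the per-part split metadata while joining the parts,
# which plain byte-concatenation does not do.
subprocess.run(["llama-gguf-split", "--merge", first_part, merged_out], check=True)

The simpler route is to skip merging entirely and pass the first part (the ...-00001-of-000NN.gguf file) to koboldcpp as the model; it then loads the remaining parts automatically.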
Thank you, it worked.