onekq 
posted an update about 12 hours ago
Heard good things about this model, but no inference providers support it ...

THUDM/GLM-4-9B-0414

It works on llama.cpp, though.

Here is how you can run it:

llama-server -ngl 999 --host 192.168.1.68 --override-kv glm4.rope.dimension_count=int:64 --override-kv tokenizer.ggml.eos_token_id=int:151336 -m /mnt/nvme0n1/LLM/quantized/GLM-4-9B-0414-Q8_0.gguf
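Once llama-server is up, it exposes an OpenAI-compatible HTTP API. A minimal sketch of building a chat request against it, assuming the host from the command above and llama-server's default port 8080 (the model name field is informational and a placeholder here):

```python
import json

def build_chat_request(prompt, host="192.168.1.68", port=8080):
    # llama-server serves an OpenAI-compatible /v1/chat/completions route;
    # the port is its default and an assumption, as is the model name below.
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": "GLM-4-9B-0414",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return url, json.dumps(payload)

url, body = build_chat_request("Hello, GLM!")
```

You can pass `body` to any HTTP client (curl, requests) with a POST to `url`.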

Read here why:

Eval bug: GLM-Z1-9B-0414 · Issue #12946 · ggml-org/llama.cpp:
https://github.com/ggml-org/llama.cpp/issues/12946#issuecomment-2803564782


Ah, I see. They have their own architecture.

https://github.com/huggingface/transformers/pull/37388

This will be hard.
