Using -ctk q4_0 -ctv q4_0 with llama.cpp server throws flash_attn error

#1 by softwareweaver - opened

Running llama.cpp server with parameters `-ctk q4_0 -ctv q4_0`:

```
llama_init_from_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
llama_init_from_model: V cache quantization requires flash_attn
common_init_from_params: failed to create context with model '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
srv load_model: failed to load model, '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
main: exiting due to model loading error
```

It works if I remove `-ctk q4_0 -ctv q4_0`, but then the context memory requirements are much higher. Is this a llama.cpp issue?
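
From the log, my reading is that flash attention gets forced off because this model's K and V head sizes differ (n_embd_head_k != n_embd_head_v), and it is the V cache quantization (`-ctv`) that requires flash attention. If that reading is right, quantizing only the K cache might be a partial workaround for the memory pressure. An untested sketch, assuming a recent llama.cpp build where the server binary is named `llama-server`:

```
# Untested: quantize only the K cache with -ctk and drop -ctv, since the V cache
# quantization appears to be what needs flash_attn (which this model can't use).
./llama-server \
  -m /home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf \
  -ctk q4_0
```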

Thanks,
Ash
