Using `-ctk q4_0 -ctv q4_0` with llama.cpp server throws flash_attn error
#1 opened by softwareweaver
Running the llama.cpp server with the parameters `-ctk q4_0 -ctv q4_0` fails to load the model:
```
llama_init_from_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
llama_init_from_model: V cache quantization requires flash_attn
common_init_from_params: failed to create context with model '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
srv  load_model: failed to load model, '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
main: exiting due to model loading error
```
It works if I remove `-ctk q4_0 -ctv q4_0`, but then the context memory requirements are much higher. Is this a llama.cpp issue?
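For reference, here is roughly how I'm launching it (the `llama-server` binary name and the exact flag layout are from my setup and may differ). Since the log says only the V cache quantization requires flash_attn, quantizing only the K cache seems to load, though that recovers only part of the memory savings:

```sh
# Fails: V cache quantization needs flash_attn, which is forced off
# because this model has n_embd_head_k != n_embd_head_v.
./llama-server \
  -m /home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf \
  -ctk q4_0 -ctv q4_0

# Loads: quantize only the K cache and leave the V cache at f16,
# since per the log only the V cache depends on flash_attn.
./llama-server \
  -m /home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf \
  -ctk q4_0
```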
Thanks,
Ash