Segmentation fault on kobold cpp 1.55.1 using phi-2-uncensored.q4_k_m.gguf

#1
by deleted - opened

hi i try to use model phi-2-uncensored.q4_k_m.gguf in kobold cpp 1.55.1 and i get Segmentation fault error
here log llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /media/352-423/aimod/llm/hi-2-uncensored.q4_k_m.gguf (version GGUF V3 (latest))
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.66 GiB (5.14 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: using OpenCL for GPU acceleration
llm_load_tensors: system memory used = 753.97 MiB
llm_load_tensors: VRAM used = 950.78 MiB
llm_load_tensors: offloading 20 repeating layers to GPU
llm_load_tensors: offloaded 20/33 layers to GPU
....................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 665.00 MiB, K (f16): 332.50 MiB, V (f16): 332.50 MiB
llama_build_graph: non-view tensors processed: 774/774
llama_new_context_with_model: compute buffer total size = 337.51 MiB
Segmentation fault

Sign up or log in to comment