Can't use it with vLLM, although gemma-2B from Google is supported
#8 opened by yaswanth-iitkgp
vLLM supports gemma-2b from Google, but when I try to load this version with vLLM I get the following error:
Traceback (most recent call last):
  File "/workspace/offline_inference.py", line 17, in <module>
    llm = LLM(model="mustafaaljadery/gemma-2B-10M", gpu_memory_utilization=0.6)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 37, in __init__
    self._init_worker()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 66, in _init_worker
    self.driver_worker.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
    self.model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/gemma.py", line 390, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.0.self_attn.gate'
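Listing the checkpoint's tensor names confirms the cause: the repo ships a per-layer self_attn.gate tensor that the stock Gemma architecture in vLLM does not define, so its load_weights finds no matching parameter and raises the KeyError. A minimal sketch for inspecting the keys, assuming the weights ship as a single safetensors file (the filename is a guess; adjust it to whatever the repo actually contains):

    # Hypothetical inspection script; "model.safetensors" is an assumption,
    # the repo may instead ship sharded weights or a pytorch_model.bin.
    from huggingface_hub import hf_hub_download
    from safetensors import safe_open

    path = hf_hub_download("mustafaaljadery/gemma-2B-10M", "model.safetensors")

    with safe_open(path, framework="pt") as f:
        # Collect the tensor names that vanilla Gemma does not have.
        extra = [k for k in f.keys() if "self_attn.gate" in k]

    print(extra)  # expected to include 'model.layers.0.self_attn.gate', etc.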
I am trying to convert it to GGUF using llama.cpp's convert.py, but I am stuck on a similar issue with an unknown tensor name.
Traceback (most recent call last):
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1700, in main
    model = convert_model_names(model, params, args.skip_unknown)
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1402, in convert_model_names
    raise ValueError(f"Unexpected tensor name: {name}. Use --skip-unknown to ignore it (e.g. LLaVA)")
ValueError: Unexpected tensor name: model.layers.0.self_attn.gate. Use --skip-unknown to ignore it (e.g. LLaVA)
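Passing --skip-unknown, as the error message suggests, should get convert.py past this, but note it simply drops the gate tensors, so whatever long-context behavior they implement is lost. The same idea can be applied up front to produce a checkpoint with the vanilla Gemma layout that both vLLM and convert.py accept. A rough sketch, with hypothetical paths and the same single-safetensors-file assumption as above:

    # Hedged sketch: remove the tensors the vanilla Gemma layout doesn't
    # define. Paths are assumptions; this discards the gating weights, so
    # the result is a plain Gemma-2B, not the 10M-context model.
    from safetensors.torch import load_file, save_file

    tensors = load_file("gemma-2B-10M/model.safetensors")
    kept = {k: v for k, v in tensors.items() if "self_attn.gate" not in k}
    save_file(kept, "gemma-2B-10M-vanilla/model.safetensors")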