Fail to run on vLLM

#1
by yaronr - opened

Hi
Trying to run this model on vLLM. The first issue is that it can't find the tokenizer, so I added '--tokenizer=mistralai/Mistral-7B-Instruct-v0.3'.
Then I get the error: 'No model.safetensors.index.json found in remote.'.
I assume this is a configuration issue.
Appreciate your assistance.
Below are my vllm params:

--port=8000 
--model=cimphony-ai-admin/Cimphony-Mistral-Law-7B
--tokenizer-mode=mistral
--tokenizer=mistralai/Mistral-7B-Instruct-v0.3
--trust-remote-code
--gpu-memory-utilization=0.9
Cimphony org


This repo contains only the LoRA adapter; the base model is Mistral-7B-v0.1. You can use it directly with the PEFT library, or with vLLM following these instructions: https://docs.vllm.ai/en/latest/models/lora.html
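
As a minimal offline sketch based on the linked vLLM LoRA docs (the adapter name, prompt, and sampling settings below are just illustrative assumptions, not an official example):

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Download the adapter weights locally (this repo contains only the LoRA adapter)
lora_path = snapshot_download(repo_id="cimphony-ai-admin/Cimphony-Mistral-Law-7B")

# Load the base model with LoRA support enabled
llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True)

outputs = llm.generate(
    ["What is consideration in contract law?"],
    SamplingParams(temperature=0, max_tokens=128),
    lora_request=LoRARequest("cimphony-mistral-law-7b", 1, lora_path),
)
print(outputs[0].outputs[0].text)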

Got it.
I've followed the instructions, and now I'm getting another error:

OSError: Found 0 files matching the pattern: {matched_files}. Make sure that a Mistral tokenizer is present in {tokenizer_name}.

Running with params:

model='mistralai/Mistral-7B-v0.1', speculative_config=None, tokenizer='mistralai/Mistral-7B-v0.1', skip_tokenizer_init=False, tokenizer_mode=mistral, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=mistralai/Mistral-7B-v0.1, use_v2_block_manager=False, num_scheduler_steps=1, multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=True, mm_processor_kwargs=None)

BTW, I get this error whether or not I specify a tokenizer.
I've tried specifying both the Mistral v0.1 tokenizer and cimphony-ai-admin/Cimphony-Mistral-Law-7B.
I'm assuming this is a configuration issue?

@iarbel Could you please take a look at the error above?
I would love to run some benchmarks on this model.
Here's how I run vLLM now:

--model=mistralai/Mistral-7B-v0.1
--tokenizer-mode=mistral
--tokenizer=cimphony-ai-admin/Cimphony-Mistral-Law-7B
--enable-lora
--lora-modules='{"name": "cimphony-mistral-law-7b", "path": "cimphony-ai-admin/Cimphony-Mistral-Law-7B", "base_model_name": "mistralai/Mistral-7B-v0.1"}'
--trust-remote-code
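
For reference, once the server does start, I expect to query the adapter by the name given in --lora-modules via the OpenAI-compatible API. A rough sketch (endpoint and prompt are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="cimphony-mistral-law-7b",  # the LoRA module name, not the base model
    prompt="What is consideration in contract law?",
    max_tokens=128,
)
print(resp.choices[0].text)
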
Cimphony org


I'm not sure what the problem is here. Are you able to load Mistral-7B-v0.1 with some other, arbitrary LoRA adapter?
Also, you can try loading it directly through the Transformers / PEFT library.
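
For the Transformers / PEFT route, something along these lines should work. It's an untested sketch, and I'm assuming you load the base model's tokenizer, since the adapter repo doesn't ship one:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads Mistral-7B-v0.1 and applies the LoRA adapter on top
model = AutoPeftModelForCausalLM.from_pretrained(
    "cimphony-ai-admin/Cimphony-Mistral-Law-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

inputs = tokenizer("What is consideration in contract law?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))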
