Missing config.json

#1 opened by denis-kazakov

I tried to use the model in a pipeline as shown in the model card (both from HF and from a pre-downloaded local copy), but I get this error message: LLama-3.1-KazLLM-1.0-8B-GGUF4 does not appear to have a file named config.json. Checkout 'https://huggingface.co//media/denis/D/Models/LLM/KazLLM/LLama-3.1-KazLLM-1.0-8B-GGUF4/tree/None' for available files.

Institute of Smart Systems and Artificial Intelligence, Nazarbayev University org

Hello.

You can run it like this with vLLM. I'm not sure what the problem with the pipeline is.
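The error says the loader looked for a config.json next to the weights and did not find one. A GGUF release usually ships only .gguf files without a transformers-style config.json, so pointing a loader that expects config.json at such a folder would typically produce exactly this message. A minimal stdlib sketch to check which kind of folder you have (`inspect_model_dir` is a hypothetical helper; pass whatever local path you downloaded to):

```python
from pathlib import Path


def inspect_model_dir(path):
    """Report whether a local model folder looks like a transformers checkpoint
    (has config.json) or a GGUF-only release (only *.gguf files)."""
    folder = Path(path)
    has_config = (folder / "config.json").is_file()
    # Sorted so the result is stable regardless of filesystem ordering.
    gguf_files = sorted(f.name for f in folder.glob("*.gguf"))
    return {"has_config_json": has_config, "gguf_files": gguf_files}


if __name__ == "__main__":
    print(inspect_model_dir("."))
```

If the result shows no config.json and one or more .gguf files, use a GGUF-aware runtime such as vLLM or llama.cpp. Recent transformers versions also accept a `gguf_file=` argument in `from_pretrained` (the weights are dequantized on load); check the transformers GGUF documentation for the architectures and quantization types your version supports.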

Cell 1:

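```
# Setup env (notebook cell; the "!" prefix runs a shell command)
!conda create -n vllm_test python=3.10 -y
# Install into the new env explicitly: a bare "!pip" installs into whatever
# environment "pip" resolves to on the PATH, not necessarily vllm_test.
!conda run -n vllm_test pip install vllm==0.6.3 ipykernel
# Register the env as a Jupyter kernel, then switch the notebook to the
# "vllm_test" kernel before running cell 2.
!conda run -n vllm_test python -m ipykernel install --user --name vllm_test
```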
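After switching kernels, it is worth confirming that the notebook is really running inside the new environment before loading the model. A small stdlib sketch (`env_report` is a hypothetical helper) that shows the interpreter in use and the installed versions of the packages from cell 1:

```python
import sys
from importlib import metadata


def env_report(packages=("vllm", "ipykernel")):
    """Return the running interpreter path and installed package versions.

    A version of None means the package is not installed in this kernel's env.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": sys.executable, "versions": versions}


if __name__ == "__main__":
    print(env_report())
```

The `python` path should point inside the `vllm_test` env, and `vllm` should report `0.6.3`.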

Cell 2:

```python
# Load the model
import os

# Select which GPU to use (index 2 on the author's machine; adjust for yours).
# This must be set before vllm/torch are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from vllm import LLM, SamplingParams

# The chat method takes a conversation as a list of {"role", "content"} messages:
conversation = [
   {
      "role": "system",
      "content": "You are a helpful assistant"
   },
   {
      "role": "user",
      "content": "Hello"
   },
   {
      "role": "assistant",
      "content": "Hello! How can I assist you today?"
   },
   {
      "role": "user",
      "content": "Write an essay about the importance of higher education.",
   },
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM from the GGUF file (replace the path with your local .gguf file).
# gpu_memory_utilization is the fraction of GPU memory vLLM may reserve.
llm = LLM(model="/data/nvme5n1p1/vladimir_workspace/models/quantized/gguf/checkpoints_llama8b_031224_18900-gguf/checkpoints_llama8b_031224_18900-Q4_K_M.gguf",
          gpu_memory_utilization=0.95)
# Generate a reply to the conversation. The output is a list of RequestOutput
# objects that contain the prompt, generated text, and other information.
outputs = llm.chat(conversation, sampling_params)
```
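`temperature=0.8` and `top_p=0.95` control how each next token is picked. As an illustration of the standard temperature + nucleus (top-p) sampling technique, here is a plain-Python sketch; it is not vLLM's actual implementation, and the helper names are hypothetical. The logits are divided by the temperature, turned into probabilities with a softmax, the smallest set of most-likely tokens whose probabilities sum to at least `top_p` is kept, and one token is drawn from that set.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (this sketch requires temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Indices of the smallest most-likely set whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample(logits, temperature=0.8, top_p=0.95, rng=None):
    """Draw one token index using temperature scaling plus nucleus filtering."""
    if rng is None:
        rng = random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p)
    # random.choices renormalizes the weights over the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lowering the temperature sharpens the distribution toward the top token, while lowering `top_p` cuts off more of the unlikely tail; with logits `[2.0, 1.0, 0.0, -5.0]` and `top_p=0.95`, the last token is never sampled.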

Cell 3:

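```python
# Print the outputs. Each RequestOutput holds one CompletionOutput per sampled
# sequence; with the default SamplingParams(n=1), outputs[0] is the only one.
for output in outputs:
   prompt = output.prompt
   generated_text = output.outputs[0].text
   # repr() form: shows escape characters such as "\n" explicitly.
   print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

for output in outputs:
   prompt = output.prompt
   generated_text = output.outputs[0].text
   # Plain form: newlines are rendered, which is easier to read for an essay.
   print(f"Prompt: {prompt}, Generated text: {generated_text}")
```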
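`llm.chat` does not pass the list of message dicts to the model directly; it first renders the conversation into a single prompt string using the model's chat template, and that rendered string is roughly what `output.prompt` prints. The real template comes from the model's tokenizer metadata and may add extra text, so the following is only a simplified Llama-3-style sketch to show the shape of the result; `render_llama3_style` is a hypothetical helper, not a vLLM function.

```python
def render_llama3_style(messages, add_generation_prompt=True):
    """Simplified Llama-3-style chat rendering (illustrative, not the exact template)."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the next reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ]
    print(render_llama3_style(demo))
```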

Alternatively, you can run it with llama.cpp, since vLLM's GGUF support is not yet fully optimized.
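Whichever runtime you use, a GGUF file is self-describing: the architecture, tokenizer and hyperparameters that a transformers checkpoint keeps in config.json are stored as key-value metadata inside the .gguf file itself. As a rough illustration, here is a minimal stdlib sketch that reads only the fixed-size GGUF header (magic bytes, format version, tensor count, metadata key-value count). `read_gguf_header` is a hypothetical helper, and it assumes a little-endian file of GGUF version 2 or later.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes little-endian GGUF v2+, where both counts are uint64
    (v1 used uint32 counts and is rejected here).
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 version + 2 x uint64 counts
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    if version < 2:
        raise ValueError(f"GGUF version {version} is not handled by this sketch")
    return version, tensor_count, kv_count
```

A quick check like this confirms that a download is an intact GGUF file before handing it to llama.cpp or vLLM.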
