qwen-chat-14B-ggml

This repo contains GGML format model files for qwen-chat-14B.

Example code

Install packages

pip install xinference[ggml]>=0.4.3
pip install qwen-cpp

If you want to run with GPU acceleration, refer to installation.

Start a local instance of Xinference

xinference -p 9997

Launch and inference

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="ggmlv3", 
    model_size_in_billions=14,
    quantization="q4_0",
    )
model = client.get_model(model_uid)

chat_history = []
prompt = "最大的动物是什么?"
model.chat(
    prompt,
    chat_history,
    generate_config={"max_tokens": 1024}
)

More information

Xinference Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you are empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

👉 Join our Slack community!

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.