---
license: apache-2.0
---

This repository contains a quantized version of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.

The weights were converted to GGML format using baichuan13b.cpp (based on llama.cpp).

## How to run inference

  1. Compile baichuan13b.cpp. This generates a command-line executable at `baichuan13b/build/bin/main` and a server at `baichuan13b/build/bin/server`.

  2. Download the weights from this repository into `baichuan13b/build/bin/`.

  3. For the command-line interface, use the following commands. See the documentation for other command-line parameters:

    cd baichuan13b/build/bin/
    ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
    
  4. For the API interface, use the following commands. See the documentation for the server's command-line options. (A Python sketch that automates launching and querying the server follows this list.)

    cd baichuan13b/build/bin/
    ./server -m ggml-model-q4_0.bin -c 2048
    
  5. To test the API, you can use curl:

    curl --request POST \
    --url http://localhost:8080/completion \
    --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
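
Steps 2-5 can also be scripted. The sketch below is illustrative rather than part of the upstream docs: it assumes the binary path and weight filename from the steps above, the server flags from step 4, and the default port 8080 used in the curl example.

    # Hypothetical automation sketch; paths, flags, and port are
    # assumptions taken from the steps above.
    import json
    import subprocess
    import time
    import urllib.request

    # Launch the server built in step 1, from the directory used in step 2.
    server = subprocess.Popen(
        ["./server", "-m", "ggml-model-q4_0.bin", "-c", "2048"],
        cwd="baichuan13b/build/bin",
    )

    payload = json.dumps(
        {"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}
    ).encode()
    request = urllib.request.Request(
        "http://localhost:8080/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    # Retry until the server has loaded the model and accepts connections.
    for _ in range(60):
        try:
            print(urllib.request.urlopen(request).read().decode())
            break
        except OSError:
            time.sleep(1)

    server.terminate()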
    

## Use it in Python

To use it in a Python script such as cli_demo.py, all you need to do is replace the model.chat() call: import requests, POST the JSON payload to localhost:8080, and decode the JSON response.

    import requests

    llm_output = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "I feel sick. Nausea and Vomiting.",
            "n_predict": 512,
        },
    ).json()
    print(llm_output)
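
If you want a drop-in replacement for model.chat(), you can wrap the request in a small helper. This is a minimal sketch, not DISC-MedLLM's own API: the chat() name and signature are hypothetical, and it assumes the server follows llama.cpp's /completion response format, where the generated text is returned in the "content" field.

    import requests

    SERVER_URL = "http://localhost:8080/completion"

    def chat(prompt: str, n_predict: int = 512) -> str:
        # Hypothetical helper; assumes a llama.cpp-style response whose
        # generated text lives in the "content" field of the JSON body.
        response = requests.post(
            SERVER_URL,
            json={"prompt": prompt, "n_predict": n_predict},
        )
        response.raise_for_status()
        return response.json().get("content", "")

    print(chat("I feel sick. Nausea and Vomiting."))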