---
license: apache-2.0
---

This repository contains a quantized version of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.

The weights were converted to GGML format using baichuan13b.cpp (based on llama.cpp).

## How to run inference

  1. Compile baichuan13b.cpp. This generates a command-line executable at `baichuan13b/build/bin/main` and a server at `baichuan13b/build/bin/server`.

  2. Download the weights from this repository into `baichuan13b/build/bin/`.

  3. For the command-line interface, use the following commands. See the documentation for other command-line parameters:

    cd baichuan13b/build/bin/
    ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
    
  4. For the API interface, use the following commands. See the documentation for the server's command-line options. (A Python sketch that automates launching and querying the server follows this list.)

    cd baichuan13b/build/bin/
    ./server -m ggml-model-q4_0.bin -c 2048
    
  5. To test the API, you can use curl:

    curl --request POST \
    --url http://localhost:8080/completion \
    --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
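
Steps 2-5 can also be scripted. The sketch below is illustrative rather than part of the upstream docs: it assumes the binary path and weight filename from the steps above, the server flags from step 4, and the default port 8080 used in the curl example.

    # Hypothetical automation sketch; paths, flags, and port are
    # assumptions taken from the steps above.
    import json
    import subprocess
    import time
    import urllib.request

    # Launch the server built in step 1, from the directory used in step 2.
    server = subprocess.Popen(
        ["./server", "-m", "ggml-model-q4_0.bin", "-c", "2048"],
        cwd="baichuan13b/build/bin",
    )

    payload = json.dumps(
        {"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}
    ).encode()
    request = urllib.request.Request(
        "http://localhost:8080/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    # Retry until the server has loaded the model and accepts connections.
    for _ in range(60):
        try:
            print(urllib.request.urlopen(request).read().decode())
            break
        except OSError:
            time.sleep(1)

    server.terminate()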
    

## Use it in Python

To use it in a Python script such as cli_demo.py, all you need to do is replace the model.chat() call: import requests, POST the JSON payload to localhost:8080, and decode the JSON response.

    import requests

    llm_output = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "I feel sick. Nausea and Vomiting.",
            "n_predict": 512,
        },
    ).json()
    print(llm_output)
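
If you want a drop-in replacement for model.chat(), you can wrap the request in a small helper. This is a minimal sketch, not DISC-MedLLM's own API: the chat() name and signature are hypothetical, and it assumes the server follows llama.cpp's /completion response format, where the generated text is returned in the "content" field.

    import requests

    SERVER_URL = "http://localhost:8080/completion"

    def chat(prompt: str, n_predict: int = 512) -> str:
        # Hypothetical helper; assumes a llama.cpp-style response whose
        # generated text lives in the "content" field of the JSON body.
        response = requests.post(
            SERVER_URL,
            json={"prompt": prompt, "n_predict": n_predict},
        )
        response.raise_for_status()
        return response.json().get("content", "")

    print(chat("I feel sick. Nausea and Vomiting."))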