---
license: apache-2.0
---

This repository contains a quantized version of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.

The weights were converted to GGML format using [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).

## How to run inference
1. [Compile baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp#build). This produces a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.
2. Download the weights from this repository to `baichuan13b/build/bin/`.
3. For the command-line interface, run the following. You can also read [the doc covering other command-line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start).
    > ```bash
    > cd baichuan13b/build/bin/
    > ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
    > ```

4. For the API interface, run the following. You can also read [the doc about server command-line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver).
    > ```bash
    > cd baichuan13b/build/bin/
    > ./server -m ggml-model-q4_0.bin -c 2048
    > ```

5. To test the API, you can use `curl`:
    > ```bash
    > curl --request POST \
    > --url http://localhost:8080/completion \
    > --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
    > ```

### Use it in Python
To use the model in a Python script like [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py),
replace the `model.chat()` call with an HTTP POST to `localhost:8080` using `requests`,
then decode the JSON response:
```python
import requests

llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
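To make the replacement a drop-in for `model.chat()`, the request above can be wrapped in a small helper. This is a minimal sketch: `llm_chat` is a hypothetical name, and extracting the generated text from a `"content"` field assumes the server follows the llama.cpp server response convention.

```python
import requests

def llm_chat(prompt, n_predict=512, url="http://localhost:8080/completion"):
    """Query the local baichuan13b.cpp server and return the generated text.

    Assumes the server from step 4 is running on localhost:8080.
    """
    resp = requests.post(url, json={"prompt": prompt, "n_predict": n_predict})
    resp.raise_for_status()  # surface HTTP errors instead of failing silently
    # llama.cpp-style servers return the generated text in the "content" field
    return resp.json()["content"]
```

You can then call `llm_chat("I feel sick. Nausea and Vomiting.")` wherever `model.chat()` was used.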