This repository contains the quantized DISC-MedLLM, which uses Baichuan-13B-Base as its base model.

The weights were converted to GGML format with [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).

## How to run inference
1. [Compile baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp#build); the build generates a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.
2. Download the weights in this repository to `baichuan13b/build/bin/` (or fetch them programmatically; see the Python sketch after this list).
3. For the command-line interface, use the following command. You can also read [the doc covering the other command-line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start):
> ```bash
> cd baichuan13b/build/bin/
> ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
> ```

4. For the API interface, use the following command (`-c 2048` sets the context size in tokens). You can also read [the doc about the server's command-line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver):
> ```bash
> cd baichuan13b/build/bin/
> ./server -m ggml-model-q4_0.bin -c 2048
> ```

5. To test the API, you can use `curl`:
> ```bash
> curl --request POST \
>     --url http://localhost:8080/completion \
>     --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
> ```
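
As an alternative to downloading the weights by hand in step 2, you can fetch them with `huggingface_hub`. This is a minimal sketch: the `repo_id` below is a placeholder, not this repository's actual id, and the filename assumes the `ggml-model-q4_0.bin` used in the commands above.

```python
from huggingface_hub import hf_hub_download

# Fetch the quantized GGML weights from the Hub.
# NOTE: "npc0/DISC-MedLLM-ggml" is a placeholder repo id; replace it
# with this repository's actual id.
model_path = hf_hub_download(
    repo_id="npc0/DISC-MedLLM-ggml",
    filename="ggml-model-q4_0.bin",
)
print(model_path)  # then copy or move this file into baichuan13b/build/bin/
```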

### Use it in Python
To use it from a Python script such as [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py),
all you need to do is replace the `model.chat()` call with a `requests` POST to `localhost:8080`
and decode the JSON response:
```python
import requests

# POST the prompt to the server and decode the JSON response.
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
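
If the server keeps llama.cpp's `/completion` response schema (an assumption; check the server doc linked above), the generated text is returned in the `content` field. A minimal drop-in replacement for `model.chat()` could then look like the sketch below; the `chat` function name and the `content` field are assumptions, not part of this repository.

```python
import requests

def chat(prompt: str, n_predict: int = 512) -> str:
    """Send a prompt to the local server and return the generated text.

    Assumes the llama.cpp-style response schema, where the generation
    is in the "content" field.
    """
    response = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    response.raise_for_status()
    return response.json()["content"]

print(chat("I feel sick. Nausea and Vomiting."))
```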