---
license: apache-2.0
datasets:
- Flmc/DISC-Med-SFT
language:
- zh
pipeline_tag: text-generation
tags:
- baichuan
- medical
- ggml
---
This repository contains a quantized version of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.
The weights were converted to GGML format with [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).
|Model |GGML quantize method| HDD size |
|--------------------|--------------------|----------|
|ggml-model-q4_0.bin | q4_0 | 7.55 GB |
|ggml-model-q4_1.bin | q4_1 | 8.36 GB |
|ggml-model-q5_0.bin | q5_0 | 9.17 GB |
|ggml-model-q5_1.bin | q5_1 | 9.97 GB |
|ggml-model-q8_0.bin | q8_0 | 14 GB |
## How to inference
1. [Compile baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp#build). This produces a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.
2. Download the weights from this repository to `baichuan13b/build/bin/`.
3. For the command-line interface, run the following. You can also read [the doc covering other command-line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start):
> ```bash
> cd baichuan13b/build/bin/
> ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
> ```
4. For the API interface, run the following. You can also read [the doc about server command-line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver):
> ```bash
> cd baichuan13b/build/bin/
> ./server -m ggml-model-q4_0.bin -c 2048
> ```
5. To test the API, you can use `curl`:
> ```bash
> curl --request POST \
> --url http://localhost:8080/completion \
> --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
> ```
### Use it in Python
To use it in a Python script such as [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py),
replace the `model.chat()` call with an HTTP request: `import requests`, POST a JSON body to `localhost:8080`,
and decode the response.
```python
import requests
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
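The POST call above can be wrapped into a small drop-in stand-in for `model.chat()`. This is a minimal sketch, assuming the server follows llama.cpp's `/completion` response format, where the generated text is returned under a `content` key (field names may differ across server versions):

```python
import requests


def chat(prompt: str, n_predict: int = 512,
         url: str = "http://localhost:8080/completion") -> str:
    """Send a prompt to the local GGML server and return the generated text.

    Assumes a llama.cpp-style /completion endpoint that returns JSON
    with a "content" field; adjust the key if your server differs.
    """
    resp = requests.post(url, json={"prompt": prompt, "n_predict": n_predict})
    resp.raise_for_status()
    return resp.json().get("content", "")
```

With this helper in place, the rest of a script like `cli_demo.py` can call `chat("I feel sick. Nausea and Vomiting.")` wherever it previously called `model.chat()`.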