---
license: apache-2.0
datasets:
- Flmc/DISC-Med-SFT
language:
- zh
pipeline_tag: text-generation
tags:
- baichuan
- medical
- ggml
---
This repository contains quantized versions of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.
The weights were converted to GGML format using [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).
|Model |GGML quantization method|File size |
|--------------------|--------------------|----------|
|ggml-model-q4_0.bin | q4_0 | 7.55 GB |
|ggml-model-q4_1.bin | q4_1 | 8.36 GB |
|ggml-model-q5_0.bin | q5_0 | 9.17 GB |
|ggml-model-q5_1.bin | q5_1 | 9.97 GB |
|ggml-model-q8_0.bin | q8_0 | 14 GB |
## How to run inference
1. [Compile baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp#build); this generates a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server` (see the build sketch below).
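   > A minimal build sketch, assuming the usual llama.cpp-style CMake workflow (the linked build instructions are authoritative):
   > ```bash
   > git clone https://github.com/ouwei2013/baichuan13b.cpp baichuan13b
   > cd baichuan13b
   > mkdir build && cd build
   > cmake ..
   > cmake --build . --config Release
   > ```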
2. Download a weight file from this repository to `baichuan13b/build/bin/` (an example follows).
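   > For example, with `wget` (the URL assumes this repository's Hugging Face path; swap in the quantization you want):
   > ```bash
   > cd baichuan13b/build/bin/
   > wget https://huggingface.co/npc0/DISC-MedLLM-ggml/resolve/main/ggml-model-q4_0.bin
   > ```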
3. For the command-line interface, use the following command. You can also read [the doc covering other command-line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start):
> ```bash
> cd baichuan13b/build/bin/
> ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
> ```
4. For the API interface, use the following command. You can also read [the doc about server command-line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver):
> ```bash
> cd baichuan13b/build/bin/
> ./server -m ggml-model-q4_0.bin -c 2048
> ```
5. To test the API, you can use `curl`:
> ```bash
> curl --request POST \
> --url http://localhost:8080/completion \
> --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
> ```
### Use it in Python
To use it in a Python script like [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py),
replace the `model.chat()` call with an HTTP POST to `localhost:8080` using `requests`,
then decode the JSON response:
```python
import requests

# POST the prompt to the local baichuan13b.cpp server and decode the JSON response.
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
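For a drop-in replacement of `model.chat()`, you could wrap the request in a small helper. A minimal sketch, assuming a llama.cpp-style server that returns the generated text in a `content` field (the `chat` helper name is ours, not part of the original script):
```python
import requests

def chat(prompt: str, n_predict: int = 512) -> str:
    """Send a prompt to the local server and return the generated text."""
    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    resp.raise_for_status()
    # llama.cpp-style servers put the generated text in the "content" field.
    return resp.json()["content"]

print(chat("I feel sick. Nausea and Vomiting."))
```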