
	---
	language:
	- zh
	- en
	tags:
	- chatglm
	- glm
	- onnx
	- onnxruntime
	---

	# ChatGLM-6B + ONNX

	This model is exported from [ChatGLM-6b](https://huggingface.co/THUDM/chatglm-6b) with int8 quantization and optimized for [ONNXRuntime](https://onnxruntime.ai/) inference. The export code is in [this repo](https://github.com/K024/chatglm-q).

	ONNXRuntime inference code is uploaded with the model. Install the requirements and run `streamlit run web-ui.py` to start chatting. Currently, the `MatMulInteger` (for the u8s8 data type) and `DynamicQuantizeLinear` operators are supported only on CPU. Arm64 with Neon support (Apple M1/M2) should be reasonably fast.
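
To make the two operators named above more concrete, here is a minimal pure-Python sketch of a u8s8 linear layer: activations are quantized to uint8 at run time as the ONNX `DynamicQuantizeLinear` definition describes, weights are stored as int8, and the product is accumulated in integers as in `MatMulInteger`. The symmetric per-tensor weight quantization and all function names are assumptions made for illustration; the exported model may quantize its weights differently.

```python
def dynamic_quantize_linear(x):
    """Quantize a list of floats to uint8 (ONNX DynamicQuantizeLinear semantics)."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    x_min = min(0.0, min(x))
    x_max = max(0.0, max(x))
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        # All-zero input: use scale 1.0 to avoid dividing by zero (sketch choice).
        return [0] * len(x), 1.0, 0
    zero_point = int(min(255, max(0, round(-x_min / scale))))
    y = [int(min(255, max(0, round(v / scale) + zero_point))) for v in x]
    return y, scale, zero_point


def quantize_weights_s8(w):
    """Symmetric per-tensor int8 quantization (an assumption for this sketch)."""
    max_abs = max(abs(v) for row in w for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in w]
    return q, scale


def matmul_integer(a_q, a_zero_point, b_q, b_zero_point=0):
    """Integer matrix product with zero points subtracted, as in MatMulInteger."""
    inner, cols = len(b_q), len(b_q[0])
    return [
        [
            sum((row[k] - a_zero_point) * (b_q[k][j] - b_zero_point) for k in range(inner))
            for j in range(cols)
        ]
        for row in a_q
    ]


def quantized_linear(x, w):
    """Approximate the float product x @ w using u8 activations and s8 weights."""
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    w_q, w_scale = quantize_weights_s8(w)
    acc = matmul_integer([x_q], x_zp, w_q)[0]
    # Dequantize: the integer accumulator is rescaled by both scales.
    return [v * x_scale * w_scale for v in acc]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    w = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
    print(quantized_linear(x, w))  # close to the float result [1.5, -2.0]
```

The result differs from the float product only by rounding error, which is the trade-off int8 quantization makes for smaller weights and integer arithmetic.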

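	Install the dependencies and run `streamlit run web-ui.py` to try the model. Because of limited operator support in ONNXRuntime, inference currently runs only on CPU, with good speed on Arm64 (Apple M1/M2). The ONNX export code is in [this repository](https://github.com/K024/chatglm-q).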
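
Because inference is CPU-only for now, a session can be pinned to the CPU execution provider explicitly. The following is a rough sketch, not the inference code shipped with the model: the helper names are hypothetical, and the path to the `.onnx` file is left to the caller because the file layout of the repo is not described here.

```python
import os


def open_cpu_session(model_path, num_threads=None):
    """Open an ONNX Runtime session restricted to the CPU execution provider."""
    if not os.path.isfile(model_path):
        raise FileNotFoundError(model_path)
    # Imported lazily so the path check above works without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    if num_threads is not None:
        # Number of threads used inside a single operator, such as a matmul.
        options.intra_op_num_threads = num_threads
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["CPUExecutionProvider"],
    )


def describe_inputs(session):
    """List (name, type, shape) for each model input, to see what to feed."""
    return [(i.name, i.type, i.shape) for i in session.get_inputs()]
```

Calling `describe_inputs(open_cpu_session(path))` on one of the downloaded `.onnx` files prints the input names and shapes that the model expects.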

	## Usage

	Clone with [git-lfs](https://git-lfs.com/):

	```sh
	git lfs install
	git clone https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8
	cd ChatGLM-6b-onnx-u8s8
	pip install -r requirements.txt
	streamlit run web-ui.py
	```

	Or use the `huggingface_hub` [Python client library](https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder) to download a snapshot of the repo:

	```python
	from huggingface_hub import snapshot_download
	snapshot_download(repo_id="K024/ChatGLM-6b-onnx-u8s8", local_dir="./ChatGLM-6b-onnx-u8s8")
	```
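
If a git clone finishes without git-lfs fetching the weights (for example, when git-lfs is not installed or the download is interrupted), the large files are left as small text pointer files and the model will not load. As a quick sanity check, the sketch below scans a directory for files that still start with the Git LFS pointer header. The function name is hypothetical.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that are still Git LFS pointers, not real data."""
    pointers = []
    for path in sorted(Path(repo_dir).rglob("*")):
        # Skip directories and Git's own metadata.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    missing = find_lfs_pointers("./ChatGLM-6b-onnx-u8s8")
    for path in missing:
        print(f"not downloaded: {path}")
```

If any files are reported, running `git lfs pull` inside the cloned directory fetches the actual contents.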

	The code is released under the MIT license.

	The model weights are released under the same license as ChatGLM-6b; see [MODEL LICENSE](https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE).