iandennismiller committed
Commit bace60a · 1 Parent(s): 9478eab

include notes about quantization process

Files changed (1): Readme.md +36 -0
Readme.md CHANGED
@@ -53,6 +53,42 @@ Framework versions
 
  ## Setup Notes
 
+ ### Download torch model
+
+ This example demonstrates using `hfdownloader` to download a torch model from HF to `./storage`.
+
+ ```bash
+ ./hfdownloader -m truehealth/LLama-2-MedText-13b
+ ```
+
+ If necessary, install `hfdownloader` from https://github.com/bodaay/HuggingFaceModelDownloader:
+
+ ```bash
+ bash <(curl -sSL https://raw.githubusercontent.com/bodaay/HuggingFaceModelDownloader/master/scripts/gist_gethfd.sh) -h
+ ```
+
+ ### Quantize torch model with llama.cpp
+
+ To quantize directly to q8_0:
+
+ ```bash
+ llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
+ ```
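+
+ convert.py also accepts the model directory in place of the first shard; a minimal sketch, assuming the same download layout as above:
+
+ ```bash
+ llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/
+ ```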
+
+ Alternatively, to produce other quantization types, first convert to f32 GGUF:
+
+ ```bash
+ llama.cpp/convert.py --outtype f32 --outfile LLama-2-MedText-13b-f32.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
+ ```
+
+ Then quantize the f32 GGUF down to lower bit resolutions:
+
+ ```bash
+ llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q3_K_L.gguf Q3_K_L
+ ```
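+
+ A quick way to sanity-check the quantized file is a short generation with llama.cpp's `main` example binary (a minimal sketch; assumes `main` was built alongside `quantize`, and the prompt is arbitrary):
+
+ ```bash
+ llama.cpp/build/bin/main -m LLama-2-MedText-13b-Q3_K_L.gguf -p "The patient presents with" -n 64
+ ```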
+
+ ### Distributing the model through Hugging Face
+
  ```bash
  mkvirtualenv -p `which python3.11` -a . ${PWD##*/}
  python -m pip install huggingface_hub
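  # Hypothetical upload step: assumes huggingface_hub provides the `huggingface-cli upload`
  # command (v0.17+) and that <repo_id> is replaced with a real, writable model repo
  huggingface-cli login
  huggingface-cli upload <repo_id> LLama-2-MedText-13b-Q3_K_L.gguf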