iandennismiller committed
Commit bace60a · 1 Parent(s): 9478eab

include notes about quantization process

Files changed (1): Readme.md +36 -0
Readme.md CHANGED
@@ -53,6 +53,42 @@ Framework versions
 
  ## Setup Notes
 
+ ### Download torch model
+
+ This example demonstrates using `hfdownloader` to download a torch model from HF to `./storage`.
+
+ ```bash
+ ./hfdownloader -m truehealth/LLama-2-MedText-13b
+ ```
+
+ If necessary, install `hfdownloader` from https://github.com/bodaay/HuggingFaceModelDownloader:
+
+ ```bash
+ bash <(curl -sSL https://raw.githubusercontent.com/bodaay/HuggingFaceModelDownloader/master/scripts/gist_gethfd.sh) -h
+ ```
+
+ ### Quantize torch model with llama.cpp
+
+ To quantize directly to q8_0:
+
+ ```bash
+ llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
+ ```
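+
+ convert.py also accepts the model directory in place of the first shard; a minimal sketch, assuming the same download layout as above:
+
+ ```bash
+ llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/
+ ```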
+
+ Alternatively, to produce other quantization types, first convert to f32 GGUF:
+
+ ```bash
+ llama.cpp/convert.py --outtype f32 --outfile LLama-2-MedText-13b-f32.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
+ ```
+
+ Then quantize the f32 GGUF down to lower bit resolutions:
+
+ ```bash
+ llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q3_K_L.gguf Q3_K_L
+ ```
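+
+ A quick way to sanity-check the quantized file is a short generation with llama.cpp's `main` example binary (a minimal sketch; assumes `main` was built alongside `quantize`, and the prompt is arbitrary):
+
+ ```bash
+ llama.cpp/build/bin/main -m LLama-2-MedText-13b-Q3_K_L.gguf -p "The patient presents with" -n 64
+ ```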
+
+ ### Distributing the model through Hugging Face
+
  ```bash
  mkvirtualenv -p `which python3.11` -a . ${PWD##*/}
  python -m pip install huggingface_hub
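  # Hypothetical upload step: assumes huggingface_hub provides the `huggingface-cli upload`
  # command (v0.17+) and that <repo_id> is replaced with a real, writable model repo
  huggingface-cli login
  huggingface-cli upload <repo_id> LLama-2-MedText-13b-Q3_K_L.gguf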