rozek committed
Commit e88d5a9 · 1 Parent(s): ed9c6c4

Update README.md

Files changed (1): README.md (+37 -5)

README.md CHANGED
@@ -41,11 +41,43 @@ container and store the quantization results
  2. download the weights for the fine-tuned LLaMA-2 model from
  [Hugging Face](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) into a subfolder of `llama.cpp_in_Docker`
  (let's call the new folder `LLaMA-2-7B-32K`)
- 3. within the Docker Desktop, search for and download a `basic-python` image - just use one of the
- most popular ones
- 4. from a terminal session on your host computer (i.e., not a Docker container!), start a new container for the
- downloaded image which mounts the folder we created before:<br>&nbsp;<br>``
- ...
+ 3. within the <u>Docker Desktop</u>, search for and download a `basic-python` image - just use one of
+ the most popular ones
+ 4. from a <u>terminal session on your host computer</u> (i.e., not a Docker container!), start a new container
+ for the downloaded image which mounts the folder we created before:<br>&nbsp;<br>`docker run --rm \
+ -v ./llama.cpp_in_Docker:/llama.cpp \
+ -t basic-python /bin/bash`<br>&nbsp;<br>(you may have to adjust the path to your local folder)
+ 5. back in the <u>Docker Desktop</u>, open the "Terminal" tab of the started container and enter the
+ following commands:<br>&nbsp;<br>```
+ apt update
+ apt-get install software-properties-common -y
+ apt-get update
+ apt-get install g++ git make -y
+ cd /llama.cpp
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ ```
+ 6. now open the "Files" tab and navigate to the file `/llama.cpp/llama.cpp/Makefile`, right-click on it and
+ choose "Edit file"
+ 7. search for `aarch64`, and - in the line found (which looks like `ifneq ($(filter aarch64%,$(UNAME_M)),)`) -
+ change `ifneq` to `ifeq`
+ 8. save your change using the disk icon in the upper right corner of the editor pane and open the "Terminal"
+ tab again
+ 9. now enter the following commands:<br>&nbsp;<br>```
+ make
+ python3 -m pip install -r requirements.txt
+ python3 convert.py ../LLaMA-2-7B-32K
+ ```
+ 10. you are now ready to run the actual quantization, e.g., using<br>&nbsp;<br>```
+ ./quantize ../LLaMA-2-7B-32K/ggml-model-f16.gguf \
+ ../LLaMA-2-7B-32K/LLaMA-2-7B-32K-Q4_0.gguf Q4_0
+ ```
+ 11. run any quantizations you need and stop the container again (you may even delete it, as the generated files
+ will remain available on your host computer)
+
+ You are now free to move the quantization results to where you need them and run inferences with context
+ lengths of up to 32K (depending on the amount of memory you have available - long contexts need an awful
+ lot of RAM)

  ## License ##
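
Regarding step 4: `docker run` bind mounts generally want an absolute host path, which is likely why the instructions warn that the path may need adjusting. A minimal sketch, assuming the working folder sits in your home directory:

```
# mount the working folder with an absolute path (assumes it is under $HOME)
docker run --rm \
  -v "$HOME/llama.cpp_in_Docker":/llama.cpp \
  -t basic-python /bin/bash
```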
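
Regarding step 7: if you prefer the container's terminal over the Docker Desktop editor, the same one-word change can presumably be made with `sed` (a sketch; the guard line may differ between llama.cpp revisions, so verify the Makefile afterwards before running `make`):

```
# on lines mentioning aarch64, turn the ifneq guard into ifeq
sed -i '/aarch64/s/ifneq/ifeq/' /llama.cpp/llama.cpp/Makefile
```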
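
Regarding step 11: `quantize` accepts quantization types other than `Q4_0` as its last argument; for example, an 8-bit variant (larger file, closer to the f16 original) could be produced the same way:

```
# same input, different quantization type - Q8_0 trades file size for accuracy
./quantize ../LLaMA-2-7B-32K/ggml-model-f16.gguf \
           ../LLaMA-2-7B-32K/LLaMA-2-7B-32K-Q8_0.gguf Q8_0
```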
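
And once a quantized file has been moved to the machine you run inferences on, an invocation of the `main` binary built in step 9 might look as follows (a hypothetical example - `-c` sets the context length in tokens, and 32768 only fits if you have the RAM for it):

```
# sample run with the Q4_0 model; adjust model path, context size and prompt
./main -m LLaMA-2-7B-32K-Q4_0.gguf -c 32768 -n 256 -p "Once upon a time"
```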