Update README.md
README.md (changed)
@@ -37,23 +37,29 @@ You need to download the .gguf model first
 If you want to use the CPU, install these dependencies:

 ```bash
-pip install llama-cpp-python
+pip install llama-cpp-python huggingface_hub
 ```

 If you want to use the GPU instead:

 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
+CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install huggingface_hub llama-cpp-python --force-reinstall --upgrade --no-cache-dir
 ```

 And then use this code to see a response to the prompt.

 ```python
+from huggingface_hub import hf_hub_download
 from llama_cpp import Llama

+model_path = hf_hub_download(
+    repo_id="MoxoffSpA/AzzurroQuantized",
+    filename="Azzurro-ggml-Q4_K_M.gguf"
+)
+
 # Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-    model_path=
+    model_path=model_path,
     n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
     n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
     n_gpu_layers=0   # The number of layers to offload to GPU, if you have GPU acceleration available
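The hunk ends inside the `Llama(...)` constructor, so the diff itself never shows the call that actually produces a response. Below is a minimal sketch of how the downloaded model might be prompted through llama-cpp-python's completion API; the prompt text and the sampling parameters are illustrative assumptions, not part of this diff.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from the Hugging Face Hub
# (cached locally after the first run).
model_path = hf_hub_download(
    repo_id="MoxoffSpA/AzzurroQuantized",
    filename="Azzurro-ggml-Q4_K_M.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,      # max sequence length
    n_threads=8,     # CPU threads, tune to your machine
    n_gpu_layers=0   # 0 = CPU only; with the CUDA build, -1 offloads all layers
)

# Illustrative prompt and parameters; adjust to your use case.
output = llm(
    "Qual è la capitale d'Italia?",  # hypothetical example prompt
    max_tokens=128,                  # cap on generated tokens
    echo=False                       # do not repeat the prompt in the output
)
print(output["choices"][0]["text"])
```

Calling `llm(...)` is shorthand for `create_completion()`; if the CUDA-enabled install from the GPU step above is used, raising `n_gpu_layers` (or setting it to -1 to offload every layer) moves inference onto the GPU.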