marcodambra committed (verified)
Commit e11ca4e · Parent(s): 9e61e55

Update README.md

Files changed (1):
  1. README.md +9 -3
README.md CHANGED
@@ -37,23 +37,29 @@ You need to download the .gguf model first
 If you want to use the cpu install these dependencies:
 
 ```python
-pip install llama-cpp-python
+pip install llama-cpp-python huggingface_hub
 ```
 
 If you want to use the gpu instead:
 
 ```python
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
+CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install huggingface_hub llama-cpp-python --force-reinstall --upgrade --no-cache-dir
 ```
 
 And then use this code to see a response to the prompt.
 
 ```python
+from huggingface_hub import hf_hub_download
 from llama_cpp import Llama
 
+model_path = hf_hub_download(
+    repo_id="MoxoffSpA/AzzurroQuantized",
+    filename="Azzurro-ggml-Q4_K_M.gguf"
+)
+
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-    model_path="path/to/model.gguf",  # Download the model file first
+    model_path=model_path,
     n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
     n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
     n_gpu_layers=0   # The number of layers to offload to GPU, if you have GPU acceleration available
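The hunk is cut off before the model is actually prompted. For context, a minimal end-to-end sketch of the updated README snippet, assuming the `MoxoffSpA/AzzurroQuantized` repo and `Azzurro-ggml-Q4_K_M.gguf` filename shown in the diff (the example prompt is illustrative; running this downloads the multi-gigabyte quantized model, so it is not verified here):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the quantized .gguf file into the local Hugging Face cache
# and get back its path on disk.
model_path = hf_hub_download(
    repo_id="MoxoffSpA/AzzurroQuantized",
    filename="Azzurro-ggml-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,      # max sequence length
    n_threads=8,     # CPU threads, tailor to your system
    n_gpu_layers=0,  # raise this if built with GPU acceleration
)

# Calling the Llama object runs a text completion and returns an
# OpenAI-style dict; the generated text is in choices[0]["text"].
output = llm(
    "Qual è la capitale d'Italia?",  # example prompt (not from the diff)
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

`llm(...)` here is llama-cpp-python's completion call; for chat-tuned usage the library also offers `llm.create_chat_completion(...)`, which takes a list of role/content messages instead of a raw prompt string.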