Update README.md
README.md (changed)
@@ -37,23 +37,29 @@ You need to download the .gguf model first
 If you want to use the CPU, install these dependencies:

 ```bash
-pip install llama-cpp-python
+pip install llama-cpp-python huggingface_hub
 ```

 If you want to use the GPU instead:

 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
+CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install huggingface_hub llama-cpp-python --force-reinstall --upgrade --no-cache-dir
 ```

 And then use this code to see a response to the prompt.

 ```python
+from huggingface_hub import hf_hub_download
 from llama_cpp import Llama

+model_path = hf_hub_download(
+    repo_id="MoxoffSpA/AzzurroQuantized",
+    filename="Azzurro-ggml-Q4_K_M.gguf"
+)
+
 # Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-    model_path=
+    model_path=model_path,
     n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
     n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
     n_gpu_layers=0   # The number of layers to offload to GPU, if you have GPU acceleration available
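The hunk ends inside the `Llama(...)` constructor, so the diff itself never shows the call that actually produces a response. Below is a minimal sketch of how the downloaded model might be prompted through llama-cpp-python's completion API; the prompt text and the sampling parameters are illustrative assumptions, not part of this diff.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from the Hugging Face Hub
# (cached locally after the first run).
model_path = hf_hub_download(
    repo_id="MoxoffSpA/AzzurroQuantized",
    filename="Azzurro-ggml-Q4_K_M.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,      # max sequence length
    n_threads=8,     # CPU threads, tune to your machine
    n_gpu_layers=0   # 0 = CPU only; with the CUDA build, -1 offloads all layers
)

# Illustrative prompt and parameters; adjust to your use case.
output = llm(
    "Qual è la capitale d'Italia?",  # hypothetical example prompt
    max_tokens=128,                  # cap on generated tokens
    echo=False                       # do not repeat the prompt in the output
)
print(output["choices"][0]["text"])
```

Calling `llm(...)` is shorthand for `create_completion()`; if the CUDA-enabled install from the GPU step above is used, raising `n_gpu_layers` (or setting it to -1 to offload every layer) moves inference onto the GPU.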