piotrmaciejbednarski committed
Commit 5517248 · verified · 1 Parent(s): 466099f

Update README.md

Files changed (1)
  1. README.md +62 -0
README.md CHANGED
@@ -50,6 +50,68 @@ Quantization is the process of reducing the precision of model weights, which de
- **Q8_0**: Highest quality on GPU, smallest quality decrease compared to the original
- **F16/BF16**: Full precision, reference versions without quantization

## Downloading the model using huggingface-cli

<details>
<summary>Click to see download instructions</summary>

First, make sure you have the huggingface-cli tool installed:
```bash
pip install -U "huggingface_hub[cli]"
```
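
As a quick sanity check (a minimal sketch, assuming a standard pip installation), you can confirm the CLI is available before starting any downloads:
```bash
# Confirm the CLI is on PATH and list its available subcommands
huggingface-cli --help

# Check which version of huggingface_hub is installed
pip show huggingface_hub
```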

### Downloading smaller models
To download a specific model smaller than 50 GB (e.g., q4_k_m):
```bash
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```

You can also download other quantizations by changing the filename:
```bash
# For q3_k_m version (22.5 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q3_k_m.gguf" --local-dir ./

# For iq3_s version (20.4 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-iq3_s.gguf" --local-dir ./

# For q5_k_m version (33.2 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q5_k_m.gguf" --local-dir ./
```
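
If you are not sure which filenames are available, you can list the repository contents first. This is a minimal sketch using the public Hub API; it assumes `curl` and `jq` are installed locally:
```bash
# List every file path in the repository via the Hugging Face Hub API
curl -s https://huggingface.co/api/models/piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF/tree/main | jq -r '.[].path'
```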

### Downloading larger models (split into parts)
Large models, such as the F16 and bf16 versions, are split into smaller parts. To download all parts to a local folder:

```bash
# For F16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-F16/*" --local-dir ./F16/

# For bf16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-bf16/*" --local-dir ./bf16/
```
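
Before joining anything, it is worth confirming that every part finished downloading. A minimal sketch (the directory matches the `--local-dir` used above, and the subfolder layout mirrors the repository; re-running the download command will fetch any file that is still missing):
```bash
# Recursively list the downloaded F16 parts and their sizes
ls -lhR ./F16/
```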

### Faster downloads with hf_transfer
To significantly speed up downloading (up to 1 GB/s), you can use the hf_transfer library:

```bash
# Install hf_transfer
pip install hf_transfer

# Download with hf_transfer enabled (much faster)
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```
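
If you want hf_transfer active for every command in the current shell, you can export the variable once instead of prefixing each invocation (a minimal sketch, using one of the quantizations listed above):
```bash
# Enable hf_transfer for all subsequent huggingface-cli calls in this shell
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q5_k_m.gguf" --local-dir ./
```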

### Joining split files after downloading
If you downloaded a split model, you can join the parts back into a single GGUF file:

```bash
# On Linux/macOS systems
cat PLLuM-8x7B-chat-gguf-F16.part-* > PLLuM-8x7B-chat-gguf-F16.gguf

# On Windows systems
copy /b PLLuM-8x7B-chat-gguf-F16.part-* PLLuM-8x7B-chat-gguf-F16.gguf
```
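
After joining, a quick size check helps confirm the merge completed (a minimal sketch; the joined file should be roughly the sum of the part sizes, about 85 GB for F16):
```bash
# Verify the joined file exists and has the expected size (~85 GB for F16)
ls -lh PLLuM-8x7B-chat-gguf-F16.gguf
```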
</details>

## How to run the model

### Using llama.cpp