- **Q8_0**: Highest quality on GPU, smallest quality decrease compared to the original
- **F16/BF16**: Full precision, reference versions without quantization

## Downloading the model using huggingface-cli

<details>
<summary>Click to see download instructions</summary>

First, make sure you have the huggingface-cli tool installed:

```bash
pip install -U "huggingface_hub[cli]"
```
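
If you want to double-check the installation from Python, note that the CLI ships as part of the `huggingface_hub` package, so a quick import is enough; a minimal sketch:

```python
# Minimal check: the huggingface-cli tool is installed together with the
# huggingface_hub Python package, so importing it confirms the installation.
import huggingface_hub

print(huggingface_hub.__version__)
```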

### Downloading smaller models

To download a specific model smaller than 50 GB (e.g., q4_k_m):

```bash
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```
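
If you prefer the Python API to the CLI, the same single-file download should work with `hf_hub_download` from `huggingface_hub`; a minimal sketch using the repository and file name from the command above:

```python
# Sketch: download one GGUF file via the huggingface_hub Python API,
# equivalent to the huggingface-cli command above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    filename="PLLuM-8x7B-chat-gguf-q4_k_m.gguf",
    local_dir="./",
)
print(f"Downloaded to {path}")
```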

You can also download other quantizations by changing the filename:

```bash
# For q3_k_m version (22.5 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q3_k_m.gguf" --local-dir ./

# For iq3_s version (20.4 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-iq3_s.gguf" --local-dir ./

# For q5_k_m version (33.2 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q5_k_m.gguf" --local-dir ./
```
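
If you are not sure which quantization files are available, you can list the repository contents programmatically; a small sketch using `list_repo_files` from `huggingface_hub`:

```python
# Sketch: print every GGUF file in the repository to see the available quantizations.
from huggingface_hub import list_repo_files

for name in sorted(list_repo_files("piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF")):
    if name.endswith(".gguf"):
        print(name)
```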

### Downloading larger models (split into parts)

For large models, such as F16 or bf16, the files are split into smaller parts. To download all parts to a local folder:

```bash
# For F16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-F16/*" --local-dir ./F16/

# For bf16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-bf16/*" --local-dir ./bf16/
```
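
The same multi-part download can also be done from Python with `snapshot_download`, whose `allow_patterns` argument plays the role of the CLI's `--include` flag; a sketch for the F16 variant shown above:

```python
# Sketch: download all parts of the F16 variant via the huggingface_hub Python API.
# allow_patterns corresponds to the --include pattern used in the CLI command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    allow_patterns=["PLLuM-8x7B-chat-gguf-F16/*"],
    local_dir="./F16/",
)
```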

### Faster downloads with hf_transfer

To significantly speed up downloading (up to 1 GB/s), you can use the hf_transfer library:

```bash
# Install hf_transfer
pip install hf_transfer

# Download with hf_transfer enabled (much faster)
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```
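
The same speed-up can be enabled from Python (it still requires `pip install hf_transfer`, as above); the environment variable has to be set before `huggingface_hub` is imported, since it is read at import time. A minimal sketch:

```python
# Sketch: enable hf_transfer for downloads made through the Python API.
# Set the environment variable before importing huggingface_hub, because
# the library reads it when it is first imported.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download  # imported after the variable is set

hf_hub_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    filename="PLLuM-8x7B-chat-gguf-q4_k_m.gguf",
    local_dir="./",
)
```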

### Joining split files after downloading

If you downloaded a split model, you can join the parts using:

```bash
# On Linux/macOS systems
cat PLLuM-8x7B-chat-gguf-F16.part-* > PLLuM-8x7B-chat-gguf-F16.gguf

# On Windows systems
copy /b PLLuM-8x7B-chat-gguf-F16.part-* PLLuM-8x7B-chat-gguf-F16.gguf
```
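
A portable alternative to the shell commands is to join the parts from Python; this sketch assumes the parts follow the `.part-*` naming shown above and sort into the correct order by filename:

```python
# Sketch: join split GGUF parts into a single file (works on any operating system).
# Assumes the part files sort correctly by name (e.g., part-00001, part-00002, ...).
import glob
import shutil

parts = sorted(glob.glob("PLLuM-8x7B-chat-gguf-F16.part-*"))

with open("PLLuM-8x7B-chat-gguf-F16.gguf", "wb") as joined:
    for part in parts:
        with open(part, "rb") as chunk:
            shutil.copyfileobj(chunk, joined)  # stream each part to avoid loading it into RAM

print(f"Joined {len(parts)} parts")
```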

</details>

## How to run the model

### Using llama.cpp