- **Q8_0**: Highest quality on GPU, smallest quality decrease compared to the original
- **F16/BF16**: Full precision, reference versions without quantization

## Downloading the model using huggingface-cli

<details>
<summary>Click to see download instructions</summary>

First, make sure you have the huggingface-cli tool installed:

```bash
pip install -U "huggingface_hub[cli]"
```
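
If you want to double-check the installation from Python, note that the CLI ships as part of the `huggingface_hub` package, so a quick import is enough; a minimal sketch:

```python
# Minimal check: the huggingface-cli tool is installed together with the
# huggingface_hub Python package, so importing it confirms the installation.
import huggingface_hub

print(huggingface_hub.__version__)
```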

### Downloading smaller models

To download a specific model smaller than 50 GB (e.g., q4_k_m):

```bash
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```
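
If you prefer the Python API to the CLI, the same single-file download should work with `hf_hub_download` from `huggingface_hub`; a minimal sketch using the repository and file name from the command above:

```python
# Sketch: download one GGUF file via the huggingface_hub Python API,
# equivalent to the huggingface-cli command above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    filename="PLLuM-8x7B-chat-gguf-q4_k_m.gguf",
    local_dir="./",
)
print(f"Downloaded to {path}")
```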

You can also download other quantizations by changing the filename:

```bash
# For q3_k_m version (22.5 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q3_k_m.gguf" --local-dir ./

# For iq3_s version (20.4 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-iq3_s.gguf" --local-dir ./

# For q5_k_m version (33.2 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q5_k_m.gguf" --local-dir ./
```
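
If you are not sure which quantization files are available, you can list the repository contents programmatically; a small sketch using `list_repo_files` from `huggingface_hub`:

```python
# Sketch: print every GGUF file in the repository to see the available quantizations.
from huggingface_hub import list_repo_files

for name in sorted(list_repo_files("piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF")):
    if name.endswith(".gguf"):
        print(name)
```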

### Downloading larger models (split into parts)

For large models, such as F16 or bf16, the files are split into smaller parts. To download all parts to a local folder:

```bash
# For F16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-F16/*" --local-dir ./F16/

# For bf16 version (~85 GB)
huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-bf16/*" --local-dir ./bf16/
```
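
The same multi-part download can also be done from Python with `snapshot_download`, whose `allow_patterns` argument plays the role of the CLI's `--include` flag; a sketch for the F16 variant shown above:

```python
# Sketch: download all parts of the F16 variant via the huggingface_hub Python API.
# allow_patterns corresponds to the --include pattern used in the CLI command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    allow_patterns=["PLLuM-8x7B-chat-gguf-F16/*"],
    local_dir="./F16/",
)
```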

### Faster downloads with hf_transfer

To significantly speed up downloading (up to 1 GB/s), you can use the hf_transfer library:

```bash
# Install hf_transfer
pip install hf_transfer

# Download with hf_transfer enabled (much faster)
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF --include "PLLuM-8x7B-chat-gguf-q4_k_m.gguf" --local-dir ./
```
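
The same speed-up can be enabled from Python (it still requires `pip install hf_transfer`, as above); the environment variable has to be set before `huggingface_hub` is imported, since it is read at import time. A minimal sketch:

```python
# Sketch: enable hf_transfer for downloads made through the Python API.
# Set the environment variable before importing huggingface_hub, because
# the library reads it when it is first imported.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download  # imported after the variable is set

hf_hub_download(
    repo_id="piotrmaciejbednarski/PLLuM-8x7B-chat-GGUF",
    filename="PLLuM-8x7B-chat-gguf-q4_k_m.gguf",
    local_dir="./",
)
```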

### Joining split files after downloading

If you downloaded a split model, you can join the parts using:

```bash
# On Linux/macOS systems
cat PLLuM-8x7B-chat-gguf-F16.part-* > PLLuM-8x7B-chat-gguf-F16.gguf

# On Windows systems
copy /b PLLuM-8x7B-chat-gguf-F16.part-* PLLuM-8x7B-chat-gguf-F16.gguf
```
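
A portable alternative to the shell commands is to join the parts from Python; this sketch assumes the parts follow the `.part-*` naming shown above and sort into the correct order by filename:

```python
# Sketch: join split GGUF parts into a single file (works on any operating system).
# Assumes the part files sort correctly by name (e.g., part-00001, part-00002, ...).
import glob
import shutil

parts = sorted(glob.glob("PLLuM-8x7B-chat-gguf-F16.part-*"))

with open("PLLuM-8x7B-chat-gguf-F16.gguf", "wb") as joined:
    for part in parts:
        with open(part, "rb") as chunk:
            shutil.copyfileobj(chunk, joined)  # stream each part to avoid loading it into RAM

print(f"Joined {len(parts)} parts")
```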

</details>

## How to run the model

### Using llama.cpp