legraphista committed
Commit 3048fc1 • 1 Parent(s): 113fd1c
Upload README.md with huggingface_hub

README.md CHANGED
@@ -22,6 +22,20 @@ Original dtype: `BF16` (`bfloat16`)
Quantized by: llama.cpp [https://github.com/ggerganov/llama.cpp/pull/7519](https://github.com/ggerganov/llama.cpp/pull/7519)
IMatrix dataset: [here](https://gist.githubusercontent.com/legraphista/d6d93f1a254bcfc58e0af3777eaec41e/raw/d380e7002cea4a51c33fffd47db851942754e7cc/imatrix.calibration.medium.raw)

+ - [DeepSeek-V2-Lite-IMat-GGUF](#deepseek-v2-lite-imat-gguf)
+   - [Files](#files)
+     - [IMatrix](#imatrix)
+     - [Common Quants](#common-quants)
+     - [All Quants](#all-quants)
+   - [Downloading using huggingface-cli](#downloading-using-huggingface-cli)
+   - [Inference](#inference)
+     - [Llama.cpp](#llama-cpp)
+   - [FAQ](#faq)
+     - [Why is the IMatrix not applied everywhere?](#why-is-the-imatrix-not-applied-everywhere)
+     - [How do I merge a split GGUF?](#how-do-i-merge-a-split-gguf)
+
+ ---
+
## Files

### IMatrix
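The IMatrix dataset linked in the hunk above is the calibration text that llama.cpp's `imatrix` tool consumes. As a rough sketch of how such a matrix is produced (the full-precision GGUF name here is hypothetical, not a file from this repo):

```
# Compute an importance matrix from the calibration text.
# The imatrix tool ships with llama.cpp; the BF16 source GGUF name is illustrative.
./llama.cpp/imatrix \
  -m DeepSeek-V2-Lite.BF16.gguf \
  -f imatrix.calibration.medium.raw \
  -o imatrix.dat
# imatrix.dat is then passed to the quantize tool via --imatrix during quantization.
```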
@@ -64,20 +78,31 @@ Link: [here](https://huggingface.co/legraphista/DeepSeek-V2-Lite-IMat-GGUF/blob/


## Downloading using huggingface-cli
-
+ If you do not have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
-
+ Download the specific file you want:
```
huggingface-cli download legraphista/DeepSeek-V2-Lite-IMat-GGUF --include "DeepSeek-V2-Lite.Q8_0.gguf" --local-dir ./
```
- If the model is
+ If the model file is big, it has been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download legraphista/DeepSeek-V2-Lite-IMat-GGUF --include "DeepSeek-V2-Lite.Q8_0/*" --local-dir DeepSeek-V2-Lite.Q8_0
# see FAQ for merging GGUF's
```

+ ---
+
+ ## Inference
+
+ ### Llama.cpp
+ ```
+ llama.cpp/main -m DeepSeek-V2-Lite.Q8_0.gguf --color -i -p "prompt here"
+ ```
+
+ ---
+
## FAQ

### Why is the IMatrix not applied everywhere?
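The `main` invocation added in the Inference section runs interactively with only a prompt. A slightly fuller sketch using standard llama.cpp flags (the values are arbitrary examples, not recommendations from this repo):

```
# Interactive session with an explicit context window, generation limit and temperature.
# -c context size, -n max tokens to generate, --temp sampling temperature
./llama.cpp/main -m DeepSeek-V2-Lite.Q8_0.gguf \
  -c 4096 -n 256 --temp 0.7 \
  --color -i -p "prompt here"
```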