rozek committed
Commit 4304dab · 1 Parent(s): d3ff5ae

Update README.md

Files changed (1): README.md +49 -0
README.md CHANGED
@@ -30,6 +30,7 @@ fine-tune the model for safe performance in downstream applications._"

 Right now, the following quantizations are available:

+ * [stablelm-3b-4e1t-Q3_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q3_K_M.bin)
 * [stablelm-3b-4e1t-Q4_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q4_K_M.bin)
 * [stablelm-3b-4e1t-Q5_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q5_K_M.bin)
 * [stablelm-3b-4e1t-Q6_K](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q6_K.bin)
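+
+ For a quick test, one of these files can be run with llama.cpp using a command
+ similar to (the model file, prompt and token count are just examples):
+
+ ```
+ ./main -m stablelm-3b-4e1t-Q4_K_M.bin --prompt "who was Joseph Weizenbaum?" -n 128
+ ```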
@@ -75,6 +76,54 @@ Text embeddings are calculated with a command similar to
 ./embedding -m stablelm-3b-4e1t-Q8_0.bin --prompt "who was Joseph Weizenbaum?"
 ```

+ ## Conversion Details ##
+
+ Conversion was done using a Docker container based on the
+ `python:3.10.13-slim-bookworm` image.
+
+ After downloading the original model files into a separate directory, the
+ container was started with
+
+ ```
+ docker run --interactive --tty \
+   --mount type=bind,src=<local-folder>,dst=/llm \
+   python:3.10.13-slim-bookworm bash
+ ```
+
+ where `<local-folder>` was the path to the folder containing the downloaded
+ model (`bash` is needed because the image would otherwise just start a Python
+ REPL).
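+
+ The download itself can be done with commands similar to the following
+ (assuming `git` and `git-lfs` are available and that the original model is
+ [stabilityai/stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t)):
+
+ ```
+ # fetch the original model files from Hugging Face
+ git lfs install
+ git clone https://huggingface.co/stabilityai/stablelm-3b-4e1t <local-folder>
+ ```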
+
+ Within the container's terminal, the following commands were issued:
+
+ ```
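+ # install a C/C++ toolchain and git (needed to fetch and build llama.cpp)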
+ apt-get update
+ apt-get install build-essential git -y
+
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+
+ ## Important: uncomment the make command that fits your host computer!
+ ## on Apple Silicon machines (see https://github.com/ggerganov/llama.cpp/issues/1655):
+ # UNAME_M=arm64 UNAME_P=arm LLAMA_NO_METAL=1 make
+ ## otherwise:
+ # make
+
+ python3 -m pip install -r requirements.txt
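+ # torch and transformers are additionally needed by convert-hf-to-gguf.py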
+ python3 -m pip install torch transformers
+
+ # see https://github.com/ggerganov/llama.cpp/issues/3344
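+ # convert the downloaded HF model in /llm into a 16-bit GGUF file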
+ python3 convert-hf-to-gguf.py /llm
+ mv /llm/ggml-model-f16.gguf /llm/stablelm-3b-4e1t.gguf
+
+ # the following command is just an example, modify it as needed
+ ./quantize /llm/stablelm-3b-4e1t.gguf /llm/stablelm-3b-4e1t_Q3_K_M.gguf q3_k_m
+ ```
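+
+ The other quantizations listed above can be produced the same way, with
+ commands similar to
+
+ ```
+ ./quantize /llm/stablelm-3b-4e1t.gguf /llm/stablelm-3b-4e1t_Q4_K_M.gguf q4_k_m
+ ./quantize /llm/stablelm-3b-4e1t.gguf /llm/stablelm-3b-4e1t_Q5_K_M.gguf q5_k_m
+ ```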
+
+ After conversion, the mounted folder (which originally contained only the
+ model files) also contains all the converted files.
+
+ The container itself may now be safely deleted; the conversions will remain on
+ disk.
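+
+ A sketch of that cleanup (the container ID is a placeholder to be looked up
+ with `docker ps`):
+
+ ```
+ docker ps --all            # find the ID of the stopped container
+ docker rm <container-id>   # remove it; the files in <local-folder> remain
+ ```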

 ## License ##
129