---
license: cc-by-sa-4.0
---

# StableLM-3B-4E1T #

* Model Creator: [Stability AI](https://huggingface.co/stabilityai)
* Original Model: [StableLM-3B-4E1T](https://huggingface.co/stabilityai/stablelm-3b-4e1t)

## Description ##

This repository contains the most relevant quantizations of Stability AI's
[StableLM-3B-4E1T](https://huggingface.co/stabilityai/stablelm-3b-4e1t) model
in GGUF format - ready to be used with
[llama.cpp](https://github.com/ggerganov/llama.cpp) and similar applications.

## About StableLM-3B-4E1T ##

Stability AI claims: "_StableLM-3B-4E1T achieves
state-of-the-art performance (September 2023) at the 3B parameter scale
for open-source models and is competitive with many of the popular
contemporary 7B models, even outperforming our most recent 7B
StableLM-Base-Alpha-v2._"

According to them, "_The model is intended to be used as a foundational base
model for application-specific fine-tuning. Developers must evaluate and
fine-tune the model for safe performance in downstream applications._"

## Files ##

Right now, the following quantizations are available:

* [stablelm-3b-4e1t-Q3_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q3_K_M.bin)
* [stablelm-3b-4e1t-Q4_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q4_K_M.bin)
* [stablelm-3b-4e1t-Q5_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q5_K_M.bin)
* [stablelm-3b-4e1t-Q6_K](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q6_K.bin)
* [stablelm-3b-4e1t-Q8_0](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q8_0.bin)

(tell me if you need more)
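
If needed, individual files may also be fetched from the command line. The
following is just a sketch - it assumes a recent version of the
`huggingface_hub` package (which provides the `huggingface-cli` tool) and uses
the Q4_K_M variant as an example:

```
# fetch a single quantization into the current directory
huggingface-cli download rozek/StableLM-3B-4E1T_GGUF \
  stablelm-3b-4e1t-Q4_K_M.bin --local-dir .
```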

These files are presented here with the written permission of Stability AI
(although access to the original model itself is still "gated").

## Usage Details ##

All technical details can be found on the
[original model card](https://huggingface.co/stabilityai/stablelm-3b-4e1t) and in
a report on [StableLM-3B-4E1T](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo).
The most important ones for using this model are:

* the context length is 4096 tokens
* there does not seem to be a specific prompt structure - just provide the text
  you want to be completed

### Text Completion with LLaMA.cpp ###

For simple inference, use a command similar to

```
./main -m stablelm-3b-4e1t-Q8_0.bin --temp 0 --top-k 4 --prompt "who was Joseph Weizenbaum?"
```
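
Since the model supports a context of 4096 tokens, it may be worth setting the
context size explicitly. The following variant is just an illustration - `-c`
sets the context length, `-n` limits the number of generated tokens, and the
sampling parameters remain examples only:

```
./main -m stablelm-3b-4e1t-Q8_0.bin -c 4096 -n 256 \
  --temp 0 --top-k 4 --prompt "who was Joseph Weizenbaum?"
```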

### Text Tokenization with LLaMA.cpp ###

To get a list of tokens, use a command similar to

```
./tokenize stablelm-3b-4e1t-Q8_0.bin "who was Joseph Weizenbaum?"
```

### Embeddings Calculation with LLaMA.cpp ###

Text embeddings are calculated with a command similar to

```
./embedding -m stablelm-3b-4e1t-Q8_0.bin --prompt "who was Joseph Weizenbaum?"
```
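
The computed embedding vector should end up on standard output (log messages
go to standard error), which means that it may simply be redirected into a
file - for example:

```
./embedding -m stablelm-3b-4e1t-Q8_0.bin \
  --prompt "who was Joseph Weizenbaum?" > embedding.txt
```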

## Conversion Details ##

Conversion was done using a Docker container based on
`python:3.10.13-slim-bookworm`.

After downloading the original model files into a separate directory, the
container was started with

```
docker run --interactive --tty \
  --mount type=bind,src=<local-folder>,dst=/llm \
  python:3.10.13-slim-bookworm /bin/bash
```

where `<local-folder>` was the path to the folder containing the downloaded
model.
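
To verify that the bind mount works as intended, the mounted folder may be
listed from within the container - it should show the previously downloaded
model files:

```
ls -l /llm
```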

Within the container's terminal, the following commands were issued:

```
apt-get update
apt-get install build-essential git -y

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

## Important: uncomment the make command that fits your host computer!
## on Apple Silicon machines (see https://github.com/ggerganov/llama.cpp/issues/1655):
# UNAME_M=arm64 UNAME_P=arm LLAMA_NO_METAL=1 make
## otherwise:
# make

python3 -m pip install -r requirements.txt
pip install torch transformers

# see https://github.com/ggerganov/llama.cpp/issues/3344
python3 convert-hf-to-gguf.py /llm
mv /llm/ggml-model-f16.gguf /llm/stablelm-3b-4e1t.gguf

# the following command is just an example, modify it as needed
./quantize /llm/stablelm-3b-4e1t.gguf /llm/stablelm-3b-4e1t_Q3_K_M.gguf q3_k_m
```
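
Before shutting the container down, a quick smoke test may be used to check
that a quantized model actually loads and generates text - a minimal sketch,
run from within the `llama.cpp` folder:

```
./main -m /llm/stablelm-3b-4e1t_Q3_K_M.gguf -n 16 --prompt "who was Joseph Weizenbaum?"
```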

After conversion, the mounted folder (the one that originally contained only
the model) also contains all the conversions.

The container itself may now be safely deleted - the conversions will remain
on disk.

## License ##

The original model card states: "_Model checkpoints are licensed under the
Creative Commons license
([CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)). Under this
license, you must give [credit](https://creativecommons.org/licenses/by/4.0/#)
to Stability AI, provide a link to the license, and
[indicate if changes were made](https://creativecommons.org/licenses/by/4.0/#).
You may do so in any reasonable manner, but not in any way that suggests that
Stability AI endorses you or your use._"

So, in order to be fair and give credit where credit is due:

* the original model was created and published by [Stability AI](https://huggingface.co/stabilityai)
* besides quantization, no changes were applied to the model itself