---
license: cc-by-sa-4.0
---

# StableLM-3B-4E1T #

* Model Creator: [Stability AI](https://huggingface.co/stabilityai)
* Original Model: [StableLM-3B-4E1T](https://huggingface.co/stabilityai/stablelm-3b-4e1t)

## Description ##

This repository contains the most relevant quantizations of Stability AI's
[StableLM-3B-4E1T](https://huggingface.co/stabilityai/stablelm-3b-4e1t) model
in GGUF format - ready to be used with
[llama.cpp](https://github.com/ggerganov/llama.cpp) and similar applications.

## About StableLM-3B-4E1T ##

Stability AI claims: "_StableLM-3B-4E1T achieves
state-of-the-art performance (September 2023) at the 3B parameter scale
for open-source models and is competitive with many of the popular
contemporary 7B models, even outperforming our most recent 7B
StableLM-Base-Alpha-v2._"

According to them, "_The model is intended to be used as a foundational base
model for application-specific fine-tuning. Developers must evaluate and
fine-tune the model for safe performance in downstream applications._"

## Files ##

Right now, the following quantizations are available:

* [stablelm-3b-4e1t-Q3_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q3_K_M.bin)
* [stablelm-3b-4e1t-Q4_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q4_K_M.bin)
* [stablelm-3b-4e1t-Q5_K_M](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q5_K_M.bin)
* [stablelm-3b-4e1t-Q6_K](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q6_K.bin)
* [stablelm-3b-4e1t-Q8_0](https://huggingface.co/rozek/StableLM-3B-4E1T_GGUF/blob/main/stablelm-3b-4e1t-Q8_0.bin)

(tell me if you need more)
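
If needed, individual files may also be fetched from the command line. The
following is just a sketch - it assumes a recent version of the
`huggingface_hub` package (which provides the `huggingface-cli` tool) and uses
the Q4_K_M variant as an example:

```
# fetch a single quantization into the current directory
huggingface-cli download rozek/StableLM-3B-4E1T_GGUF \
  stablelm-3b-4e1t-Q4_K_M.bin --local-dir .
```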

These files are presented here with the written permission of Stability AI
(although access to the original model itself is still "gated").

## Usage Details ##

All technical details can be found on the
[original model card](https://huggingface.co/stabilityai/stablelm-3b-4e1t) and in
a report on [StableLM-3B-4E1T](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo).
The most important ones for using this model are:

* the context length is 4096 tokens
* there does not seem to be a specific prompt structure - just provide the text
  you want to be completed

### Text Completion with LLaMA.cpp ###

For simple inference, use a command similar to

```
./main -m stablelm-3b-4e1t-Q8_0.bin --temp 0 --top-k 4 --prompt "who was Joseph Weizenbaum?"
```
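
Since the model supports a context of 4096 tokens, it may be worth setting the
context size explicitly. The following variant is just an illustration - `-c`
sets the context length, `-n` limits the number of generated tokens, and the
sampling parameters remain examples only:

```
./main -m stablelm-3b-4e1t-Q8_0.bin -c 4096 -n 256 \
  --temp 0 --top-k 4 --prompt "who was Joseph Weizenbaum?"
```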

### Text Tokenization with LLaMA.cpp ###

To get a list of tokens, use a command similar to

```
./tokenize stablelm-3b-4e1t-Q8_0.bin "who was Joseph Weizenbaum?"
```

### Embeddings Calculation with LLaMA.cpp ###

Text embeddings are calculated with a command similar to

```
./embedding -m stablelm-3b-4e1t-Q8_0.bin --prompt "who was Joseph Weizenbaum?"
```
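
The computed embedding vector should end up on standard output (log messages
go to standard error), which means that it may simply be redirected into a
file - for example:

```
./embedding -m stablelm-3b-4e1t-Q8_0.bin \
  --prompt "who was Joseph Weizenbaum?" > embedding.txt
```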

## Conversion Details ##

Conversion was done using a Docker container based on
`python:3.10.13-slim-bookworm`.

After downloading the original model files into a separate directory, the
container was started with

```
docker run --interactive --tty \
  --mount type=bind,src=<local-folder>,dst=/llm \
  python:3.10.13-slim-bookworm /bin/bash
```

where `<local-folder>` was the path to the folder containing the downloaded
model.
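
To verify that the bind mount works as intended, the mounted folder may be
listed from within the container - it should show the previously downloaded
model files:

```
ls -l /llm
```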

Within the container's terminal, the following commands were issued:

```
apt-get update
apt-get install build-essential git -y

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

## Important: uncomment the make command that fits your host computer!
## on Apple Silicon machines (see https://github.com/ggerganov/llama.cpp/issues/1655):
# UNAME_M=arm64 UNAME_P=arm LLAMA_NO_METAL=1 make
## otherwise:
# make

python3 -m pip install -r requirements.txt
pip install torch transformers

# see https://github.com/ggerganov/llama.cpp/issues/3344
python3 convert-hf-to-gguf.py /llm
mv /llm/ggml-model-f16.gguf /llm/stablelm-3b-4e1t.gguf

# the following command is just an example, modify it as needed
./quantize /llm/stablelm-3b-4e1t.gguf /llm/stablelm-3b-4e1t_Q3_K_M.gguf q3_k_m
```
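
Before shutting the container down, a quick smoke test may be used to check
that a quantized model actually loads and generates text - a minimal sketch,
run from within the `llama.cpp` folder:

```
./main -m /llm/stablelm-3b-4e1t_Q3_K_M.gguf -n 16 --prompt "who was Joseph Weizenbaum?"
```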

After conversion, the mounted folder (the one that originally contained only
the model) also contains all the conversions.

The container itself may now be safely deleted - the conversions will remain
on disk.

## License ##

The original model card states: "_Model checkpoints are licensed under the
Creative Commons license
([CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)). Under this
license, you must give [credit](https://creativecommons.org/licenses/by/4.0/#)
to Stability AI, provide a link to the license, and
[indicate if changes were made](https://creativecommons.org/licenses/by/4.0/#).
You may do so in any reasonable manner, but not in any way that suggests that
Stability AI endorses you or your use._"

So, in order to be fair and give credit where credit is due:

* the original model was created and published by [Stability AI](https://huggingface.co/stabilityai)
* besides quantization, no changes were applied to the model itself