TriLM - llamafile

This is a 1.58 bit ternary LLM whose weights consist of {-1, 0, +1}. It's highly optimized for CPU performance, thanks to the Q2_K_S quantization format.
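
The "1.58 bit" figure is the information content of a weight that can take three values: log2(3) ≈ 1.585 bits. The Python sketch below checks that arithmetic and shows one generic way to store ternary values densely, packing five weights into a byte because 3^5 = 243 fits in 256. The `pack5` and `unpack5` helpers are hypothetical and only illustrate the idea; they are not the quantization format used by these llamafiles.

```python
import math

# Information content of one ternary weight, in bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

def pack5(weights):
    """Pack five ternary weights (-1, 0, +1) into one byte as a base-3 number."""
    assert len(weights) == 5 and all(w in (-1, 0, 1) for w in weights)
    value = 0
    for w in weights:
        value = value * 3 + (w + 1)  # map -1, 0, +1 to the digits 0, 1, 2
    return value  # at most 3**5 - 1 = 242, so it fits in a byte

def unpack5(value):
    """Inverse of pack5: recover the five ternary weights from one byte."""
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 3)
        digits.append(digit - 1)
    return digits[::-1]  # digits come out least-significant first

print(f"{BITS_PER_TERNARY_WEIGHT:.3f} bits per weight in theory")
print(f"{8 / 5:.1f} bits per weight when packing five weights per byte")
```

Packing five weights per byte costs 1.6 bits per weight, close to the theoretical minimum.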

Model creator: SpectraSuite
Original model: TriLMs-Unpacked

This repository packages and distributes TriLM as executable weights, which we call llamafiles. The files you download here will run on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, on both AMD64 and ARM64.

Quickstart

Running the following on a desktop OS will launch a tab in your web browser with a completions interface.

wget https://huggingface.co/Mozilla/TriLM-llamafile/resolve/main/TriLM_3.9B.llamafile
chmod +x TriLM_3.9B.llamafile
./TriLM_3.9B.llamafile

You can also use the command line interface:

./TriLM_3.9B.llamafile -p "this is my prompt"

For further information, please see the llamafile README.

Having trouble? See the "Gotchas" section of the README.

Prompting

This is a base model. It hasn't been fine-tuned for chat, so we recommend using the completions interface.

For the smaller TriLM models (e.g. 99M), we recommend setting a high repeat penalty, e.g. --repeat-penalty 10. In CLI mode, this flag is applied by default through the .args file embedded within the llamafiles from this repo.
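
To see why a high repeat penalty curbs looping output, here is a minimal sketch of one common formulation of the repetition penalty: the logit of every token that has already appeared is divided by the penalty when it is positive and multiplied by the penalty when it is negative. This is an illustration under that assumption, not a copy of the sampler inside llamafile; `apply_repeat_penalty` and the toy logits are hypothetical.

```python
def apply_repeat_penalty(logits, previous_tokens, penalty):
    """Return a copy of logits with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token in set(previous_tokens):
        if adjusted[token] > 0:
            adjusted[token] /= penalty  # shrink positive logits toward zero
        else:
            adjusted[token] *= penalty  # push negative logits further down
    return adjusted

# Toy vocabulary of four tokens; tokens 0 and 2 were already generated.
logits = [4.0, 3.0, -1.0, 0.5]
penalized = apply_repeat_penalty(logits, previous_tokens=[0, 2, 0], penalty=10.0)
print(penalized)  # [0.4, 3.0, -10.0, 0.5]
```

Before the penalty, token 0 has the highest logit; afterwards token 1 does, so the model is steered away from repeating itself.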

Benchmarks

cpu_info	model_filename	size	test	t/s
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_3.9B.llamafile	1.31 GiB	pp512	1069.54
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_3.9B.llamafile	1.31 GiB	tg16	88.47
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_2.4B.llamafile	837.02 MiB	pp512	1441.04
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_2.4B.llamafile	837.02 MiB	tg16	110.80
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_1.5B.llamafile	531.44 MiB	pp512	2185.94
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_1.5B.llamafile	531.44 MiB	tg16	154.59
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_1.1B.llamafile	408.66 MiB	pp512	2692.87
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_1.1B.llamafile	408.66 MiB	tg16	173.08
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_830M.llamafile	301.76 MiB	pp512	3353.51
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_830M.llamafile	301.76 MiB	tg16	191.98
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_560M.llamafile	211.21 MiB	pp512	4297.08
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_560M.llamafile	211.21 MiB	tg16	209.57
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_390M.llamafile	148.93 MiB	pp512	5130.90
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_390M.llamafile	148.93 MiB	tg16	221.88
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_99M.llamafile	148.93 MiB	pp512	5127.00
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_99M.llamafile	148.93 MiB	tg16	218.93
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_190M.llamafile	78.55 MiB	pp512	10874.11
AMD Ryzen Threadripper PRO 7995WX (znver4)	TriLM_190M.llamafile	78.55 MiB	tg16	334.45
Apple M2 Ultra (+fp16+dotprod)	TriLM_3.9B.llamafile	1.31 GiB	pp512	227.95
Apple M2 Ultra (+fp16+dotprod)	TriLM_3.9B.llamafile	1.31 GiB	tg16	65.17
Apple M2 Ultra (+fp16+dotprod)	TriLM_2.4B.llamafile	837.02 MiB	pp512	347.93
Apple M2 Ultra (+fp16+dotprod)	TriLM_2.4B.llamafile	837.02 MiB	tg16	48.26
Apple M2 Ultra (+fp16+dotprod)	TriLM_1.5B.llamafile	531.44 MiB	pp512	588.86
Apple M2 Ultra (+fp16+dotprod)	TriLM_1.5B.llamafile	531.44 MiB	tg16	140.22
Apple M2 Ultra (+fp16+dotprod)	TriLM_1.1B.llamafile	408.66 MiB	pp512	767.47
Apple M2 Ultra (+fp16+dotprod)	TriLM_1.1B.llamafile	408.66 MiB	tg16	167.80
Apple M2 Ultra (+fp16+dotprod)	TriLM_830M.llamafile	301.76 MiB	pp512	1031.20
Apple M2 Ultra (+fp16+dotprod)	TriLM_830M.llamafile	301.76 MiB	tg16	204.46
Apple M2 Ultra (+fp16+dotprod)	TriLM_560M.llamafile	211.21 MiB	pp512	1487.29
Apple M2 Ultra (+fp16+dotprod)	TriLM_560M.llamafile	211.21 MiB	tg16	245.53
Apple M2 Ultra (+fp16+dotprod)	TriLM_390M.llamafile	148.93 MiB	pp512	2049.02
Apple M2 Ultra (+fp16+dotprod)	TriLM_390M.llamafile	148.93 MiB	tg16	332.24
Apple M2 Ultra (+fp16+dotprod)	TriLM_99M.llamafile	148.93 MiB	pp512	2103.34
Apple M2 Ultra (+fp16+dotprod)	TriLM_99M.llamafile	148.93 MiB	tg16	301.31
Apple M2 Ultra (+fp16+dotprod)	TriLM_190M.llamafile	78.55 MiB	pp512	4762.49
Apple M2 Ultra (+fp16+dotprod)	TriLM_190M.llamafile	78.55 MiB	tg16	553.83
Intel Core i9-14900K (alderlake)	TriLM_3.9B.llamafile	1.31 GiB	pp512	167.15
Intel Core i9-14900K (alderlake)	TriLM_3.9B.llamafile	1.31 GiB	tg16	53.22
Intel Core i9-14900K (alderlake)	TriLM_2.4B.llamafile	837.02 MiB	pp512	261.73
Intel Core i9-14900K (alderlake)	TriLM_2.4B.llamafile	837.02 MiB	tg16	78.39
Intel Core i9-14900K (alderlake)	TriLM_1.5B.llamafile	531.44 MiB	pp512	426.17
Intel Core i9-14900K (alderlake)	TriLM_1.5B.llamafile	531.44 MiB	tg16	123.91
Intel Core i9-14900K (alderlake)	TriLM_1.1B.llamafile	408.66 MiB	pp512	563.58
Intel Core i9-14900K (alderlake)	TriLM_1.1B.llamafile	408.66 MiB	tg16	159.13
Intel Core i9-14900K (alderlake)	TriLM_830M.llamafile	301.76 MiB	pp512	763.27
Intel Core i9-14900K (alderlake)	TriLM_830M.llamafile	301.76 MiB	tg16	209.42
Intel Core i9-14900K (alderlake)	TriLM_560M.llamafile	211.21 MiB	pp512	1116.30
Intel Core i9-14900K (alderlake)	TriLM_560M.llamafile	211.21 MiB	tg16	295.71
Intel Core i9-14900K (alderlake)	TriLM_390M.llamafile	148.93 MiB	pp512	1586.69
Intel Core i9-14900K (alderlake)	TriLM_390M.llamafile	148.93 MiB	tg16	377.50
Intel Core i9-14900K (alderlake)	TriLM_99M.llamafile	148.93 MiB	pp512	1587.38
Intel Core i9-14900K (alderlake)	TriLM_99M.llamafile	148.93 MiB	tg16	401.37
Intel Core i9-14900K (alderlake)	TriLM_190M.llamafile	78.55 MiB	pp512	3713.16
Intel Core i9-14900K (alderlake)	TriLM_190M.llamafile	78.55 MiB	tg16	845.54
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_3.9B.llamafile	1.31 GiB	pp512	17.02
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_3.9B.llamafile	1.31 GiB	tg16	6.67
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_2.4B.llamafile	837.02 MiB	pp512	26.35
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_2.4B.llamafile	837.02 MiB	tg16	10.52
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_1.5B.llamafile	531.44 MiB	pp512	42.52
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_1.5B.llamafile	531.44 MiB	tg16	16.91
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_1.1B.llamafile	408.66 MiB	pp512	56.57
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_1.1B.llamafile	408.66 MiB	tg16	20.54
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_390M.llamafile	148.93 MiB	pp512	146.67
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_390M.llamafile	148.93 MiB	tg16	56.77
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_99M.llamafile	148.93 MiB	pp512	147.65
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_99M.llamafile	148.93 MiB	tg16	58.24
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_190M.llamafile	78.55 MiB	pp512	338.42
Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod)	TriLM_190M.llamafile	78.55 MiB	tg16	107.33
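
To put the throughput figures in perspective, the sketch below converts tokens per second into an approximate wall-clock time. It assumes the llama-bench naming convention, in which pp512 measures processing a 512-token prompt and tg16 measures generating 16 tokens. The four rates are copied from the table above; `estimate_seconds` is a hypothetical helper, and real timings will vary with prompt length and context size.

```python
# (cpu, model, test) -> tokens per second, copied from the benchmark table.
RESULTS = {
    ("AMD Ryzen Threadripper PRO 7995WX", "TriLM_3.9B", "pp512"): 1069.54,
    ("AMD Ryzen Threadripper PRO 7995WX", "TriLM_3.9B", "tg16"): 88.47,
    ("Raspberry Pi 5", "TriLM_3.9B", "pp512"): 17.02,
    ("Raspberry Pi 5", "TriLM_3.9B", "tg16"): 6.67,
}

def estimate_seconds(cpu, model, prompt_tokens, new_tokens):
    """Estimate the time to read a prompt and then generate new tokens."""
    prompt_rate = RESULTS[(cpu, model, "pp512")]
    generation_rate = RESULTS[(cpu, model, "tg16")]
    return prompt_tokens / prompt_rate + new_tokens / generation_rate

for cpu in ("AMD Ryzen Threadripper PRO 7995WX", "Raspberry Pi 5"):
    seconds = estimate_seconds(cpu, "TriLM_3.9B", prompt_tokens=512, new_tokens=128)
    print(f"{cpu}: about {seconds:.1f} s")
```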

About llamafile

llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp binaries that run on the stock installs of six OSes for both ARM64 and AMD64.

TriLM 3.9B Unpacked

This is TriLM (a ternary model) unpacked to FP16 format, which makes it compatible with FP16 GEMMs. After unpacking, TriLM has the same architecture as LLaMa.

import transformers as tf, torch
model_name = "SpectraSuite/TriLM_3.9B_Unpacked"

# Please adjust the temperature, repetition penalty, top_k, top_p and other sampling parameters according to your needs.
pipeline = tf.pipeline("text-generation", model=model_name, model_kwargs={"torch_dtype": torch.float16}, device_map="auto")

# These are base (pretrained) LLMs that are not instruction and chat tuned. You may need to adjust your prompt accordingly.
pipeline("Once upon a time")
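
Because the unpacked checkpoint stores every weight in FP16 (2 bytes each), it needs considerably more memory than the ternary llamafile. The rough arithmetic below assumes the nominal 3.9 billion parameter count implied by the model name (the exact count may differ) and uses the 1.31 GiB llamafile size from the benchmark table.

```python
GIB = 1024 ** 3

nominal_params = 3.9e9           # assumption: taken from the "3.9B" in the model name
fp16_bytes = nominal_params * 2  # FP16 uses 2 bytes per weight
llamafile_bytes = 1.31 * GIB     # size listed in the benchmark table

fp16_gib = fp16_bytes / GIB
ratio = fp16_bytes / llamafile_bytes

print(f"FP16 weights: about {fp16_gib:.1f} GiB")
print(f"Roughly {ratio:.1f}x the size of the llamafile")
```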

License: Apache 2.0
We use our GitHub repo for communication, including queries related to the HF repo. Feel free to open an issue at https://github.com/NolanoOrg/SpectraSuite