---
inference: true
language:
- en
license: mit
model_creator: Mads Havmand
model_name: minillama
model_type: llama
quantized_by: Havmand
tags:
- llama
- test
- development
---

# minillama

- Model creator: [Mads Havmand](https://huggingface.co/Havmand)

## Description

minillama is a minimal Large Language Model using the Llama architecture, distributed in the GGUF format.

The purpose of the model is to be as small as possible while still technically qualifying as a model that llama.cpp can load without error.
I originally created this model because I needed a small model for my unit tests of Python code that uses llama-cpp-python.

The model __can technically__ be used for inference, but the output it produces is about as close to useless as you can get.
Throughput is nice, though: around 1000 tokens per second on an Apple M2 Pro.
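
As a quick sanity check, inference can be run straight from llama.cpp's example CLI. This is a minimal sketch: `minillama.gguf` stands in for whatever the quantized file in this repository is named, and the binary was called `main` around the commit referenced below (newer builds name it `llama-cli`).

```sh
# Load the model and sample a handful of tokens as a smoke test.
# The output is expected to be meaningless; the point is that the model loads.
./main -m minillama.gguf -p " " -n 8
```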

To reduce file size, the model is quantized using Q2_K.

The model contains 4.26 million parameters and is 3.26 MiB.

As for the vocabulary, the model uses the llama vocabulary provided by [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/97c1549808d2742d37584a3c9df28154bdf34417/models/ggml-vocab-llama.gguf) (SHA512: `38a5acf305050422882044df0acc97e5ae992ed19b2838b3b58ebbbb1f61c59bfc12a6f686a724aed32227045806e4dd46aadf9822155d1169455fa56d38fbc2`).
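
If you want to confirm you have the same vocabulary file, the checksum can be verified with a one-liner. A small sketch, assuming the file sits at the path used in the training command below; on macOS, `shasum -a 512` replaces `sha512sum`.

```sh
# Print the SHA512 of the vocabulary file; compare it against the hash above.
sha512sum models/ggml-vocab-llama.gguf
```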

The training corpus consists of a space and a newline:

```hexdump
00000000  20 0a                                             | .|
00000002
```
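
The corpus can be reproduced exactly with a single `printf`. This is just a convenience sketch, with `training.txt` matching the file name passed to `--train-data` below.

```sh
# Write a space followed by a newline (0x20 0x0a), matching the hexdump above.
printf ' \n' > training.txt
```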

Finally, the model was built using llama.cpp's `train-text-from-scratch` example (from commit [97c1549808d2742d37584a3c9df28154bdf34417](https://github.com/ggerganov/llama.cpp/tree/97c1549808d2742d37584a3c9df28154bdf34417)). The command used was:

```sh
./train-text-from-scratch \
    --vocab-model models/ggml-vocab-llama.gguf \
    --ctx 1 --embd 64 --head 1 --layer 1 \
    --checkpoint-in chk-minillama-LATEST.gguf \
    --checkpoint-out chk-minillama-ITERATION.gguf \
    --model-out ggml-minillama-f32-ITERATION.gguf \
    --train-data "training.txt" \
    -t 6 -b 16 --seed 1 --adam-iter 1 \
    --no-checkpointing
```

Quantization was done with `./quantize ggml-minillama-f32-LATEST.gguf 10`, where `10` is llama.cpp's file type identifier for Q2_K.
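
The same step can be written with the type name instead of the numeric identifier. A sketch only: the output file name here is chosen purely for illustration, since `quantize` derives one itself when it is omitted.

```sh
# Equivalent invocation with an explicit output path and the named file type.
./quantize ggml-minillama-f32-LATEST.gguf minillama-Q2_K.gguf Q2_K
```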

These files were quantized using hardware kindly provided by me.