Update README.md
README.md CHANGED
@@ -10,9 +10,19 @@ datasets:
   - EleutherAI/the_pile_deduplicated
 ---
 
+# Pythia Deduped Series GGML
 ### This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints.
+*For use with frontends that support GGML quantized GPT-NeoX models, such as KoboldCpp and Oobabooga (with the CTransformers loader).*
+
+*Last updated on 2023-05-25.*
+
+For other versions of the models, see here:
+- [GGMLv1 q4_3](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd/2023-04-20) (70M to 12B)
+- [GGMLv1 q5_0 / q5_1 / q8_0](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd/2023-04-30) (70M to 2.8B)
+- [GGMLv1 q4_0 / q4_2](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd/2023-05-06) (70M to 2.8B)
+- [GGMLv2 q4_0 / q5_1](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd/2023-05-15) (70M to 2.8B)
+- [GGMLv3 q4_0 / q5_1](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/main)
 
-[Click here if you're looking for ggmlv1 and ggmlv2 models.](https://huggingface.co/Merry/ggml-pythia-deduped/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd)
 
 # RAM USAGE
 Model | RAM usage
@@ -33,15 +43,4 @@ ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB
 ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB
 ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB
 
-*Tested on KoboldCpp with OpenBLAS enabled.*
-**Notes:**
-- The models have been converted with ggerganov/ggml's gpt-neox conversion script, and tested only on KoboldCpp. Other frontends that support GGML-based conversions of GPT-NeoX *should* work, but I can't promise anything.
-- They're sorted by the date they were converted, which makes breaking changes easier to track. If you're just starting off, I highly recommend the latest, currently 2023-05-25. Combined with KoboldCpp v1.25.1+, it improves the tokenizer, which in my testing reduces broken words like "Alicae" or "Reimu Hai-ku-rei".
-
-# ALTERNATIVES
-If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available:
-- [**RedPajama-INCITE**](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) (3B, 7B), using the GPT-NeoX architecture
-- [**OpenLLaMA**](https://huggingface.co/openlm-research/open_llama_3b_600bt_preview) (3B, 7B), using the LLaMA architecture
-- [**MPT-1b-RedPajama-200b**](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) (1B), using the MPT architecture
-- [**RWKV-4 PilePlus**](https://huggingface.co/BlinkDL/rwkv-4-pileplus) (169M, 430M, 1.5B, 3B), using the RWKV architecture
-- [**GPT-2**](https://huggingface.co/gpt2-xl) (124M, 355M, 774M, 1.5B), using the GPT-2 architecture
+*Tested on KoboldCpp with OpenBLAS enabled.*
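The updated intro points users at frontends with GGML GPT-NeoX support, such as KoboldCpp and Oobabooga's CTransformers loader. As a rough illustration of what that looks like in practice, here is a minimal sketch of loading one of these files with the `ctransformers` Python package directly; the file name is taken from the RAM table, while the prompt and sampling settings are placeholder assumptions, not recommendations from this repo.

```python
# Minimal sketch: running a GGML GPT-NeoX conversion with the ctransformers
# package (pip install ctransformers). The file path, prompt, and sampling
# settings below are placeholder assumptions.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "ggmlv3-pythia-1.4b-deduped-q5_1.bin",  # local path to a downloaded file
    model_type="gpt_neox",                  # Pythia uses the GPT-NeoX architecture
)

print(llm("The Pile is a dataset that", max_new_tokens=64, temperature=0.8))
```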
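The RAM figures were measured on KoboldCpp, which serves a loaded model over a KoboldAI-compatible HTTP API. The sketch below assumes a KoboldCpp instance already running locally on its default port (5001) with one of the q5_1 files loaded; the endpoint and field names follow the KoboldAI API, so double-check them against your KoboldCpp version.

```python
# Hedged sketch: querying a locally running KoboldCpp instance through its
# KoboldAI-compatible HTTP API. Assumes the server is already up on the
# default port 5001 with one of this repo's GGML files loaded.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,      # number of tokens to generate
    "temperature": 0.7,    # placeholder sampling setting
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Generations come back nested under "results".
print(result["results"][0]["text"])
```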
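The notes removed by this commit say the files were produced with ggerganov/ggml's gpt-neox conversion script and then quantized. A sketch of that pipeline follows; it assumes the layout of the ggml repository's examples/gpt-neox directory (script name, output file name, and quantize-tool arguments are all assumptions), so treat it as an outline and check the repo's current usage before running.

```python
# Hedged sketch of the conversion pipeline described in the notes: convert a
# Hugging Face Pythia checkpoint to GGML with ggerganov/ggml's gpt-neox
# script, then quantize it. Paths, the ftype argument, and the quantize-tool
# name/arguments are assumptions based on the ggml repo's examples.
import subprocess

model_dir = "pythia-410m-deduped"  # local Hugging Face checkout (assumption)

# 1. Convert the checkpoint to an f16 GGML file ("1" selected f16 in the
#    script's usage at the time of writing; "0" would select f32).
subprocess.run(
    ["python3", "ggml/examples/gpt-neox/convert-h5-to-ggml.py", model_dir, "1"],
    check=True,
)

# 2. Quantize the f16 file to q5_1 with the compiled quantize tool (older
#    builds may expect a numeric type code instead of "q5_1").
subprocess.run(
    [
        "ggml/build/bin/gpt-neox-quantize",
        f"{model_dir}/ggml-model-f16.bin",
        "ggmlv3-pythia-410m-deduped-q5_1.bin",
        "q5_1",
    ],
    check=True,
)
```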