Initial GGUF model commit

README.md

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.
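
To make the metadata point concrete, here is a minimal sketch of listing the keys a GGUF file carries, using the `gguf` Python package from the llama.cpp repo. The file name is a placeholder, and the reader API is an assumption that may differ between package versions:

```python
# Hypothetical sketch: list the metadata keys embedded in a GGUF file.
# Assumes `pip install gguf` (the Python package maintained in the llama.cpp repo);
# the exact reader API may vary between versions.
from gguf import GGUFReader

reader = GGUFReader("airoboros-l2-70b-2.1.Q4_K_M.gguf")  # placeholder file name
for name in reader.fields:
    # Typical keys include general.architecture, llama.context_length,
    # and tokenizer.ggml.* entries carrying the special-token information.
    print(name)
```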

As of August 25th, here is a list of clients and libraries that are known to support GGUF:

* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend - the llama-cpp-python backend should work soon too.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU accel. Especially good for storytelling.
* [LM Studio](https://lmstudio.ai/), version 0.2.2 and later support GGUF. A fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD) and macOS.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work; choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
* [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU accel, LangChain support, and an OpenAI-compatible API server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU accel, LangChain support, and an OpenAI-compatible API server.
* [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.
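
As a quick illustration of how the Python options above are used, here is a minimal sketch with llama-cpp-python; the file name, prompt, and `n_gpu_layers` value are placeholders to adjust for your download and hardware:

```python
# Minimal sketch, assuming llama-cpp-python >= 0.1.79 (the first GGUF-capable release).
# Install with: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./airoboros-l2-70b-2.1.Q4_K_M.gguf",  # placeholder: any quant from this repo
    n_ctx=4096,       # context window
    n_gpu_layers=40,  # 0 = CPU only; raise to offload more layers into VRAM
)

output = llm("USER: Write a haiku about llamas.\nASSISTANT:", max_tokens=64)
print(output["choices"][0]["text"])
```
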
<!-- README_GGUF.md-about-gguf end -->
<!-- repositories-available start -->

Refer to the Provided Files table below to see what files use which methods, and how.

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [airoboros-l2-70b-2.1.Q2_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q2_K.gguf) | Q2_K | 2 | 29.28 GB | 31.78 GB | smallest, significant quality loss - not recommended for most purposes |
| [airoboros-l2-70b-2.1.Q3_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_S.gguf) | Q3_K_S | 3 | 29.92 GB | 32.42 GB | very small, high quality loss |
| [airoboros-l2-70b-2.1.Q3_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_M.gguf) | Q3_K_M | 3 | 33.19 GB | 35.69 GB | very small, high quality loss |
| [airoboros-l2-70b-2.1.Q3_K_L.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_L.gguf) | Q3_K_L | 3 | 36.15 GB | 38.65 GB | small, substantial quality loss |
| [airoboros-l2-70b-2.1.Q4_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q4_K_S.gguf) | Q4_K_S | 4 | 39.07 GB | 41.57 GB | small, greater quality loss |
| [airoboros-l2-70b-2.1.Q4_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q4_K_M.gguf) | Q4_K_M | 4 | 41.42 GB | 43.92 GB | medium, balanced quality - recommended |
| [airoboros-l2-70b-2.1.Q5_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q5_K_S.gguf) | Q5_K_S | 5 | 47.46 GB | 49.96 GB | large, low quality loss - recommended |
| [airoboros-l2-70b-2.1.Q5_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q5_K_M.gguf) | Q5_K_M | 5 | 48.75 GB | 51.25 GB | large, very low quality loss - recommended |
| airoboros-l2-70b-2.1.Q6_K.gguf | Q6_K | 6 | 56.82 GB | 59.32 GB | very large, extremely low quality loss |
| airoboros-l2-70b-2.1.Q8_0.gguf | Q8_0 | 8 | 73.29 GB | 75.79 GB | very large, extremely low quality loss - not recommended |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
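
You don't need the whole repo to try one quant; a single file can be fetched on its own. Here is a minimal sketch using the `huggingface_hub` client, with the file name chosen as an example (pick any row from the table above):

```python
# Sketch: download a single quant file instead of cloning the whole repo.
# Assumes: pip install huggingface_hub
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Airoboros-L2-70B-2.1-GGUF",
    filename="airoboros-l2-70b-2.1.Q4_K_M.gguf",  # example: any file from the table
)
print(path)  # local cache path of the downloaded file
```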

### Q6_K and Q8_0 files are split and require joining

**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded the Q6_K and Q8_0 files as split files.

<details>
<summary>Click for instructions regarding Q6_K and Q8_0 files</summary>

### Q6_K
Please download:
* `airoboros-l2-70b-2.1.Q6_K.gguf-split-a`
* `airoboros-l2-70b-2.1.Q6_K.gguf-split-b`

### Q8_0
Please download:
* `airoboros-l2-70b-2.1.Q8_0.gguf-split-a`
* `airoboros-l2-70b-2.1.Q8_0.gguf-split-b`

To join the files, do the following:

Linux and macOS:
```
cat airoboros-l2-70b-2.1.Q6_K.gguf-split-* > airoboros-l2-70b-2.1.Q6_K.gguf && rm airoboros-l2-70b-2.1.Q6_K.gguf-split-*
cat airoboros-l2-70b-2.1.Q8_0.gguf-split-* > airoboros-l2-70b-2.1.Q8_0.gguf && rm airoboros-l2-70b-2.1.Q8_0.gguf-split-*
```

Windows command line:
```
COPY /B airoboros-l2-70b-2.1.Q6_K.gguf-split-a + airoboros-l2-70b-2.1.Q6_K.gguf-split-b airoboros-l2-70b-2.1.Q6_K.gguf
del airoboros-l2-70b-2.1.Q6_K.gguf-split-a airoboros-l2-70b-2.1.Q6_K.gguf-split-b

COPY /B airoboros-l2-70b-2.1.Q8_0.gguf-split-a + airoboros-l2-70b-2.1.Q8_0.gguf-split-b airoboros-l2-70b-2.1.Q8_0.gguf
del airoboros-l2-70b-2.1.Q8_0.gguf-split-a airoboros-l2-70b-2.1.Q8_0.gguf-split-b
```

</details>
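
If you would rather not shell out, the same join can be done portably in Python. This is a sketch equivalent to the cat/COPY commands above; `join_gguf_parts` is a hypothetical helper name, not a library function:

```python
# Cross-platform sketch equivalent to the cat/COPY commands above.
# join_gguf_parts is a hypothetical helper, not part of any library.
import shutil
from pathlib import Path

def join_gguf_parts(target: str) -> None:
    parts = sorted(Path(".").glob(f"{target}-split-*"))  # -split-a, -split-b, ...
    with Path(target).open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)  # streamed copy; avoids loading ~37 GB into RAM
            part.unlink()  # remove the part once copied, like the rm/del steps

join_gguf_parts("airoboros-l2-70b-2.1.Q6_K.gguf")
join_gguf_parts("airoboros-l2-70b-2.1.Q8_0.gguf")
```
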
<!-- README_GGUF.md-provided-files end -->
<!-- README_GGUF.md-how-to-run start -->

And thank you again to a16z for their generous grant.

### Overview

__*NOTE: The weights have been re-uploaded as of 2023-08-28 06:57PM EST*__

__*I re-merged the adapter weights (info here: https://twitter.com/jon_durbin/status/1696243076178571474)*__

This is an instruction fine-tuned llama-2 model, using synthetic data generated by [airoboros](https://github.com/jondurbin/airoboros).