Transformers
GGUF
llama
TheBloke committed on
Commit 88147a8 · 1 Parent(s): c1e3316

Initial GGUF model commit

Files changed (1)
  1. README.md +9 -41
README.md CHANGED
@@ -43,16 +43,16 @@ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is
 The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.
 
 As of August 25th, here is a list of clients and libraries that are known to support GGUF:
-* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [llama.cpp](https://github.com/ggerganov/llama.cpp).
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend - the llama-cpp-python backend should work soon too.
 * [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU acceleration. Especially good for storytelling.
+* [LM Studio](https://lmstudio.ai/), version 0.2.2 and later support GGUF. A fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD) and macOS.
 * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work; choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
 * [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server.
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
 * [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.
 
 The clients and libraries below are expected to add GGUF support shortly:
-* [LM Studio](https://lmstudio.ai/), should be updated by end of August 25th.
 <!-- README_GGUF.md-about-gguf end -->
 
 <!-- repositories-available start -->
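For context on the list above, here is a minimal sketch of loading one of this repo's files with llama-cpp-python, which per the list has supported GGUF since version 0.1.79. The file name comes from the Provided Files table in the next hunk; the context size, layer count, and prompt format are illustrative assumptions, not taken from this commit.

```
# A minimal sketch, assuming llama-cpp-python >= 0.1.79 (the first release with
# GGUF support) and a local copy of one of the quantised files listed below.
from llama_cpp import Llama

llm = Llama(
    model_path="airoboros-l2-70b-2.1.Q4_K_M.gguf",  # the "recommended" quant from the table
    n_ctx=4096,       # context length - illustrative, not specified by this commit
    n_gpu_layers=40,  # layers offloaded to GPU; reduces the RAM figures below by using VRAM instead
)

output = llm("USER: Write a limerick about llamas. ASSISTANT:", max_tokens=128)
print(output["choices"][0]["text"])
```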
@@ -104,54 +104,20 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
+| [airoboros-l2-70b-2.1.Q6_K.gguf-split-b](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q6_K.gguf-split-b) | Q6_K | 6 | 19.89 GB | 22.39 GB | very large, extremely low quality loss |
 | [airoboros-l2-70b-2.1.Q2_K.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q2_K.gguf) | Q2_K | 2 | 29.28 GB | 31.78 GB | smallest, significant quality loss - not recommended for most purposes |
 | [airoboros-l2-70b-2.1.Q3_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_S.gguf) | Q3_K_S | 3 | 29.92 GB | 32.42 GB | very small, high quality loss |
 | [airoboros-l2-70b-2.1.Q3_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_M.gguf) | Q3_K_M | 3 | 33.19 GB | 35.69 GB | very small, high quality loss |
 | [airoboros-l2-70b-2.1.Q3_K_L.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q3_K_L.gguf) | Q3_K_L | 3 | 36.15 GB | 38.65 GB | small, substantial quality loss |
+| [airoboros-l2-70b-2.1.Q8_0.gguf-split-b](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q8_0.gguf-split-b) | Q8_0 | 8 | 36.53 GB | 39.03 GB | very large, extremely low quality loss - not recommended |
+| [airoboros-l2-70b-2.1.Q6_K.gguf-split-a](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q6_K.gguf-split-a) | Q6_K | 6 | 36.70 GB | 39.20 GB | very large, extremely low quality loss |
+| [airoboros-l2-70b-2.1.Q8_0.gguf-split-a](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q8_0.gguf-split-a) | Q8_0 | 8 | 36.70 GB | 39.20 GB | very large, extremely low quality loss - not recommended |
 | [airoboros-l2-70b-2.1.Q4_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q4_K_S.gguf) | Q4_K_S | 4 | 39.07 GB | 41.57 GB | small, greater quality loss |
 | [airoboros-l2-70b-2.1.Q4_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q4_K_M.gguf) | Q4_K_M | 4 | 41.42 GB | 43.92 GB | medium, balanced quality - recommended |
 | [airoboros-l2-70b-2.1.Q5_K_S.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q5_K_S.gguf) | Q5_K_S | 5 | 47.46 GB | 49.96 GB | large, low quality loss - recommended |
 | [airoboros-l2-70b-2.1.Q5_K_M.gguf](https://huggingface.co/TheBloke/Airoboros-L2-70B-2.1-GGUF/blob/main/airoboros-l2-70b-2.1.Q5_K_M.gguf) | Q5_K_M | 5 | 48.75 GB | 51.25 GB | large, very low quality loss - recommended |
-| airoboros-l2-70b-2.1.Q6_K.gguf | q6_K | 6 | 56.82 GB | 59.32 GB | very large, extremely low quality loss |
-| airoboros-l2-70b-2.1.Q8_0.gguf | q8_0 | 8 | 73.29 GB | 75.79 GB | very large, extremely low quality loss - not recommended |
 
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
-
-### Q6_K and Q8_0 files are split and require joining
-
-**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded the Q6_K and Q8_0 files as split files.
-
-<details>
-<summary>Click for instructions regarding Q6_K and Q8_0 files</summary>
-
-### q6_K
-Please download:
-* `airoboros-l2-70b-2.1.Q6_K.gguf-split-a`
-* `airoboros-l2-70b-2.1.Q6_K.gguf-split-b`
-
-### q8_0
-Please download:
-* `airoboros-l2-70b-2.1.Q8_0.gguf-split-a`
-* `airoboros-l2-70b-2.1.Q8_0.gguf-split-b`
-
-To join the files, do the following:
-
-Linux and macOS:
-```
-cat airoboros-l2-70b-2.1.Q6_K.gguf-split-* > airoboros-l2-70b-2.1.Q6_K.gguf && rm airoboros-l2-70b-2.1.Q6_K.gguf-split-*
-cat airoboros-l2-70b-2.1.Q8_0.gguf-split-* > airoboros-l2-70b-2.1.Q8_0.gguf && rm airoboros-l2-70b-2.1.Q8_0.gguf-split-*
-```
-Windows command line:
-```
-COPY /B airoboros-l2-70b-2.1.Q6_K.gguf-split-a + airoboros-l2-70b-2.1.Q6_K.gguf-split-b airoboros-l2-70b-2.1.Q6_K.gguf
-del airoboros-l2-70b-2.1.Q6_K.gguf-split-a airoboros-l2-70b-2.1.Q6_K.gguf-split-b
-
-COPY /B airoboros-l2-70b-2.1.Q8_0.gguf-split-a + airoboros-l2-70b-2.1.Q8_0.gguf-split-b airoboros-l2-70b-2.1.Q8_0.gguf
-del airoboros-l2-70b-2.1.Q8_0.gguf-split-a airoboros-l2-70b-2.1.Q8_0.gguf-split-b
-```
-
-</details>
-
 <!-- README_GGUF.md-provided-files end -->
 
 <!-- README_GGUF.md-how-to-run start -->
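As a cross-platform companion to the `cat` / `COPY /B` join commands in the section removed above, here is a hedged Python sketch that fetches the split Q6_K parts and binary-concatenates them. It assumes `huggingface_hub` is installed; the README itself only gives the shell commands.

```
# Sketch: download and join the split Q6_K parts. Equivalent in effect to the
# cat / COPY /B commands above. Assumes: pip install huggingface_hub
import shutil
from huggingface_hub import hf_hub_download

repo_id = "TheBloke/Airoboros-L2-70B-2.1-GGUF"
parts = [
    "airoboros-l2-70b-2.1.Q6_K.gguf-split-a",
    "airoboros-l2-70b-2.1.Q6_K.gguf-split-b",
]

# Fetch each part into the local Hugging Face cache and keep its path.
paths = [hf_hub_download(repo_id=repo_id, filename=name) for name in parts]

# Concatenate the parts in order, in binary mode - all that cat / COPY /B do.
with open("airoboros-l2-70b-2.1.Q6_K.gguf", "wb") as joined:
    for path in paths:
        with open(path, "rb") as part:
            shutil.copyfileobj(part, joined)
```

The same pattern joins the Q8_0 pair; part order matters (split-a before split-b).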
@@ -217,7 +183,9 @@ And thank you again to a16z for their generous grant.
 
 ### Overview
 
-__*I haven't tested this at all yet, quality could be great or absolute trash, I really don't know, but feel free to try.*__
+__*NOTE: The weights have been re-uploaded as of 2023-08-28 06:57PM EST*__
+
+__*I re-merged the adapter weights (info here: https://twitter.com/jon_durbin/status/1696243076178571474)*__
 
 This is an instruction fine-tuned llama-2 model, using synthetic data generated by [airoboros](https://github.com/jondurbin/airoboros)
 