bartowski committed on
Commit 446aa01
1 Parent(s): aab43ae

remove arm quants for now

README.md CHANGED
@@ -52,9 +52,6 @@ You are a world-class AI system, capable of complex reasoning and reflection. Re
  | [Reflection-Llama-3.1-70B-Q4_K_M.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_M.gguf) | Q4_K_M | 42.52GB | false | Good quality, default size for most use cases, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q4_K_S.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_S.gguf) | Q4_K_S | 40.35GB | false | Slightly lower quality with more space savings, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q4_0.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0.gguf) | Q4_0 | 40.12GB | false | Legacy format, generally not worth using over similarly sized formats. |
- | [Reflection-Llama-3.1-70B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_8_8.gguf) | Q4_0_8_8 | 39.97GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
- | [Reflection-Llama-3.1-70B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_8.gguf) | Q4_0_4_8 | 39.97GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
- | [Reflection-Llama-3.1-70B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_4.gguf) | Q4_0_4_4 | 39.97GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
  | [Reflection-Llama-3.1-70B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_XL.gguf) | Q3_K_XL | 38.06GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
  | [Reflection-Llama-3.1-70B-IQ4_XS.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-IQ4_XS.gguf) | IQ4_XS | 37.90GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q3_K_L.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_L.gguf) | Q3_K_L | 37.14GB | false | Lower quality but usable, good for low RAM availability. |
@@ -97,12 +94,6 @@ huggingface-cli download bartowski/Reflection-Llama-3.1-70B-GGUF --include "Refl

  You can either specify a new local-dir (Reflection-Llama-3.1-70B-Q8_0) or download them all in place (./)

- ## Q4_0_X_X
-
- If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
-
- To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
-
  ## Which file should I choose?

  A great write up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
 
Reflection-Llama-3.1-70B-Q4_0_4_4.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:11c35f5544ae4448f320537de4798a3933358cc068debc36739ad2be9cab951f
- size 39969801152

Reflection-Llama-3.1-70B-Q4_0_4_8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d8e8bebaa384f36ee9f09b1b35dd8b76271052b15184e11fab7e4ef187ed51c9
- size 39969801152

Reflection-Llama-3.1-70B-Q4_0_8_8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:ac6852dbd35611e390d781569ac3a1bdebbb1cf83be970f11717af099964bf36
- size 39969801152
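The README text removed above tied each of these deleted files to a specific CPU feature ('sve' for Q4_0_8_8, 'i8mm' for Q4_0_4_8). As a minimal sketch, assuming a Linux AArch64 machine where the kernel exposes CPU feature flags in /proc/cpuinfo, you could check for them like this:

```bash
# Print any of the relevant feature flags the kernel reports for this CPU.
# No output means neither 'sve' nor 'i8mm' is available; -w avoids matching
# longer flag names such as 'sve2'.
grep -owE 'sve|i8mm' /proc/cpuinfo | sort -u
```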