bartowski committed on
Commit 446aa01
1 Parent(s): aab43ae

remove arm quants for now

README.md CHANGED
@@ -52,9 +52,6 @@ You are a world-class AI system, capable of complex reasoning and reflection. Re
  | [Reflection-Llama-3.1-70B-Q4_K_M.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_M.gguf) | Q4_K_M | 42.52GB | false | Good quality, default size for most use cases, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q4_K_S.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_K_S.gguf) | Q4_K_S | 40.35GB | false | Slightly lower quality with more space savings, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q4_0.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0.gguf) | Q4_0 | 40.12GB | false | Legacy format, generally not worth using over similarly sized formats. |
- | [Reflection-Llama-3.1-70B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_8_8.gguf) | Q4_0_8_8 | 39.97GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
- | [Reflection-Llama-3.1-70B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_8.gguf) | Q4_0_4_8 | 39.97GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
- | [Reflection-Llama-3.1-70B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q4_0_4_4.gguf) | Q4_0_4_4 | 39.97GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
  | [Reflection-Llama-3.1-70B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_XL.gguf) | Q3_K_XL | 38.06GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
  | [Reflection-Llama-3.1-70B-IQ4_XS.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-IQ4_XS.gguf) | IQ4_XS | 37.90GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
  | [Reflection-Llama-3.1-70B-Q3_K_L.gguf](https://huggingface.co/bartowski/Reflection-Llama-3.1-70B-GGUF/blob/main/Reflection-Llama-3.1-70B-Q3_K_L.gguf) | Q3_K_L | 37.14GB | false | Lower quality but usable, good for low RAM availability. |
@@ -97,12 +94,6 @@ huggingface-cli download bartowski/Reflection-Llama-3.1-70B-GGUF --include "Refl

  You can either specify a new local-dir (Reflection-Llama-3.1-70B-Q8_0) or download them all in place (./)

- ## Q4_0_X_X
-
- If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
-
- To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
-
  ## Which file should I choose?

  A great write up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
 
Reflection-Llama-3.1-70B-Q4_0_4_4.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:11c35f5544ae4448f320537de4798a3933358cc068debc36739ad2be9cab951f
- size 39969801152

Reflection-Llama-3.1-70B-Q4_0_4_8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d8e8bebaa384f36ee9f09b1b35dd8b76271052b15184e11fab7e4ef187ed51c9
- size 39969801152

Reflection-Llama-3.1-70B-Q4_0_8_8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:ac6852dbd35611e390d781569ac3a1bdebbb1cf83be970f11717af099964bf36
- size 39969801152
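The README text removed above tied each of these deleted files to a specific CPU feature ('sve' for Q4_0_8_8, 'i8mm' for Q4_0_4_8). As a minimal sketch, assuming a Linux AArch64 machine where the kernel exposes CPU feature flags in /proc/cpuinfo, you could check for them like this:

```bash
# Print any of the relevant feature flags the kernel reports for this CPU.
# No output means neither 'sve' nor 'i8mm' is available; -w avoids matching
# longer flag names such as 'sve2'.
grep -owE 'sve|i8mm' /proc/cpuinfo | sort -u
```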