nisten committed on
Commit
4072371
1 Parent(s): 6329a82

Update README.md

Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -9,10 +9,15 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
9
 
10
  ## Available Quantizations
11
 
 
 
12
  1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
13
- 2. BF16: ~811 GB
14
- 3. Q8_0: ~406 GB
15
- 4. Q2-Q8 (custom quant I wrote) ~ 165 GB
 
 
 
16
 
17
  ## Use Aria2 for parallelized downloads, links will download 9x faster
18
 
@@ -22,8 +27,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
22
  >>
23
  >>Feel free to paste these all in at once or one at a time
24
 
25
- ### Q4_0_48 (CPU Optimized) Example response of 20000 token prompt:
26
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/DD71wAB7DlQBmTG8wVaWS.png)
27
 
28
 
29
  ```bash
@@ -36,7 +40,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gg
36
  ```
37
 
38
 
39
- ### IQ4_XS Version - Fastest for CPU/GPU (Size: ~212 GB)
40
  ```bash
41
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
42
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
@@ -52,7 +56,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00002-of-00003.gguf https://huggingfa
52
  aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
53
  ```
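The per-part `aria2c` commands above follow a regular naming pattern (`<prefix>-0000N-of-0000M.gguf`), so they can also be generated rather than pasted one by one. The following is a minimal sketch, assuming that pattern holds for every part. The helper name `part_name` and the variables `base_url`, `prefix`, and `total` are illustrative and not part of the repository. The loop only prints the commands, so nothing is downloaded until the output is reviewed and piped to a shell.

```bash
# Hypothetical helper (not part of the repository): builds the file name
# of one part of a split GGUF, e.g. meta-405b-1bit-00002-of-00003.gguf
part_name() {  # usage: part_name PREFIX INDEX TOTAL
  printf '%s-%05d-of-%05d.gguf' "$1" "$2" "$3"
}

# Values taken from the 1-bit download commands shown above
base_url="https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main"
prefix="meta-405b-1bit"
total=3

# Print one aria2c command per part, using the same flags as the README
i=1
while [ "$i" -le "$total" ]; do
  f=$(part_name "$prefix" "$i" "$total")
  echo "aria2c -x 16 -s 16 -k 1M -o $f $base_url/$f"
  i=$((i + 1))
done
```

To download instead of just printing, the output can be piped to `sh` after a visual check. For another quantization, change `prefix` and `total` to match its file names (for example `meta-405b-cpu-i1-q4xs` and `5`).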
54
 
55
- Note: Sizes are approximate and converted to GB (1 GB = 1024 MiB).
56
  ### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix
57
 
58
  ```verilog
@@ -70,6 +74,11 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00003-of-00004.gguf https:/
70
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
71
  ```
72
 
 
 
 
 
 
73
  ### BF16 Version
74
 
75
  ```bash
 
9
 
10
  ## Available Quantizations
11
 
12
+ Available Quantizations
13
+
14
  1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
15
+ 2. IQ4_XS (Fastest for CPU/GPU): ~212 GB
16
+ 3. Q2K-Q8 Mixed quant with iMatrix: ~154 GB
17
+ 4. Q2K-Q8 Mixed without iMat for testing: ~165 GB
18
+ 5. 1-bit Custom per weight COHERENT quant: ~103 GB
19
+ 6. BF16: ~811 GB (original model)
20
+ 7. Q8_0: ~406 GB (original model)
21
 
22
  ## Use Aria2 for parallelized downloads, links will download 9x faster
23
 
 
27
  >>
28
  >>Feel free to paste these all in at once or one at a time
29
 
30
+ ### Q4_0_48 (CPU FMA Optimized Specifically for ARM server chips, NOT TESTED on X86)
 
31
 
32
 
33
  ```bash
 
40
  ```
41
 
42
 
43
+ ### IQ4_XS Version - Fastest for CPU/GPU should work everywhere (Size: ~212 GB)
44
  ```bash
45
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
46
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
 
56
  aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
57
  ```
58
 
59
+
60
  ### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix
61
 
62
  ```verilog
 
74
  aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
75
  ```
76
 
77
+ <figure>
78
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/DD71wAB7DlQBmTG8wVaWS.png" alt="Q4_0_48 CPU Optimized example response">
79
+ <figcaption><strong>Q4_0_48 (CPU Optimized) (246GB):</strong> Example response of 20000 token prompt</figcaption>
80
+ </figure>
81
+
82
  ### BF16 Version
83
 
84
  ```bash