---
license: apache-2.0
language:
- en
tags:
- creative
- story
- writing
- fiction
- float32
- roleplaying
- rp
- enhanced
- space whale
- 32 bit upscale
---
<font color=red><h3> Ultra High Quality + NEO, NEO "X" Quants Remaster of the incredible: Psyonic-Cetacean-20B </h3></font>

This is a floating point 32 upscale, where all components and merges were remastered to floating point 32. This includes all of the merges (recreated with master files) and, where possible, substituting in full FP32 models.

This repo contains 3 quants of the model, all at IQ4_XS (links to the full Ultra and Ultra Imatrix quants below).

<img src="space-whale-thinking.jpg">

The goal: carry maximum precision forward right up to the point where the model is "GGUFed".

This includes an F32 master file for GGUF too... at a whopping 78 GB (compare to an average of 38 GB for 20B models).

WHY?

Because the difference between F32 and BF16 is... over 8 DECIMAL places.

And as each merge / model is modified, there are "losses" along the way.

These losses are carried forward and in turn lead to more losses.

And decimal places are critical to model performance.

SMALL?

Yes... but multiplied by each merge and compression: 20 billion times.
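
To make the precision gap concrete: float32 carries 23 mantissa bits (roughly 7 significant decimal digits), while bfloat16 keeps only the top 7 mantissa bits (roughly 2-3 digits). A minimal Python sketch, using a hand-rolled truncation helper rather than any library bfloat16 type, shows how much of a weight's value survives the round trip:

```python
import struct

def to_bf16(x: float) -> float:
    """Round-trip a value through bfloat16 by packing it as a float32
    and truncating the 32-bit word to its top 16 bits
    (sign + 8 exponent bits + 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

weight = 1.2345678
print(f"float32 : {weight:.7f}")           # 1.2345678
print(f"bfloat16: {to_bf16(weight):.7f}")  # 1.2343750 -- lower digits truncated away
```

Each such truncation is tiny on its own, but repeated across every weight in every merge step it compounds, which is the argument for keeping the pipeline in F32 until the very end.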

<B>The result of Ultra Quality:</b>

At Q2K, an impressive drop of 533 points in perplexity (lower is better).
(VS the Q2K original base model: PPL = 9.8077 +/- 0.06821)

At Q4KM, a whopping drop of 976 points in perplexity.
(VS the Q4KM original base model: PPL = 8.7858 +/- 0.06074)

At Q6, an awesome drop of 234 points in perplexity.
(VS the Q6 original base model: PPL = 8.6070 +/- 0.05907)

To put this in perspective, "Q6" now operates ABOVE the original full precision version of "Psyonic-Cetacean-20b", and Q4KM operates at close to Q6 level quality.

This is because at "Q6" the quantized / compressed model is considered accurate to within "+0.0008 ppl" of the full, uncompressed / unquantized model, and it exceeds that threshold by over 200 points.

But... what about Q8?

The mountain moved:

150 points better: PPL = 8.5850 +/- 0.05881 VS BASE/ORIGINAL: PPL = 8.6012 +/- 0.05900
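
For reference, the "points" in these comparisons appear to be units of 0.0001 perplexity — an inference from the quoted numbers, not something stated outright. The Q8 figures above can be sanity-checked in a couple of lines of Python:

```python
# Sanity-check the Q8 comparison, assuming one "point" = 0.0001 ppl
# (this unit is an inference from the card's numbers, not stated in it).
base_ppl = 8.6012    # original Q8 base model
ultra_ppl = 8.5850   # Ultra Quality Q8 remaster

delta_points = round((base_ppl - ultra_ppl) / 0.0001)
print(delta_points)  # 162
```

A delta of about 162 such points is consistent with the "150 points better" figure quoted for Q8.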

<b>NEO Imatrix and NEO X Quants:</b>

Quant 1, Neo V1 Imatrix:

This is a NEO Imatrix upgrade of the model, with the "Ultra Quality 20B" as the base model.
This upgrade sits above Ultra Quality, Imatrix Plus and Imatrix Plus2.

Quant 2, Neo V1 Imatrix X Quant "Alpha":

This is a NEO Imatrix upgrade of the model, with the "Ultra Quality 20B" as the base model, plus specialized "Alpha" X quant settings
that are specific to the Psyonic-Cetacean-20B model structure only.

Quant 3, Neo V1 Imatrix X Quant "Beta":

This is a NEO Imatrix upgrade of the model, with the "Ultra Quality 20B" as the base model, plus specialized "Beta" X quant settings
that are specific to the Psyonic-Cetacean-20B model structure only.

Compare the output of each of these new versions below in the "examples" area.

<B>THE RESULTS ARE IN:</b>

As per Jeb Carter, original creator of the model:

- Instruction following has improved dramatically.
- New abilities have emerged.
- He had to REDUCE the instruction sets used, because the model no longer needed such specific instructions.
- Prose, nuance and depth have all improved.
- Known issues with the original model have disappeared.

This is not "something for nothing"; it is a method of ensuring maximum precision at every step, right up until the model is "GGUFed".

The methods employed only ensure that precision loss is minimized or eliminated.

It is mathematically and theoretically sound.

<B>The bottom line here is this:</b>

Higher quality instruction following and output.

Likewise, you can use a smaller compression, with higher tokens per second, and still get great quality.

Same great model... turbocharged.

This is the first group of remasters.

<B>The FOUR Horsemen:</B>

This repo will be followed by a "reg quant plus" repo, which adds additional components into the GGUF (at all levels) at floating point 32
precision to further increase the sheer creativity and raw AI horsepower.

This process shaves an extra 50-100 points off perplexity... again.

Following this group will be a full float 32 precision Imatrix (including regular quants "imatrixed").

Test results VS the original and "ultra" regular quants will be posted when they come in.

Ultra Quality Non-Imatrix Quants:

[ https://huggingface.co/DavidAU/Psyonic-Cetacean-Ultra-Quality-20b-GGUF ]

Imatrix Repo:

[ https://huggingface.co/DavidAU/Psyonic-Cetacean-Ultra-Quality-20b-GGUF-imatrix ]

Imatrix Plus 2 Repo:

[ https://huggingface.co/DavidAU/Psyonic-Cetacean-Ultra-Quality-20b-GGUF-imat-plus2 ]

Details of all the methods (and pitfalls to avoid) employed to make these high precision remasters will be
posted shortly, along with a comparison of the original model and the new ultra remaster.

Thanks again to Jeb Carter, the original creator of "Psyonic-Cetacean 20B":

[ https://huggingface.co/jebcarter/psyonic-cetacean-20B ]

<h3> Examples of NEO, NEO "X" Quants: </h3>