I may make a Q6 high and/or a Q8 Hybrid and/or a Q8 "HI".
Imatrix does not have any effect on Q8 or BF16, unless the other tensors in the model are set at Q6 or lower.
A Q8 "HI" is a special case, where one or more tensors/layers are set at BF16.
Currently working with Qwen 3.5/6 35B-A3B in the lab ; learning the "quirks" ; still a ways to go.
quant    arc      arc/e    boolq    hswag    obkqa    piqa     wino
mxfp8    0.480    0.656    0.797    0.608    0.400    0.755    0.665
mxfp4    0.455    0.607    0.851    0.585    0.402    0.744    0.651

Quant    Perplexity          Peak Memory    Tokens/sec
mxfp8    35.937 ± 0.525      14.80 GB       1153
mxfp4    36.746 ± 0.534      11.06 GB       1030

quant    arc      arc/e    boolq    hswag    obkqa    piqa     wino
mxfp8    0.404    0.489    0.825    0.586    0.392    0.734    0.661
mxfp4    0.414    0.508    0.854    0.562    0.378    0.717    0.645

Quant    Perplexity          Peak Memory    Tokens/sec
mxfp8    34.652 ± 0.502      14.80 GB       1146
mxfp4    35.203 ± 0.506      11.06 GB       1200

quant    arc      arc/e    boolq    hswag    obkqa    piqa     wino

New template
mxfp8    0.518    0.709    0.755    0.657    0.418    0.759    0.626
mxfp4    0.485    0.682    0.792    0.641    0.432    0.746    0.635

Old template
mxfp8    0.506    0.697    0.754    0.661    0.416    0.757    0.627
mxfp4    0.487    0.670    0.792    0.644    0.430    0.748    0.624

New template
mxfp8    0.461    0.599    0.779    0.630    0.406    0.766    0.629

Old template
mxfp8    0.456    0.580    0.786    0.629    0.410    0.764    0.633

New template
mxfp8    0.509    0.705    0.806    0.646    0.416    0.773    0.650

Old template
mxfp8    0.502    0.692    0.809    0.650    0.420    0.771    0.651

RE: 16-18B ; yes, something running in the lab right now (Gemma 4).
Also can make Qwen 3 (version 3) MOEs, like Llama3.2-8X3B ; I have some of these at my repo too.
I have built a few GPT-OSS models ; and some 12Bs [Mistral Nemo], as well as Mistral Nemo "large" 15-17Bs...
A lot of options ;
Maybe in the future ; atm still learning/addressing quirks with these new Gemmas.
Google released three different arch structures here: "E", "MOE", and 31B dense.
There are also plans to create larger Gemma 4s ; which may work better for specific applications and/or work better period.
These are in the plans for next week.
Exceeds Gemma4 26B-A4B in critical benchmarks.
Training a Gemma 4 Reap 19B-A4B right now ; should be done tomorrow, then testing.
RE: Franken merge 26B-A3B ; yes, just need to make a map for Mergekit ; this is also in progress.
RE: Claudes ; depends on how REAP turns out.
There are a lot of updates still in progress with Unsloth/Llamacpp RE: Gemma 4s atm too ;
There are also some dataset issues to address when training with Gemma 4s.
NOTE:
Just finished a number of fine tunes on Gemma 4's E4B ; which is a MOE LIKE model. These will release in the next day or so ; pending final testing.
UPDATE:
All of these are now up; and can be downloaded.
Awaiting quants.
RE: 13B:
=> one is upscaled + trained, the other is a merge of two 9B fine-tunes (and upscaled) ; a rough sketch of the upscale step is below.
They are hidden as of this writing (undergoing private testing), awaiting final metrics / eval.
If they "pass" ; they will be made public.
These will be active within 24-48 hrs pending results.
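For reference on the upscaling itself, here is an illustrative sketch of a layer-duplication (passthrough) Mergekit config of the kind used for this sort of upscale. The model names and layer ranges are placeholders, not the actual recipe for these 13Bs:

```python
# Illustrative only: a passthrough (layer-duplication) Mergekit config that
# upscales by repeating a span of layers from two 9B fine-tunes.
# Model names and layer ranges are placeholders.
import yaml

config = {
    "merge_method": "passthrough",
    "dtype": "bfloat16",
    "slices": [
        {"sources": [{"model": "org/9b-finetune-a", "layer_range": [0, 28]}]},
        # Overlapping range repeats middle layers to add depth/parameters.
        {"sources": [{"model": "org/9b-finetune-b", "layer_range": [14, 42]}]},
    ],
}

with open("upscale-13b.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Then (roughly): mergekit-yaml upscale-13b.yaml ./out-13b
```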
Currently have a full running 13B (GLM 4.7 Flash), which is very strong ; and experimental 21Bs of Qwen 3.5.
These are trained.
These are in testing, and access is limited as of this writing.
As for MOEs:
This is a little more complicated, as scripting must be written for Mergekit to "moe together" 0.8B, 2B, 4B, 9Bs, etc.
A draft (by me) has been completed to do this; but not tested/debugged yet (see the illustrative sketch below for the general shape).
No time line here ; too many variables.
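For a rough idea of what that scripting needs to produce, here is an illustrative sketch that generates a mergekit-moe config from a list of same-size fine-tunes. Model names, gate mode, and prompts are placeholders, whether mergekit-moe supports a given architecture has to be checked per model family, and this is not the draft mentioned above:

```python
# Illustrative only: build a mergekit-moe config that "moes together" several
# same-size fine-tunes into one MoE. Model names and prompts are placeholders.
import yaml

experts = {
    "org/4b-finetune-creative": ["story", "scene", "roleplay"],
    "org/4b-finetune-code": ["code", "python", "function"],
    "org/4b-finetune-general": ["explain", "summarize", "answer"],
}

config = {
    "base_model": "org/base-4b-instruct",
    "gate_mode": "hidden",   # route via hidden-state similarity to the prompts
    "dtype": "bfloat16",
    "experts": [
        {"source_model": m, "positive_prompts": p} for m, p in experts.items()
    ],
}

with open("moe-4x4b.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Then (roughly): mergekit-moe moe-4x4b.yaml ./out-moe
```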
RE: 35B moes ; it is possible to address this in a different way ; but I have not tried it yet.
This is a different approach than REAP.
9 Heretic Uncensored LFM fine tunes are now up at my repo:
https://huggingface.co/DavidAU/models?sort=created&search=lfm
Model card updates in progress as I write this.
The merges will take a wee bit longer.
...and 5 more new "non-heretic" ones too.