F16 versions of some models

#54
by julia62729 - opened

First I wanted to thank you for your tireless work! It's much appreciated. I was hoping to convince you to make F16 versions of at least some of the main models (llama-3, mistral, mixtral, phi3) available. Any chance?

I currently make f16 quants when the Q8_0 quant is <= 10GB (as a heuristic), so the phi3 models should have one.
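
For illustration, a minimal sketch of that size heuristic, assuming the decision is made from the on-disk size of the Q8_0 file; the threshold comes from the comment above, while the file name and helper are hypothetical:

```python
import os

# Assumed threshold from the heuristic above: only produce an f16 quant
# when the Q8_0 quant is at most ~10 GB (binary units assumed here).
Q8_0_SIZE_LIMIT = 10 * 1024**3

def should_make_f16(q8_0_path: str) -> bool:
    """Return True if the Q8_0 file is small enough to justify an f16 upload."""
    return os.path.getsize(q8_0_path) <= Q8_0_SIZE_LIMIT

if __name__ == "__main__":
    path = "model.Q8_0.gguf"  # hypothetical file name
    if should_make_f16(path):
        print(f"{path}: small enough, queue an f16 quant")
    else:
        print(f"{path}: too large, skip f16")
```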

As for the larger ones, I try to find a balance between blasting Hugging Face with endless extra terabytes nobody is using and providing actually useful quants. That is, I am a bit reluctant to indiscriminately upload big quants. The same can be said for the source/unquantized GGUFs.

Consider your comment as noted - I am not set in stone, but at the moment, I only do f16s for 10B and smaller. I can add quants on request, but the problem then is that I have to download the source model again, which seems, again, wasteful of Hugging Face's resources, as they probably pay for that.

Sigh.

mradermacher changed discussion status to closed
