Add F16 and BF16 quantization

#129
opened by andito (HF Staff)
No description provided.
ggml.ai org

The problem with adding BF16 is that currently we use convert_hf_to_gguf.py to convert the HF model into F16, then use llama-quantize to quantize it.

So the conversion would be safetensors --> F16 --> BF16, which adds no benefit to the output model.
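For context, the current flow looks roughly like this (a minimal sketch; the paths and the exact way app.py shells out to the tools are assumptions, only the two-step conversion itself is from the thread):

```python
import subprocess

# Hypothetical paths; the real Space derives these from the model repo name.
model_dir = "downloaded-hf-model"   # safetensors checkpoint from the Hub
f16_gguf = "model-f16.gguf"         # intermediate F16 conversion
out_gguf = "model-bf16.gguf"        # requested output type

# Step 1: convert the HF safetensors model to an F16 GGUF.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outtype", "f16", "--outfile", f16_gguf],
    check=True,
)

# Step 2: re-quantize the F16 GGUF to the requested type.
# Going F16 -> BF16 here cannot recover precision already lost in step 1.
subprocess.run(
    ["./llama-quantize", f16_gguf, out_gguf, "BF16"],
    check=True,
)
```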

What we should do here is also modify the code that runs convert_hf_to_gguf.py, so that it outputs a BF16 GGUF file directly.
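A minimal sketch of that change, assuming convert_hf_to_gguf.py's --outtype flag accepts bf16 (the function name and paths are illustrative, not the Space's actual code):

```python
import subprocess

def convert_direct(model_dir: str, outfile: str, outtype: str) -> None:
    """Convert an HF safetensors model straight to the requested GGUF type.

    For f16/bf16 this skips llama-quantize entirely, so the BF16 output is
    produced from the original weights rather than from an F16 intermediate.
    """
    subprocess.run(
        ["python", "convert_hf_to_gguf.py", model_dir,
         "--outtype", outtype, "--outfile", outfile],
        check=True,
    )

# Example: produce a BF16 GGUF directly from the safetensors checkpoint.
convert_direct("downloaded-hf-model", "model-bf16.gguf", "bf16")
```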

I would also like to see a direct conversion into 'Floating Point 16', especially now that I know how 'BrainFloat16' works.

TL;DR Basically, BF16 sucks.

Cannot merge
This branch has merge conflicts in the following files:
  • app.py
