https://huggingface.co/leafspark/IridiumLlama-72B-v0.1

#218 opened by leafspark

It's a merge of Qwen2 72B Instruct, magnum 72b, and calme2.1 72b, converted to the Llama architecture. Quants please, thanks! Also a quick question: do you requant from Q8_0 after the safetensors conversion, or from f16/bf16?

It's queued. You can watch its progress at http://hf.tst.eu/status.html, if you can guess how to interpret that.

I always quantize from the source precision (defined by whatever llama.cpp thinks it is; usually f32, f16 or bf16, depending on the tensor and the version of llama.cpp). I don't think anybody would first quant to Q8_0 and then quantize further, as that doesn't seem to offer any advantage (in fact, it would probably be slower), but some people do first quantize to f16 or bf16 (in my experience, usually because they mistakenly think they have to).
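
For reference, the usual llama.cpp flow is a one-shot conversion from the source precision followed by direct quantization, with no Q8_0 intermediate. A minimal sketch, assuming a llama.cpp checkout with a built llama-quantize binary (file names here are placeholders; older checkouts call the script convert-hf-to-gguf.py and the binary just quantize):

```python
# Sketch of the usual two-step llama.cpp quantization flow.
# Paths and output file names are placeholders.
import subprocess

model_dir = "IridiumLlama-72B-v0.1"  # local HF-style safetensors directory

# Step 1: convert straight from the source precision to GGUF.
# --outtype auto lets llama.cpp pick the precision per tensor.
subprocess.run([
    "python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
    "--outfile", "IridiumLlama-72B.src.gguf",
    "--outtype", "auto",
], check=True)

# Step 2: quantize from that full-precision GGUF in a single step;
# going through Q8_0 first would only add time and rounding error.
subprocess.run([
    "llama.cpp/llama-quantize",
    "IridiumLlama-72B.src.gguf",
    "IridiumLlama-72B.Q4_K_M.gguf",
    "Q4_K_M",
], check=True)
```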

mradermacher changed discussion status to closed

The model lacks the config.json file.

mradermacher changed discussion status to open

Restarted it, cheers!

mradermacher changed discussion status to closed

Out of curiosity, why did you make a Llama conversion instead of quanting the original model?

I uploaded it first (the conversion process resharded it into 31 safetensors from 936 individual tensor files, which was another reason), and I decided to just use this one since the Llama architecture is fairly similar to Qwen2 AFAIK (at the cost of context length, 128k -> 32k). It may also be better supported in some tools, for example exllamav2.
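
To give a rough idea of what such a conversion touches on the config side, here is a hypothetical sketch (not the actual script used; the real work also involves renaming tensors and dealing with Qwen2's QKV projection biases, which Llama lacks):

```python
# Hypothetical sketch of the config.json side of a Qwen2 -> Llama
# conversion. Values are illustrative; the real conversion also renames
# tensors and must handle Qwen2's QKV biases.
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["architectures"] = ["LlamaForCausalLM"]  # was ["Qwen2ForCausalLM"]
cfg["model_type"] = "llama"                  # was "qwen2"
# This is where the 128k -> 32k context loss mentioned above comes from.
cfg["max_position_embeddings"] = 32768       # down from the Qwen2 context

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```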

On that note, I just finished uploading the Qwen2 model (leafspark/Iridium-72B-v0.1). I noticed your queue was large, so I held off on requesting quants.

Ah, I see. Well, at least llama.cpp should handle qwen2 directly, without losing context. And it should in theory be able to handle the safetensor file, uh, mess :) Don't worry about the queue length; the queue is there so I can make better scheduling decisions. I'll try to quant Iridium-72B-v0.1 and see what happens.
