appoose posted an update on Jul 31, 2024:
Excited to announce the release of our high-quality Llama-3.1 8B 4-bit HQQ calibrated quantized model! Achieving an impressive 99.3% of FP16 performance, it also delivers the fastest inference speed when run through the transformers library.

mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib

How's the speed compared to EXL2 quant at the same bits per weight?