neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 Text Generation • Updated 9 days ago • 101k • 23
Article • Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator • Mar 28, 2023 • 1