Will a quantised version be available?

Discussion #9, opened by angerhang

Thanks for sharing, but what are the recommended ways to quantise this model?
Or will a quantised model be made available, so that inference is less resource-intensive?

Thanks

Did you see https://huggingface.co/models?other=base_model:quantized:nvidia/Llama-3.1-Nemotron-70B-Instruct-HF?
Use the model tree section on model pages to see what quantizations are available.
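If you would rather check from a script than from the model page, the sketch below rebuilds the search URL linked above and, optionally, asks the Hub for matching repos. It is a minimal sketch, not an official recipe: the helper names (`quantized_tag`, `quantized_search_url`, `list_quantized`) are hypothetical, and it assumes that the `base_model:quantized:<repo>` filter used in that URL is also matched by the `filter` argument of `huggingface_hub`'s `HfApi.list_models`.

```python
from urllib.parse import urlencode

HUB_MODELS_URL = "https://huggingface.co/models"


def quantized_tag(base_repo_id: str) -> str:
    """Filter value used by the Hub search for quantizations of a base model."""
    return f"base_model:quantized:{base_repo_id}"


def quantized_search_url(base_repo_id: str) -> str:
    """Rebuild the model-search URL that lists quantizations of a base model."""
    # Keep ':' and '/' unescaped so the URL matches the one linked in the thread.
    query = urlencode({"other": quantized_tag(base_repo_id)}, safe=":/")
    return f"{HUB_MODELS_URL}?{query}"


def list_quantized(base_repo_id: str, limit: int = 20) -> list[str]:
    """Return ids of Hub repos filtered as quantizations of `base_repo_id`.

    Assumption: `HfApi.list_models(filter=...)` matches the
    `base_model:quantized:...` value. Needs network access and the
    `huggingface_hub` package.
    """
    from huggingface_hub import HfApi  # imported lazily; optional dependency

    api = HfApi()
    models = api.list_models(filter=quantized_tag(base_repo_id), limit=limit)
    return [m.id for m in models]
```

For example, `quantized_search_url("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")` yields the same link as above, while `list_quantized(...)` returns up to `limit` community repo ids (the results change as new quantizations are uploaded).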

NVIDIA hasn't released any quantized version yet, but several community quantizations are linked above.
