mlx-community/Llama-3.1-Nemotron-70B-Instruct-HF-8bit

Text Generation

text-generation-inference

8-bit precision

getting very low tokens per second (under 1 t/s) on M2 Ultra 192GB.

#6 opened about 1 month ago by

vLLM: Unknown quantization method

#5 opened about 2 months ago by

Update README.md

#4 opened 2 months ago by

Upload folder using huggingface_hub

#1 opened 2 months ago by