Nemotron models that have been converted and/or quantized to work well in vLLM
Michael Goin (mgoin)
AI & ML interests: LLM inference optimization, compression, quantization, pruning, distillation
Recent activity:
- New discussion activity (about 20 hours ago) on mistralai/Mistral-Small-3.1-24B-Instruct-2503: "FP8 Dynamic/W8A16 Quants Please"
- New discussion activity (about 20 hours ago) on mistralai/Mistral-Small-3.1-24B-Instruct-2503: "Problem hosting the model using vllm"
- Updated a model (8 days ago): nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic
Collections: 1 · Spaces: 4 · Models: 95

- mgoin/Qwen1.5-14B-Chat-GPTQ (Text Generation)
- mgoin/pixtral-12b (Image-Text-to-Text)
- mgoin/Llama-3.2-1B-Instruct-FP8-ATTN
- mgoin/Llama-3.2-1B-Instruct-FP8-dynamic-ATTN
- mgoin/Pixtral-Large-Instruct-2411
- mgoin/Qwen2.5-Coder-32B-Instruct-fp8
- mgoin/nemotron-3-8b-chat-4k-sft-hf (Text Generation)
- mgoin/llava-onevision-qwen2-7b-ov-hf-bnb-full-4bit (Image-Text-to-Text)
- mgoin/MiniCPM-Llama3-V-2_5-int4 (Visual Question Answering)
- mgoin/DeepSeek-Coder-V2-Lite-Instruct-FP8