Mukul
mtcl
AI & ML interests
None yet
Recent Activity
new activity about 6 hours ago
nvidia/nemotron-3.5-asr-streaming-0.6b: vllm support ?
new activity 1 day ago
cyankiwi/gemma-4-12B-it-qat-AWQ-INT4: Vllm and SgLang command please
liked a model 3 days ago
MisoLabs/MisoTTS
Organizations
None yet
vllm support ?
👍 7
1
#6 opened 4 days ago
by
sdd5125
Vllm and SgLang command please
👍 1
1
#1 opened 1 day ago
by
mtcl
nvidia/DeepSeek-V4-flash-NVFP4
6
#1 opened 12 days ago
by
mtcl
Docker Image
7
#1 opened 12 days ago
by
mtcl
Worse than (smaller) MiniMax M2.7??
17
#2 opened about 2 months ago
by deleted
Unable to run on 2x RTX Pro 6000 (DEEP_GEMM problem)
➕ 10
17
#15 opened about 2 months ago
by
stev236
Running on 2 RTX Pro 6000 Blackwell GPUs at ~30 tps (Instructions that worked for me)
👍❤️ 7
10
#17 opened about 1 month ago
by
CarouselAether
2x Nvidia 6000 Pros
3
#2 opened about 1 month ago
by
mtcl
Will it work on 2X6000 Pros
6
#1 opened about 1 month ago
by
mtcl
Can I deploy it with sglang at my 8*4090 ubuntu server?
9
#1 opened about 1 month ago
by
marshal007
Context Length for 2X6000 Pros (2x96 = 192GB VRAM)
3
#2 opened about 2 months ago
by
mtcl
really awesome speeds! running at 256k context.
🔥 1
5
#11 opened about 2 months ago
by
mtcl
MOE 122b and 397b please!
🚀 24
14
#7 opened about 2 months ago
by
jesleocizi
How to disable thinking?
4
#9 opened about 2 months ago
by
Hansi2024
These are NOT actual AWQ-quantized models.
2
#1 opened about 2 months ago
by
cai-cai
max context
#2 opened about 2 months ago
by
mtcl
No think tags.
10
#4 opened about 2 months ago
by
DrRos
Minimax M2.7 NVFP4
👀🔥 5
4
#4 opened about 2 months ago
by
mtcl
Unable to use full 192k context in SGLang with MiniMax-M2.7-NVFP4 (runtime capped at ~80,964 tokens)
3
#9 opened about 2 months ago
by
mtcl
w1 not matching w3 weight scales
12
#1 opened about 2 months ago
by
dareposte