Inference Optimization
community
Recent Activity
inference-optimization/test_tencentbac_fastmtp
Updated • 43
inference-optimization/test_qwen3_next_mtp
Updated • 46
inference-optimization/Qwen3-Next-80B-A3B-Instruct_mtp_speculator
Text Generation • 2B • Updated • 57
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
2B • Updated • 18
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
models 183
inference-optimization/Qwen3-4B-Instruct-2507.w8a8
Text Generation • 4B • Updated • 22
inference-optimization/Qwen3-4B-Thinking-2507.w8a8
Text Generation • 4B • Updated • 48
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt3-speculator.eagle3
0.9B • Updated • 10
inference-optimization/Mistral-Small-4-119B-2603-BF16
119B • Updated • 141
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt2-speculator.eagle3
0.9B • Updated • 51
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
2B • Updated • 18
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch2
2B • Updated • 9
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch1
2B • Updated • 12
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt1-speculator.eagle3
0.9B • Updated • 37
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt0-speculator.eagle3
0.9B • Updated • 43