Block Diffusion for Flash Speculative Decoding
AI & ML interests
Efficient AI
Papers
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Paper • 2511.10645 • Published • 7
z-lab/Qwen3.5-27B-PARO
Image-Text-to-Text • 6B • Updated • 375 • 5
z-lab/Qwen3.5-9B-PARO
Image-Text-to-Text • 3B • Updated • 12.1k • 36
z-lab/Qwen3.5-4B-PARO
Image-Text-to-Text • 1B • Updated • 1.75k • 13
models 29
z-lab/Qwen3.5-27B-PARO
Image-Text-to-Text • 6B • Updated • 375 • 5
z-lab/Qwen3.5-35B-A3B-DFlash
Text Generation • 0.5B • Updated • 776 • 8
z-lab/Qwen3.5-27B-DFlash
Text Generation • 4B • Updated • 57 • 4
z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat
Text Generation • 1B • Updated • 1.64k • 2
z-lab/Qwen3-Coder-30B-A3B-DFlash
Text Generation • 0.5B • Updated • 694 • 27
z-lab/Qwen3-8B-DFlash-b16
Text Generation • 1B • Updated • 9.16k • 20
z-lab/Qwen3-4B-DFlash-b16
Text Generation • 0.5B • Updated • 29.4k • 22
z-lab/gpt-oss-120b-DFlash
Text Generation • 0.8B • Updated • 752 • 2
z-lab/gpt-oss-20b-DFlash
Text Generation • 0.8B • Updated • 1.88k • 12
z-lab/Qwen3-Coder-Next-DFlash
Text Generation • 0.5B • Updated • 130 • 4
datasets 0
None public yet