Updated to v1
This model is trained on CODE outputs of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B and is meant to be used only as a draft model for speculative decoding.
It's specifically intended for 3090/4090 users: it lets you run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context and speeds up generation without sacrificing further context length or model quality.
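To illustrate why a draft model can speed up generation without changing the output, here is a minimal sketch of greedy speculative decoding. It is not the code of any particular inference engine: `toy_target`, `toy_draft`, `speculative_step` and `generate` are hypothetical names, and the two "models" are stand-in functions that return a next token for a token sequence. Real engines verify all drafted tokens in one batched forward pass of the large model and, when sampling, use a probabilistic acceptance rule instead of the exact-match check shown here.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Stand-in for the large model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target except after a 5."""
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int = 4) -> List[int]:
    """Run one draft-and-verify round and return the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2. The target model checks every proposed position (a plain loop here;
    #    a real engine would do this in a single batched forward pass).
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop this round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every drafted token matched, so the target adds one extra token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Generate max_new tokens after the prompt using speculative rounds."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[: len(prompt) + max_new]
```

Because every accepted token is either confirmed or supplied by the target model, the result in this greedy setting is the same sequence the target would produce on its own. The better the draft model agrees with the target, the more tokens each round accepts.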
The data consists of code tasks collected from various datasets. The model was trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.
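As a rough back-of-the-envelope check of these figures, the snippet below derives the average example length and the total number of tokens seen during training. It assumes "2.5k" means exactly 2,500 examples and that both epochs cover the same data, so the results are approximate.

```python
# Figures taken from the description above; 2.5k is treated as exactly 2,500.
unique_examples = 2_500
tokens_per_epoch = 7_600_000
epochs = 2

# Average length of one training example, in tokens.
avg_tokens_per_example = tokens_per_epoch / unique_examples

# Total tokens processed across all epochs.
total_tokens_seen = tokens_per_epoch * epochs

print(avg_tokens_per_example)  # 3040.0
print(total_tokens_seen)       # 15200000
```

So each example averages roughly 3k tokens, and training processes about 15.2 million tokens in total.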
Since data generation was done using spare GPU time, I may publish a further trained version later.