# DeepSeek-R1-Distill-Llama-8B-q4f16_ft-MLC

|                     |                                              Model Configuration                                              |
|---------------------|:-------------------------------------------------------------------------------------------------------------:|
| Source Model        | [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| Inference API       |                                                   `MLC_LLM`                                                   |
| Quantization        |                                                  `q4f16_ft`                                                   |
| Model Type          |                                                    `llama`                                                    |
| Vocab Size          |                                                   `128256`                                                    |
| Context Window Size |                                                   `131072`                                                    |
| Prefill Chunk Size  |                                                    `8192`                                                     |
| Temperature         |                                                     `0.6`                                                     |
| Repetition Penalty  |                                                     `1.0`                                                     |
| top_p               |                                                    `0.95`                                                     |
| pad_token_id        |                                                      `0`                                                      |
| bos_token_id        |                                                   `128000`                                                    |
| eos_token_id        |                                                   `128001`                                                    |
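
The table's generation settings (temperature `0.6`, top_p `0.95`) control how the runtime samples tokens: logits are scaled by the temperature, then nucleus (top-p) filtering keeps only the smallest set of tokens whose cumulative probability reaches `top_p`. As a minimal illustrative sketch (not MLC_LLM's actual implementation), the procedure looks like this:

```python
import math
import random

# Generation parameters from the table above.
TEMPERATURE = 0.6
TOP_P = 0.95

def sample_token(logits, temperature=TEMPERATURE, top_p=TOP_P, rng=random):
    """Temperature-scaled nucleus (top-p) sampling over raw logits."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the most probable tokens until their
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept set and draw a sample.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

A lower temperature sharpens the distribution before the top-p cutoff is applied, so with these settings one strongly favored token will dominate sampling.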

See [`jetson-ai-lab.com/models.html`](https://jetson-ai-lab.com/models.html) for benchmarks, examples, and containers for deploying these quantized models with local serving and inference.