---
title: README
emoji: 🚀
colorFrom: pink
colorTo: red
sdk: static
pinned: true
---
FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Get started quickly with common inference tasks on RNGD using these pre-compiled, popular Hugging Face models – no manual conversion or quantization needed. They require Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with a custom configuration? Compile it yourself using the [Model Preparation Workflow](https://developer.furiosa.ai/latest/en/furiosa_llm/model-preparation.html) in Furiosa Docs.
See [Supported Models](https://developer.furiosa.ai/latest/en/overview/supported_models.html) in the SDK documentation
for more details, and learn more about RNGD at https://furiosa.ai/rngd.


## Pre-compiled models

Please check out the collection of models at https://huggingface.co/furiosa-ai/collections.

| Pre-compiled Model                                                                                            | Description                          | Base Model                                                                                                    | SDK Version     |
| ------------------------------------------------------------------------------------------------------------- | ------------------------------------ |-------------------------------------------------------------------------------------------------------------- | ----------------|
| [furiosa-ai/bert-large-uncased-INT8-MLPerf](https://huggingface.co/furiosa-ai/bert-large-uncased-INT8-MLPerf) | INT8 quantized, optimized for MLPerf | [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased)                       | 2025.2          |
| [furiosa-ai/gpt-j-6b-FP8-MLPerf](https://huggingface.co/furiosa-ai/gpt-j-6b-FP8-MLPerf)                       | FP8 quantized, optimized for MLPerf  | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b)                                             | 2025.2          |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-8B)     | BF16                                 | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)   | >= 2025.3       |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-70B)   | BF16                                 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | >= 2025.3       |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-7B)       | BF16                                 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)     | >= 2025.3       |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-14B)     | BF16                                 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)   | >= 2025.3       |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-32B)     | BF16                                 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)   | >= 2025.3       |
| [furiosa-ai/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-7.8B-Instruct)             | BF16                                 | [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct)           | >= 2025.2       |
| [furiosa-ai/EXAONE-3.5-32B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-32B-Instruct)               | BF16                                 | [LGAI-EXAONE/EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct)             | >= 2025.2       |
| [furiosa-ai/Llama-3.1-8B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct)                   | BF16                                 | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)                   | >= 2025.2       |
| [furiosa-ai/Llama-3.1-8B-Instruct-FP8](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct-FP8)           | FP8 quantized                        | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)                   | >= 2025.2       |
| [furiosa-ai/Llama-3.3-70B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct)                 | BF16                                 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)                 | >= 2025.3       |
| [furiosa-ai/Llama-3.3-70B-Instruct-INT8](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct-INT8)       | INT8 weight quantization             | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)                 | >= 2025.3       |
| [furiosa-ai/Qwen2.5-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-7B-Instruct)                       | BF16                                 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)                                   | >= 2025.3       |
| [furiosa-ai/Qwen2.5-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-14B-Instruct)                     | BF16                                 | [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)                                 | >= 2025.3       |
| [furiosa-ai/Qwen2.5-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-32B-Instruct)                     | BF16                                 | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)                                 | >= 2025.3       |
| [furiosa-ai/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-7B-Instruct)           | BF16                                 | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)                       | >= 2025.3       |
| [furiosa-ai/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-14B-Instruct)         | BF16                                 | [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct)                     | >= 2025.3       |
| [furiosa-ai/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-32B-Instruct)         | BF16                                 | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)                     | >= 2025.3       |


## Examples

First, install the prerequisites by following [Installing Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).

Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:

```sh
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```
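
If you prefer to fetch a model ahead of time (for example, on a bandwidth-constrained or air-gapped server), you can download the artifact first and serve it from disk. Here is a minimal sketch using `huggingface-cli`; serving from a local directory path is an assumption here, so adjust it to your setup:

```sh
# Pre-download the compiled artifact to a local directory
huggingface-cli download furiosa-ai/Llama-3.1-8B-Instruct-FP8 \
    --local-dir ./Llama-3.1-8B-Instruct-FP8

# Serve from the local path instead of the Hub ID
# (assumes furiosa-llm serve accepts a local artifact directory)
furiosa-llm serve ./Llama-3.1-8B-Instruct-FP8
```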

For reasoning models such as DeepSeek-R1-Distill-Llama-8B, you can enable reasoning mode with the appropriate reasoning parser:

```sh
furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
  --enable-reasoning --reasoning-parser deepseek_r1
```

Once your server has launched, you can query the model with input prompts:
```sh
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
    }' \
    | python -m json.tool
```
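
Because the server exposes an OpenAI-compatible chat completions endpoint, you can also request streamed output. This is a minimal sketch assuming the standard OpenAI `stream` parameter is supported:

```sh
# Stream the response as server-sent events instead of a single JSON body
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": true
    }'
```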

To learn more, see [Quick Start with Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).