---
title: README
emoji: 🚀
colorFrom: pink
colorTo: red
sdk: static
pinned: true
---
FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling,
excels at high-performance inference for LLMs and agentic AI.

Get started quickly with common inference tasks on RNGD
using these pre-compiled, popular Hugging Face models – no manual conversion or quantization needed. They require Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with a custom configuration? Compile it yourself using our [Model Preparation Workflow](https://developer.furiosa.ai/latest/en/furiosa_llm/model-preparation.html) on Furiosa Docs.

Visit [Supported Models](https://developer.furiosa.ai/latest/en/overview/supported_models.html) in the SDK documentation
for more information, and learn more about RNGD at https://furiosa.ai/rngd.
## Pre-compiled models
Please check out the collection of models at https://huggingface.co/furiosa-ai/collections.
| Pre-compiled Model | Description | Base Model | Supported Version |
| ------------------------------------------------------------------------------------------------------------- | ------------------------------------ |-------------------------------------------------------------------------------------------------------------- | ----------------|
| [furiosa-ai/bert-large-uncased-INT8-MLPerf](https://huggingface.co/furiosa-ai/bert-large-uncased-INT8-MLPerf) | INT8 quantized, optimized for MLPerf | [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) | 2025.2 |
| [furiosa-ai/gpt-j-6b-FP8-MLPerf](https://huggingface.co/furiosa-ai/gpt-j-6b-FP8-MLPerf) | FP8 quantized, optimized for MLPerf | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | 2025.2 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-8B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-70B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-7B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-14B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-32B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | >= 2025.3 |
| [furiosa-ai/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-7.8B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) | >= 2025.2 |
| [furiosa-ai/EXAONE-3.5-32B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-32B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct) | BF16 | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct-FP8](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct-FP8) | FP8 quantized | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.3-70B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct) | BF16 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Llama-3.3-70B-Instruct-INT8](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct-INT8) | INT8 weight quantization | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-7B-Instruct) | BF16 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-14B-Instruct) | BF16 | [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-32B-Instruct) | BF16 | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-7B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-14B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-32B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | >= 2025.3 |
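
Each pre-compiled model is published as a regular Hugging Face model repository, so you can also fetch the artifacts ahead of time (for example, onto a server without direct internet access) with the standard Hugging Face CLI. A minimal sketch; the `--local-dir` path is just an example:

```sh
# Install the Hugging Face CLI (ships with huggingface_hub)
pip install -U "huggingface_hub[cli]"

# Download a pre-compiled artifact into a local directory of your choice
huggingface-cli download furiosa-ai/Llama-3.1-8B-Instruct-FP8 \
  --local-dir ./Llama-3.1-8B-Instruct-FP8
```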
## Examples
First, install the prerequisites by following [Installing Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).
Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:
```sh
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```
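
Once the server reports it is ready, you can sanity-check it by listing the models it serves. This assumes the server exposes the usual OpenAI-compatible `/v1/models` endpoint, consistent with the `/v1/chat/completions` endpoint used in the example further below:

```sh
# List the models the server is currently serving (OpenAI-compatible endpoint)
curl http://localhost:8000/v1/models | python -m json.tool
```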
For reasoning models like DeepSeek-R1-Distill-Llama-8B, you can enable the reasoning mode with a proper reasoning parser:
```sh
furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
--enable-reasoning --reasoning-parser deepseek_r1
```
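
With the parser enabled, the model's chain-of-thought is expected to be returned separately from the final answer (for example in a `reasoning_content`-style field, following the common `deepseek_r1` parser convention; the exact field name is an assumption here). A quick way to see this is to ask a question that benefits from step-by-step reasoning and inspect the parsed response:

```sh
# Query the reasoning model and pretty-print the response; when the parser
# is enabled, the reasoning should appear separately from the final answer
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is 17 * 24?"}]
  }' \
  | python -m json.tool
```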
Once your server has launched, you can query the model with input prompts:
```sh
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "EMPTY",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}' \
| python -m json.tool
```
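
The endpoint also accepts the standard OpenAI `stream` flag, assuming the server implements streaming as most OpenAI-compatible servers do, which returns tokens incrementally as server-sent events:

```sh
# Stream tokens as server-sent events instead of waiting for the full reply
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "Write a haiku about accelerators."}],
    "stream": true
  }'
```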
To learn more, see [Quick Start with Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).