---
title: README
emoji: 🚀
colorFrom: pink
colorTo: red
sdk: static
pinned: true
---
FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Get started quickly with common inference tasks on RNGD using these pre-compiled, popular Hugging Face models – no manual conversion or quantization needed. They require Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with a custom configuration? Compile it yourself using our [Model Preparation Workflow](https://developer.furiosa.ai/latest/en/furiosa_llm/model-preparation.html) on Furiosa Docs.

Visit [Supported Models](https://developer.furiosa.ai/latest/en/overview/supported_models.html) in the SDK documentation for more information, and learn more about RNGD at https://furiosa.ai/rngd.
## Pre-compiled models

Please check out the collection of models at https://huggingface.co/furiosa-ai/collections.
| Pre-compiled Model | Description | Base Model | SDK Version |
| ------------------ | ----------- | ---------- | ----------- |
| [furiosa-ai/bert-large-uncased-INT8-MLPerf](https://huggingface.co/furiosa-ai/bert-large-uncased-INT8-MLPerf) | INT8 quantized, optimized for MLPerf | [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) | 2025.2 |
| [furiosa-ai/gpt-j-6b-FP8-MLPerf](https://huggingface.co/furiosa-ai/gpt-j-6b-FP8-MLPerf) | FP8 quantized, optimized for MLPerf | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | 2025.2 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-8B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-70B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-7B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-32B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | >= 2025.3 |
| [furiosa-ai/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-7.8B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) | >= 2025.2 |
| [furiosa-ai/EXAONE-3.5-32B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-32B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct) | BF16 | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct-FP8](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct-FP8) | FP8 quantized | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.3-70B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct) | BF16 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Llama-3.3-70B-Instruct-INT8](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct-INT8) | INT8 weight quantization | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-7B-Instruct) | BF16 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-32B-Instruct) | BF16 | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-7B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-32B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | >= 2025.3 |

<!-- | [furiosa-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-14B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | >= 2025.3 | -->
<!-- | [furiosa-ai/Qwen2.5-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-14B-Instruct) | BF16 | [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | >= 2025.3 | -->
<!-- | [furiosa-ai/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-14B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) | >= 2025.3 | -->
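These artifacts are regular Hugging Face model repositories, so you can pre-download one ahead of time if you prefer. A minimal sketch using the standard `huggingface_hub` CLI, assuming you have accepted any base-model license and logged in if the repository is gated:

```sh
# Optional: pre-fetch a pre-compiled artifact into the local
# Hugging Face cache instead of downloading it on first use.
pip install -U "huggingface_hub[cli]"
huggingface-cli download furiosa-ai/Llama-3.1-8B-Instruct-FP8
```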
## Examples

First, install the prerequisites by following [Installing Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).

Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:
```sh
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```
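Before sending requests, you can confirm the server is up. A minimal sketch, assuming the server exposes the standard OpenAI-compatible model listing endpoint on the default port 8000 used in the examples below:

```sh
# List the models the server is currently serving
# (standard OpenAI-compatible endpoint).
curl http://localhost:8000/v1/models | python -m json.tool
```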
For reasoning models like DeepSeek-R1-Distill-Llama-8B, you can enable reasoning mode with the appropriate reasoning parser:
```sh
furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
  --enable-reasoning --reasoning-parser deepseek_r1
```
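As a sketch of what to expect from a reasoning model (the exact field name is an assumption; vLLM-style OpenAI-compatible servers commonly return the parsed chain of thought as `reasoning_content` alongside the final `content`):

```sh
# Ask the reasoning model a question and pretty-print the response;
# with a reasoning parser enabled, the parsed chain of thought is
# typically returned in a separate field (commonly "reasoning_content")
# next to the final answer in "content".
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is 17 * 24?"}]
  }' \
  | python -m json.tool
```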
More generally, once your server has launched, you can query the model with input prompts:
```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }' \
  | python -m json.tool
```
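Since the server speaks the OpenAI chat completions API, standard options should apply. For example, a hedged sketch of token streaming, assuming the server implements the usual `stream` flag:

```sh
# Stream tokens as server-sent events; -N disables curl's output
# buffering so chunks appear as they arrive.
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "Write a haiku about inference."}],
    "stream": true
  }'
```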
You can also learn more about usage in [Quick Start with Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).