
Skywork-R1V-38B-AWQ


📖 Technical Report | 💻 GitHub | 🌐 Wisemodel


Evaluation

Comparison with Larger-Scale Open-Source and Closed-Source Models
| Category  | Benchmark        | QwQ-32B-Preview | InternVL-2.5-38B | VILA 1.5-40B | InternVL2-40B | Skywork-R1V-38B | Skywork-R1V-AWQ |
|-----------|------------------|-----------------|------------------|--------------|---------------|-----------------|-----------------|
| Reasoning | MATH-500         | 90.6            | -                | -            | -             | 94.0            | 86.0            |
| Reasoning | AIME 2024        | 50.0            | -                | -            | -             | 72.0            | 61.0            |
| Reasoning | GPQA             | 54.5            | -                | -            | -             | 61.6            | 56.5            |
| Vision    | MathVista(mini)  | -               | 71.9             | 49.5         | 63.7          | 67.5            | 59.9            |
| Vision    | MMMU(Val)        | -               | 63.9             | 55.1         | 55.2          | 69.0            | 60.1            |

QwQ-32B-Preview is a text-only LLM; the remaining columns are VLMs.

Usage

You can use the quantized model with different inference frameworks:

Using vLLM

Python API

from vllm import LLM, SamplingParams

model_name = "Skywork/Skywork-R1V-38B-AWQ"  # or local path
llm = LLM(model_name,
          dtype='float16',
          quantization="awq",
          gpu_memory_utilization=0.85,
          max_model_len=4096,
          trust_remote_code=True,
         )

# Minimal text-only generation example; adjust the prompt and sampling settings to your task.
sampling_params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["What is the sum of the first 100 positive integers?"], sampling_params)
print(outputs[0].outputs[0].text)

OpenAI-compatible API Server

MODEL_ID="Skywork/Skywork-R1V-38B-AWQ"  # or local path


CUDA_VISIBLE_DEVICES=0 \
    python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_ID \
    --dtype float16 \
    --quantization awq \
    --port 23334 \
    --max-model-len 12000 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code
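
Once the server is running, any OpenAI-compatible client can query it. Below is a minimal sketch using the openai Python package, assuming the server above is reachable on localhost at port 23334 and that no API key has been configured (both are assumptions about your local setup):

from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:23334/v1", api_key="EMPTY")  # "EMPTY" is a placeholder key

response = client.chat.completions.create(
    model="Skywork/Skywork-R1V-38B-AWQ",  # must match the --model value passed to the server
    messages=[{"role": "user", "content": "What is the derivative of x^3?"}],
    temperature=0.6,
    max_tokens=1024,
)
print(response.choices[0].message.content)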

Using LMDeploy

from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig
from lmdeploy.vl import load_image

model_path = "Skywork/Skywork-R1V-38B-AWQ"  # or local path

# cache_max_entry_count sets the fraction of free GPU memory reserved for the KV cache.
engine_config = TurbomindEngineConfig(cache_max_entry_count=0.75)
chat_template_config = ChatTemplateConfig(model_name=model_path)
pipe = pipeline(model_path,
                backend_config=engine_config,
                chat_template_config=chat_template_config,
               )

# Example: multimodal inference on a local image
image = load_image('table.jpg')
response = pipe(('Describe this image.', image))
print(response.text)
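
The pipeline also accepts a batch of (question, image) pairs in a single call. A minimal sketch reusing the pipe created above (the image file names are placeholders):

# Batched multimodal inference; 'chart.jpg' and 'diagram.jpg' are placeholder file names.
images = [load_image(name) for name in ('chart.jpg', 'diagram.jpg')]
prompts = [('Describe this image.', img) for img in images]
responses = pipe(prompts)
for r in responses:
    print(r.text)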

Hardware Requirements

The AWQ quantization reduces the memory footprint compared to the original FP16 model. We recommend:

  • At least one GPU with 30GB+ VRAM for inference
  • For optimal performance with longer contexts, 40GB+ VRAM is recommended
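
As a rough sanity check on these numbers, 4-bit AWQ weights for roughly 38B parameters already account for about 19 GB; the KV cache, vision-encoder activations, and runtime overhead take up the rest. A back-of-envelope sketch (the parameter count and bit width are approximations, not measured values):

# Back-of-envelope weight-memory estimate (approximate, not measured).
params = 38e9            # ~38B parameters (approximation)
bits_per_weight = 4      # AWQ typically stores weights in 4-bit
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # ~19 GB; KV cache and activations add the rest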

Citation

If you use this model in your research, please cite:

@article{skywork2025r1v,
  title     = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author    = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
  year      = {2025},
  journal   = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf},
  url       = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
