| Category | Benchmark | QwQ-32B-Preview (LLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) | Skywork-R1V-AWQ (VLM) |
|---|---|---|---|---|---|---|---|
| Reasoning | MATH-500 | 90.6 | - | - | - | 94.0 | 86.0 |
| Reasoning | AIME 2024 | 50.0 | - | - | - | 72.0 | 61.0 |
| Reasoning | GPQA | 54.5 | - | - | - | 61.6 | 56.5 |
| Vision | MathVista (mini) | - | 71.9 | 49.5 | 63.7 | 67.5 | 59.9 |
| Vision | MMMU (Val) | - | 63.9 | 55.1 | 55.2 | 69.0 | 60.1 |
You can run the quantized model with several inference frameworks, for example vLLM (offline or as an OpenAI-compatible server) and LMDeploy:
Offline inference with vLLM:

```python
from vllm import LLM, SamplingParams

model_name = "Skywork/Skywork-R1V-38B-AWQ"  # or local path

# Load the AWQ-quantized checkpoint in half precision.
llm = LLM(
    model_name,
    dtype="float16",
    quantization="awq",
    gpu_memory_utilization=0.85,
    max_model_len=4096,
    trust_remote_code=True,
)
# Add your inference code here
```
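For example, a minimal text-only generation sketch that continues from the `llm` object above; the prompt and sampling settings are illustrative assumptions, not an official recipe:

```python
from vllm import SamplingParams

# Illustrative sampling settings; tune for your workload.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
prompts = ["Solve step by step: what is the sum of the first 50 positive integers?"]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```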
Serving with vLLM's OpenAI-compatible API server:

```bash
MODEL_ID="Skywork/Skywork-R1V-38B-AWQ"  # or local path
CUDA_VISIBLE_DEVICES=0 \
python -m vllm.entrypoints.openai.api_server \
  --model $MODEL_ID \
  --dtype float16 \
  --quantization awq \
  --port 23334 \
  --max-model-len 12000 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code
```
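Once the server is up, you can query it through the OpenAI-compatible API. A minimal sketch using the `openai` Python client, assuming the port from the command above and an illustrative prompt:

```python
from openai import OpenAI

# The server started above exposes an OpenAI-compatible API on port 23334.
client = OpenAI(base_url="http://localhost:23334/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Skywork/Skywork-R1V-38B-AWQ",
    messages=[{"role": "user", "content": "What is 15% of 240? Show your reasoning."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```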
Multimodal inference with LMDeploy:

```python
from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig
from lmdeploy.vl import load_image

model_path = "Skywork/Skywork-R1V-38B-AWQ"  # or local path
engine_config = TurbomindEngineConfig(cache_max_entry_count=0.75)
chat_template_config = ChatTemplateConfig(model_name=model_path)
pipe = pipeline(
    model_path,
    backend_config=engine_config,
    chat_template_config=chat_template_config,
)

# Example: multimodal inference on a single image
image = load_image("table.jpg")
response = pipe(("Describe this image.", image))
print(response.text)
```
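The LMDeploy pipeline also accepts a batch of (prompt, image) pairs and an optional generation config. A minimal sketch continuing from the `pipe` object above; the file names and generation settings are illustrative assumptions:

```python
from lmdeploy import GenerationConfig
from lmdeploy.vl import load_image

# Illustrative generation settings; adjust as needed.
gen_config = GenerationConfig(max_new_tokens=1024, temperature=0.6)

prompts = [
    ("Summarize the key numbers in this table.", load_image("table.jpg")),
    ("What objects are visible in this image?", load_image("photo.jpg")),
]
responses = pipe(prompts, gen_config=gen_config)  # `pipe` from the block above
for r in responses:
    print(r.text)
```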
The AWQ quantization reduces the memory footprint compared to the original FP16 model, at the cost of a modest drop in benchmark scores (see the table above). We recommend starting from the vLLM or LMDeploy settings shown above.
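As a rough estimate of the weight-memory saving (assuming 38B parameters and the typical 4-bit AWQ configuration; real usage also includes the KV cache, activations, and quantization scales):

```python
# Back-of-the-envelope weight-memory estimate (assumptions: 38e9 parameters,
# FP16 = 2 bytes/param, 4-bit AWQ ≈ 0.5 bytes/param; excludes KV cache and overhead).
params = 38e9
fp16_gib = params * 2 / 1024**3
awq4_gib = params * 0.5 / 1024**3
print(f"FP16 weights: ~{fp16_gib:.0f} GiB, AWQ 4-bit weights: ~{awq4_gib:.0f} GiB")
```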
If you use this model in your research, please cite:

```bibtex
@article{skywork2025r1v,
  title   = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author  = {Yi Peng, Chris, Xiaokun Wang, Yichen Wei, Jiangbo Pei, Weijie Qiu, Ai Jian, Yunzhuo Hao, Jiachun Pan, Tianyidan Xie, Li Ge, Rongxian Zhuang, Xuchen Song, Yang Liu, Yahui Zhou},
  year    = {2025},
  journal = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf},
  url     = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
```