q4f16 ONNX model issue

#5
by mrniamster
ONNX Community org

When running the code below, it gives the following error:

import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX', { dtype: 'q4f16' });

An error occurred during model execution: "Error: Non-zero status code returned while running Cast node. Name:'InsertedPrecisionFreeCast_/model/layers.1/attn/v_proj/repeat_kv/Reshape_4/output_0' Status Message: D:\a\_work\1\s\onnxruntime\core\framework\op_kernel.cc:83 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,1536} != {1,12,1536}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.".
ONNX Community org

Indeed, that's a bug with the Node.js implementation of onnxruntime (it works correctly in the browser).

You can use an earlier revision (pre https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/commit/f9c94fd59ec97bdb5e7587d09343797481a8c385) to use the GQA variant of the model.
cc @schmuell
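
For reference, a minimal sketch of pinning an older revision via the standard Transformers.js `revision` option. The revision string below is a placeholder, not a real commit: substitute whichever commit hash or branch predates the linked change.

import { pipeline } from '@huggingface/transformers';

// '<pre-GQA-removal-revision>' is a placeholder; replace it with the
// commit hash or branch that predates the linked commit.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', revision: '<pre-GQA-removal-revision>' },
);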

ONNX Community org

Odd, it works for me with onnxruntime-genai and the native WebGPU EP.
What EP are you using? If it is CPU, it is not going to like the fp16 and will cast to fp32 by inserting Cast ops into the graph. I wonder if something is going wrong there.
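
If the CPU EP's inserted Cast ops are the culprit, one thing to try is selecting a different backend explicitly. A hedged sketch using the Transformers.js `device` option (WebGPU availability depends on your runtime; browser support is the well-tested path):

import { pipeline } from '@huggingface/transformers';

// Request the WebGPU backend explicitly, sidestepping the CPU EP's
// fp16 -> fp32 Cast insertion. Falls back with an error if the
// environment has no WebGPU support.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);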
