q4f16 ONNX model issue

#5
by mrniamster
ONNX Community org

When running the code below, it gives the following error:

import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX', { dtype: 'q4f16' });

An error occurred during model execution: "Error: Non-zero status code returned while running Cast node. Name:'InsertedPrecisionFreeCast_/model/layers.1/attn/v_proj/repeat_kv/Reshape_4/output_0' Status Message: D:\a\_work\1\s\onnxruntime\core\framework\op_kernel.cc:83 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,1536} != {1,12,1536}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.".
ONNX Community org

Indeed, that's a bug with the Node.js implementation of onnxruntime (it works correctly in the browser).

You can use an earlier revision (pre https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/commit/f9c94fd59ec97bdb5e7587d09343797481a8c385) to use the GQA variant of the model.
cc @schmuell
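
For reference, a minimal sketch of pinning an older revision via the standard Transformers.js `revision` option. The revision string below is a placeholder, not a real commit: substitute whichever commit hash or branch predates the linked change.

import { pipeline } from '@huggingface/transformers';

// '<pre-GQA-removal-revision>' is a placeholder; replace it with the
// commit hash or branch that predates the linked commit.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', revision: '<pre-GQA-removal-revision>' },
);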

ONNX Community org

Odd, it works for me with onnxruntime-genai and the native WebGPU EP.
What EP are you using? If it is CPU, it is not going to like the fp16 and will cast to fp32 by inserting Cast ops into the graph. I wonder if something is going wrong there.
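
If the CPU EP's inserted Cast ops are the culprit, one thing to try is selecting a different backend explicitly. A hedged sketch using the Transformers.js `device` option (WebGPU availability depends on your runtime; browser support is the well-tested path):

import { pipeline } from '@huggingface/transformers';

// Request the WebGPU backend explicitly, sidestepping the CPU EP's
// fp16 -> fp32 Cast insertion. Falls back with an error if the
// environment has no WebGPU support.
const generator = await pipeline(
  'text-generation',
  'onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);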
