Export Aquilachat-7b to ONNX via Optimum Failed

#2 opened by sammysun0711

Hi @qhduan, first of all, thanks for your great work; it is really helpful to the community for Chinese LLMs.
As I can see in modeling_aquila.py, this model reuses most of LLaMA's structure, and Optimum supports LLaMA ONNX export.
So I saved the Aquilachat-7b model locally and tried to export an ONNX model as follows:

```shell
optimum-cli export onnx --model aquilachat-7b \
    --task text-generation --trust-remote-code \
    --framework pt --opset 17 onnx
```

Here I hit the following issue during shape inference:

```
~/anaconda3/envs/aigc/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:309 in _create_node
    _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
RuntimeError: ScalarType ComplexFloat is an unexpected tensor scalar type
```

I checked that node %443 is indeed a tensor of ComplexFloat type in self_attn.
Since complex types are a known limitation of the ONNX exporter (https://github.com/pytorch/pytorch/issues/59246), could you please share any workaround for exporting the ONNX model? (One possible direction is sketched after the logs below.)

```
node:  %442 : Tensor = onnx::Transpose[perm=[0, 2, 1, 3]](%441), scope: transformers_modules.aquilachat-7b.modeling_aquila.LlamaForCausalLM::/transformers_modules.aquilachat-7b.modeling_aquila.LlamaModel::model/transformers_modules.aquilachat-7b.modeling_aquila.LlamaDecoderLayer::layers.0/transformers_modules.aquilachat-7b.modeling_aquila.LlamaAttention::self_attn

value_t:  tensor([[ 1.0000+0.0000e+00j,  1.0000+0.0000e+00j,  1.0000+0.0000e+00j,
          ...,  1.0000+0.0000e+00j,  1.0000+0.0000e+00j,
          1.0000+0.0000e+00j],
        [ 0.5403+8.4147e-01j,  0.6479+7.6172e-01j,  0.7318+6.8156e-01j,
          ...,  1.0000+1.5399e-04j,  1.0000+1.3335e-04j,
          1.0000+1.1548e-04j],
        [-0.4161+9.0930e-01j, -0.1604+9.8705e-01j,  0.0709+9.9748e-01j,
          ...,  1.0000+3.0799e-04j,  1.0000+2.6670e-04j,
          1.0000+2.3096e-04j],
        ...,
        [-0.8799+4.7523e-01j,  0.7803+6.2535e-01j, -0.9998+1.9127e-02j,
          ...,  0.8079+5.8938e-01j,  0.8547+5.1911e-01j,
          0.8904+4.5525e-01j],
        [-0.8753-4.8361e-01j,  0.0292+9.9957e-01j, -0.7446-6.6752e-01j,
          ...,  0.8078+5.8951e-01j,  0.8546+5.1922e-01j,
          0.8903+4.5535e-01j],
        [-0.0660-9.9782e-01j, -0.7424+6.6991e-01j, -0.0900-9.9594e-01j,
          ...,  0.8077+5.8963e-01j,  0.8546+5.1934e-01j,
          0.8903+4.5545e-01j]])
node:  %443 : Tensor = onnx::Constant[value=<Tensor>]()
```
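
The complex constant clearly comes from the Meta-style `freqs_cis` tensor (built with `torch.polar`). One workaround I am considering is rewriting the rotary embedding with real cos/sin tensors, which the exporter handles; the two forms are mathematically equivalent since multiplying by `e^{i*t}` is just a 2D rotation of each adjacent pair. A minimal sketch, untested against the Aquila code (helper names `precompute_cos_sin` / `apply_rotary_emb_real` are my own, not from modeling_aquila.py):

```python
import torch

def precompute_cos_sin(dim: int, end: int, theta: float = 10000.0):
    # Same angles as Meta's precompute_freqs_cis, but returned as two real
    # tensors instead of one complex64 tensor, so the ONNX tracer never
    # sees a ComplexFloat constant.
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(end).float(), freqs)  # (end, dim // 2)
    return angles.cos(), angles.sin()

def apply_rotary_emb_real(x, cos, sin):
    # x: (batch, seq_len, n_heads, head_dim). Meta's version rotates each
    # adjacent pair (x[..., 2i], x[..., 2i+1]) by angle_i; the complex
    # product (a + b*i) * e^{i*t} expands to the real formulas below.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    out = torch.stack(
        (x_even * cos - x_odd * sin,   # real part
         x_even * sin + x_odd * cos),  # imaginary part
        dim=-1,
    )
    return out.flatten(-2).type_as(x)  # re-interleave pairs into head_dim
```

If this matches Meta's `apply_rotary_emb` on a few random tensors, patching it into `LlamaAttention` in modeling_aquila.py before running optimum-cli should avoid the complex constant entirely.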

Aquila seems to use Meta's official RoPE implementation (with a slight difference in float16). Hugging Face transformers re-implements it for Llama, but that re-implementation differs somewhat from Meta's.

That's why I replaced transformers' Llama RoPE code with Meta's. I really don't have time to check what the difference is and fix it, maybe later; it would be great if you could help.
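
For comparison, here is the shape of transformers' real-valued re-implementation (simplified from modeling_llama.py; the weight-permutation explanation in the comments is my guess at the source of the difference, not verified against Aquila's checkpoints):

```python
import torch

# transformers' Llama pairs dimension i with i + head_dim/2 instead of
# adjacent dimensions (2i, 2i+1) as Meta does; this only matches Meta's
# output if the q/k projection weights were permuted accordingly (the HF
# Llama weight-conversion script does this), which could be the source of
# the discrepancy with Meta-format weights.
def rotate_half(x):
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```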
