The hidden size of 5120 divided by the number of attention heads (32) is 160, but the head size given in config.json is 128. This causes the model load to fail. Any fix, or am I alone in hitting this?
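A minimal sketch of the mismatch, assuming the standard Hugging Face config.json keys (`hidden_size`, `num_attention_heads`, `head_dim`) and the values quoted above:

```python
# Values from this model's config.json, as described in the question.
hidden_size = 5120
num_attention_heads = 32
head_dim_in_config = 128  # explicit head_dim field in config.json

# Many loaders derive the head dimension instead of reading head_dim:
derived_head_dim = hidden_size // num_attention_heads
print(derived_head_dim)  # 160

# The derived value disagrees with the config's explicit head_dim,
# so a loader that assumes hidden_size == num_heads * head_dim fails.
print(derived_head_dim == head_dim_in_config)  # False
```

Loaders that read the explicit `head_dim` field handle such models fine; the failure comes from versions that hard-code the `hidden_size / num_attention_heads` assumption.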
You can try pinning vllm==0.5.3 with torch==2.3.1.
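If that combination works for you, the pin suggested above would look like this (an environment-setup fragment; verify these versions are compatible with your CUDA install):

```shell
# Pin the suggested versions of vllm and torch in the current environment.
pip install vllm==0.5.3 torch==2.3.1
```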