Kernel assertion errors on 5090 using generation with MXfp4 (gpt-oss) - (stable on 4090)

#1
by cmp-nct - opened

File "/root/.cache/huggingface/hub/models--kernels-community--triton_kernels/snapshots/1d2e9557ac0d4c651055a209055748d4db0fe65b/build/torch-universal/triton_kernels/matmul_ogs_details/opt_flags.py", line 214, in make_default_opt_flags_nvidia
assert num_stages >= 1

I had to manually comment that assertion to get it running.
Otherwise I get AssertionError crashes for roughly 30% of batch sizes and prompt lengths with gpt-oss-20b on my 5090.

On a 4090 I have no such problems.

kernels-community org

Thanks for this! Would you like to open a PR for that? Otherwise, I will sync the latest version in a few days, once things are a bit more stable on the triton kernels side.

Hi, I think commenting it out is not the right final solution. I'm not experienced enough in the Triton language to fix the bug myself.

The problem is probably in the calculation that leads to num_stages being assigned 0; someone with Triton experience needs to fix it.
The dev responsible for the function should take a look, maybe it's something obvious.
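
To illustrate the kind of failure I suspect, here is a minimal sketch. It is not the actual triton_kernels code: the function names, the shared-memory budget, and the per-stage size are all hypothetical, and I am only assuming that num_stages comes out of an integer division that can round down to 0.

```python
# Hypothetical illustration only; not taken from opt_flags.py.

def raw_num_stages(smem_budget_bytes: int, bytes_per_stage: int) -> int:
    """Assumed calculation: how many pipeline stages fit in shared memory."""
    if bytes_per_stage <= 0:
        raise ValueError("bytes_per_stage must be positive")
    # Integer division yields 0 when a single stage exceeds the budget.
    return smem_budget_bytes // bytes_per_stage


def clamped_num_stages(smem_budget_bytes: int, bytes_per_stage: int,
                       max_stages: int = 4) -> int:
    """Same calculation, clamped to the range [1, max_stages]."""
    raw = raw_num_stages(smem_budget_bytes, bytes_per_stage)
    return max(1, min(raw, max_stages))


if __name__ == "__main__":
    # One stage larger than the budget: the raw value would trip the assert.
    print(raw_num_stages(100_000, 150_000))
    print(clamped_num_stages(100_000, 150_000))
```

Clamping like this only hides the symptom if a single stage really does not fit, so a proper fix would probably also need to shrink the tile sizes. That part needs someone who knows the kernel.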

kernels-community org
edited 2 days ago

Hmmm, it seems like I can't update triton_kernels, as newer commits depend on triton, and asking users to install triton main is not really a nice option. Let's comment this out for now (modification done!).
Once I get my hands on a 5090, I'll be able to improve the support.

Here is a PR on transformers side:
https://github.com/huggingface/transformers/pull/40358

It appears to solve this, but I'm not 100% sure, because in my case everything was working for a while and then I suddenly ran into the assertion.
The problem addressed there is a tensor shape compatibility issue, which the PR unifies.
@marcsun13 maybe you want to pull that PR and run a custom transformers build as a test? It would be great to have one more affected person testing it.
"pip install -e ." took only a minute or so for me
