xLSTM inference sample, because no Ollama, LM Studio, etc. support!?

#10
by gue22

Rewrite 20250222: I am not aware of any xLSTM sample inference server. (Just about everybody else on Hugging Face has one.)
hello-torch-gpu2-ui1.py in my examples at AI-bits/xlstm streams answers until CUDA crashes with an out-of-memory error; a minimal sketch of that kind of streaming setup is below.
I suspect not many people have the beefy machine this needs, or want to set up their own server just to try it.
So I guess NX-AI's (almost) only marketing is Sepp.
How about running a sample inference server at JKU?
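Something like this is what I mean; it's only a sketch, assuming the model in question is NX-AI/xLSTM-7b and that it loads through transformers' AutoModelForCausalLM (the model ID, dtype, and device_map choices here are my assumptions, not verified fixes for the OOM):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "NX-AI/xLSTM-7b"  # assumption: the checkpoint this discussion is about

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 halves the weight footprint versus float32; device_map="auto"
# lets accelerate spill layers to CPU RAM when the GPU is too small.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the xLSTM architecture in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True)
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```

Even in bf16 a 7B model needs roughly 14 GB for the weights alone, which is presumably why the script OOMs on smaller cards; offloading via device_map just trades that for speed.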

Cheers
G.
PS: 20250221 late: I tried to whip together a Gradio UI, but ran into endless problems with from_pretrained, etc.; roughly what I was after is sketched below.
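This is the shape of it, as a sketch only (same NX-AI/xLSTM-7b assumption as above; a generator function feeding Gradio via TextIteratorStreamer, not a fix for whatever from_pretrained actually throws):

```python
from threading import Thread

import gradio as gr
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          TextIteratorStreamer)

model_id = "NX-AI/xLSTM-7b"  # assumption, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def answer(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
    # generate() blocks, so run it in a background thread and
    # yield the partial text as it streams in.
    Thread(
        target=model.generate,
        kwargs=dict(**inputs, max_new_tokens=256, streamer=streamer),
    ).start()
    text = ""
    for chunk in streamer:
        text += chunk
        yield text

# Gradio treats a generator fn as a streaming output.
gr.Interface(fn=answer, inputs="text", outputs="text").launch()
```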

