Interesting...
I think it's just that the ZeroGPU interface has slowed down over the past month. In the code I set it to target 1000 tokens per response; last month it could do that within 55 s, so I set the GPU time to 60. After testing, I find I now have to set it to at least 100 s to be safe... I guess Hugging Face is throttling ZeroGPU speed, or maybe there are just too many people using ZeroGPU Spaces now?
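For reference, on ZeroGPU the allowed GPU time is set per call through the `duration` argument of the `spaces.GPU` decorator. A minimal sketch of the kind of setup I mean (the model id and function are placeholders, not the actual Space code):

```python
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute the Space's actual model
tokenizer = AutoTokenizer.from_pretrained("my-org/test-model")
model = AutoModelForCausalLM.from_pretrained(
    "my-org/test-model", torch_dtype=torch.float16
).to("cuda")

# ZeroGPU cuts the call off after `duration` seconds, so it has to
# cover the worst case: 60 used to be enough for 1000 tokens (~55 s),
# now it needs to be at least 100 to be safe.
@spaces.GPU(duration=100)
def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=1000)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```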
Bottom line: this is a test model anyway. I am now working on expanding the dataset to cover more recent physics proceedings, and I think (hopefully) by the end of the month I can get the new version out with proper llama.cpp support, so you can run it locally without these ZeroGPU limitations.
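Once the GGUF build is out, running it locally should look roughly like this with llama-cpp-python (the file name is a placeholder until the release):

```python
from llama_cpp import Llama

# Placeholder GGUF path -- the actual file ships with the new release
llm = Llama(model_path="./model-q4_k_m.gguf", n_ctx=4096)

# Same 1000-token target as the Space, but with no GPU-time cap
out = llm("your prompt here", max_tokens=1000)
print(out["choices"][0]["text"])
```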