Hey
@alvarobartt
Thank you for the quick reply, I'll definitely check out the links.
Can you help me with one more thing? I have a use case where I need to classify a large number of data points (roughly 5,000-20,000) into a label set of about 20-80 labels, within at most 30-40 seconds. Currently I use a Hugging Face zero-shot classifier for this, but inference is very slow. So I wanted to move to this endpoint, so that I can send multiple requests to it in parallel and get the job done.
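Roughly what I have in mind for the endpoint version is something like the sketch below: chunk the texts into batches and fan the requests out over a thread pool. The endpoint URL, token, labels, and the batched payload shape are all placeholders here, not something I've confirmed the endpoint accepts.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_xxx"  # placeholder token
LABELS = ["billing", "shipping", "refunds"]  # example label set

def chunk(items, size):
    """Split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def classify_batch(texts):
    """Send one zero-shot request for a batch of texts (assumed payload shape)."""
    payload = {"inputs": texts, "parameters": {"candidate_labels": LABELS}}
    req = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def classify_all(texts, batch_size=32, workers=8):
    """Overlap requests with a thread pool so network latency is hidden."""
    batches = chunk(texts, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_batch, batches))
```

Is this the right direction, or is there a better way to drive the endpoint?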
I also tried some 1B-parameter LLMs with vLLM on GPUs for inference, but they weren't accurate enough.
Can you help me out with a solution? What can be done in this case?