Steps to deploy a production-ready service for QwQ on AWS using serverless GPUs
#61, opened by samagra14
We just published a guide for deploying Qwen QwQ in your own AWS account on serverless GPUs, with autoscaling and scale-to-zero: https://tensorfuse.io/docs/guides/reasoning/qwen_qwq
The guide targets multi-GPU L4 instances, since L4s are the cheapest GPUs on AWS. Feel free to adapt it for L40S, A10G, A100, etc. We will follow up soon with metrics on single-request tokens/sec and throughput.
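For anyone adapting the setup to a different GPU, here is a rough sizing sketch of why a multi-GPU L4 instance is needed. It is not taken from the guide. It assumes a 32.5B-parameter model in bf16 (2 bytes per parameter), nominal per-GPU memory figures, a hypothetical 90% usable-memory fraction, and a power-of-two GPU count as is common for tensor parallelism. It counts weights only and ignores KV cache and activations, so treat the result as a lower bound.

```python
import math

# Assumed figures (not from the guide): parameter count, bf16 weights,
# and nominal memory per GPU in GB.
PARAMS = 32.5e9
BYTES_PER_PARAM = 2
GPU_MEMORY_GB = {"L4": 24, "A10G": 24, "L40S": 48, "A100-40GB": 40, "A100-80GB": 80}


def min_gpus(gpu: str, usable_fraction: float = 0.9) -> int:
    """Smallest power-of-two GPU count whose usable memory holds the weights."""
    weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
    usable_gb = GPU_MEMORY_GB[gpu] * usable_fraction
    needed = math.ceil(weights_gb / usable_gb)
    # Round up to the next power of two (1, 2, 4, 8, ...).
    return 1 << (needed - 1).bit_length()


for name in GPU_MEMORY_GB:
    print(f"{name}: at least {min_gpus(name)} GPU(s) for weights alone")
```

Under these assumptions an L4 or A10G setup needs at least 4 GPUs, while a single 80 GB A100 can hold the weights but leaves little room for KV cache.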