Steps to deploy a production-ready service for QwQ on AWS using serverless GPUs

#61

by samagra14 - opened 20 days ago


We just released a guide for deploying Qwen QwQ on your own AWS account using serverless GPUs with autoscaling and scale-to-zero: https://tensorfuse.io/docs/guides/reasoning/qwen_qwq

The guide targets multi-GPU L4 instances, since L4s are the cheapest GPUs on AWS. Feel free to adapt it for L40S, A10G, A100 or other instance types. We will soon follow up with metrics on single-request tokens/sec and overall throughput.
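Single-request tokens/sec and throughput measure different things, so here is a minimal sketch of how the two could be computed from recorded request timings. The `RequestTiming` record and the function names are hypothetical and not part of the guide or of Tensorfuse. It also assumes latency is measured from sending the request to receiving the last token, so the per-request figure includes time-to-first-token.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical record of one completed generation request."""
    start_s: float       # wall-clock time the request was sent (seconds)
    end_s: float         # wall-clock time the last token arrived (seconds)
    output_tokens: int   # number of generated tokens


def single_request_tokens_per_sec(r: RequestTiming) -> float:
    """Speed seen by one client: generated tokens / end-to-end request latency."""
    return r.output_tokens / (r.end_s - r.start_s)


def aggregate_throughput(requests: list[RequestTiming]) -> float:
    """Server throughput: all generated tokens / wall-clock span covering every request."""
    span = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    return sum(r.output_tokens for r in requests) / span


if __name__ == "__main__":
    # Made-up timings: two concurrent requests served over the same 10 s window.
    reqs = [RequestTiming(0.0, 10.0, 200), RequestTiming(0.0, 10.0, 300)]
    print(single_request_tokens_per_sec(reqs[0]))  # 20.0
    print(aggregate_throughput(reqs))              # 50.0
```

With concurrent requests, throughput can be well above any single request's tokens/sec, which is why both numbers are worth reporting.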
