Project Deployment Issue
#1 opened by Hivra
Although your project is excellent, I would like to construct one that is comparable to yours. Would you kindly help me by providing the steps used to deploy the project?
Hello,
First of all, Merry Christmas! :D
Thanks for your inquiry. This was already a few months ago, so I have forgotten some of the details, but below is my attempt at a reconstruction:
- This setup is fairly specific to the Hugging Face Spaces platform, so you may need to adapt it if you are using another PaaS- or CaaS-style cloud.
- I don't exactly recommend reproducing this exact deployment on Hugging Face Spaces - the last time I tried, I remember almost getting banned and receiving a security warning email, which probably has to do with both platform internals and ToS restrictions. Generally speaking, a single deployment connecting to your own external API is okay, but multiple interconnected deployments making direct calls over HF's internal network is not (the allowed exception, I think, is a Gradio space calling another Gradio space through Gradio's own mechanism). Exposing some kind of API endpoint is a gray area; I'm not sure about that. This space is left on just for reference (and because I'm too lazy to take it down). My guess is that the spirit of the rule is "AI-related research/prototype = okay, general webapp-style service = not okay".
- You can actually check the source code of this space: https://huggingface.co/spaces/hkitsmallpotato/litellm/tree/main
Some further elaboration of the source code:
- The `Dockerfile` reuses the official `litellm` image, but modifies it by injecting our config file, mostly following the official `litellm` deployment guide (https://docs.litellm.ai/docs/proxy/deploy) while also keeping it compatible with HF's own image build pipeline (https://huggingface.co/docs/hub/spaces-sdks-docker-first-demo). I think some of it is HF-specific (e.g. the port number); a hedged sketch follows just below.
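For concreteness, here is a minimal sketch of what such a `Dockerfile` can look like, assuming the image tag and CLI flags from the litellm deploy guide linked above; double-check them against the current docs. The only genuinely HF-specific part is listening on port 7860.

```Dockerfile
# Hedged sketch - based on the official litellm Docker deploy guide, adapted for HF Spaces.
FROM ghcr.io/berriai/litellm:main-latest

# Inject our own proxy config into the image
COPY config.yaml /app/config.yaml

# HF Spaces expects the container to listen on port 7860
EXPOSE 7860

# The image's entrypoint runs the litellm proxy; pass it our config and the HF-mandated port
CMD ["--config", "/app/config.yaml", "--port", "7860"]
```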
- The `litellm` config is based on some trial and error, plus trying my best to read their docs (it was half a year ago, and the docs back then could be quite confusing). Generally speaking, it is divided into a global config and the configuration of your own model connections.
- For the models, I found back then that just manually specifying the litellm-side model name (e.g. `- model_name: cf-sqlcoder-7b-2`) one by one works robustly.
- For the connection params (those under `litellm_params`), while it might be possible to let litellm infer the correct env variable to use for credentials by default (maybe they have that, using the connector type to decide), I personally find that explicitly invoking the env override myself gives better control and assurance that it works. (E.g. `api_key: os.environ/CLOUDFLARE_API_KEY` is `litellm`-specific syntax to inject secrets from env, which is industry standard practice.)
- Many shared LLM inference cloud providers expose an OpenAI-compatible API, so just using `openai/[provider side model ID]` instead of a litellm connector like `openrouter/[model id]` is actually okay (but then you're kind of falling back to the lowest common denominator of what the OpenAI API format supports). Remember to specify `api_base` if you do that, though, otherwise it will connect to the actual OpenAI URL.
- Extra tip: the caveat to "many providers expose an OpenAI-compatible API" is that different providers may differ in their level of support once you go past plain chat completion and want more in-depth features, such as function calling, structured output, and so on. A rough sketch of the config is shown after this list.
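To make the above concrete, here is a rough sketch of the model-connection part of the config. The overall shape follows the litellm proxy config docs as I remember them; the Cloudflare model ID, the second provider entry, and any env variable names not mentioned above are illustrative placeholders, not what this space actually uses.

```yaml
# Hedged sketch of a litellm proxy model_list - placeholder values are marked in comments.
model_list:
  # Explicit litellm-side model name, with the credential injected explicitly from env
  - model_name: cf-sqlcoder-7b-2
    litellm_params:
      model: cloudflare/@cf/defog/sqlcoder-7b-2       # Cloudflare Workers AI connector; verify the current model ID
      api_key: os.environ/CLOUDFLARE_API_KEY          # litellm syntax: read the secret from an env variable
                                                      # (CLOUDFLARE_ACCOUNT_ID is also read from env, see the table below)

  # Falling back to the generic OpenAI-compatible connector (hypothetical provider)
  - model_name: some-hosted-model
    litellm_params:
      model: openai/provider-side-model-id
      api_base: https://api.example-provider.com/v1   # required here, otherwise litellm talks to the real OpenAI URL
      api_key: os.environ/PROVIDER_API_KEY            # hypothetical secret name
```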
You also need a bunch of variables and env secrets:
| Name | Purpose |
|---|---|
| UI_USERNAME | For logging in to the litellm UI |
| LANGFUSE_HOST | Connect to the external langfuse instance |
| CLOUDFLARE_ACCOUNT_ID | Cloudflare needs both an account ID and an API key |
| DATABASE_URL | litellm database, to enable more features |
| LANGFUSE_PUBLIC_KEY | For interacting with langfuse |
| LANGFUSE_SECRET_KEY | For interacting with langfuse |
| LITELLM_MASTER_KEY | Internal master API key for your litellm instance |
| UI_PASSWORD | For logging in to the litellm UI |
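As a side note on how some of these get consumed: my recollection is that the langfuse ones are picked up by litellm's langfuse logging callback, which you enable in the config, while the others are read directly from the environment by the proxy. Roughly something like the following, though verify the setting names against the current litellm docs:

```yaml
# Hedged sketch - enabling the langfuse callback and the master key in the litellm config.
# LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST are read from the environment
# by the callback; DATABASE_URL, UI_USERNAME and UI_PASSWORD are likewise picked up from env.
litellm_settings:
  success_callback: ["langfuse"]                # send request/response traces to langfuse

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY     # the proxy's own admin/master API key
```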
I used a third-party serverless DB provider, and a third-party PaaS to host langfuse.
As for the setup of langfuse itself, I have even less memory of that part lol.
Finally, one more remark:
`litellm` is nice, but it is a typical "sponsored by a company" type of project, and as such it has a mixture of OSS components and locked/paid features. Unfortunately, half a year ago when I deployed this, this meant the UI would error out in a kind of weird way when you tried to access some pages. Still usable, but the UX probably won't be good enough for consumers (as opposed to internal devs). I remember someone recently forked off `litellm` in an attempt to truly clean up the codebase and go full OSS, etc.; you may search for it online.
If you read this far, thanks!