Project Deployment Issue
#1 opened by Hivra
Although your project is excellent, I would like to construct one that is comparable to yours. Would you kindly help me by providing the steps used to deploy the project?
Hello,
First of all, Merry Christmas! :D
Thanks for your inquiry. This was already a few months ago, so I have forgotten some of the details, but below is my attempt at a reconstruction:
- This setup is fairly specific to the Hugging Face Spaces platform, so you may need to adapt it if you are using another PaaS- or CaaS-style cloud.
- I don't exactly recommend reproducing this exact deployment on Hugging Face Spaces - the last time I tried, I remember almost getting banned and receiving a security warning email, which probably has to do with both platform internals and ToS restrictions. Generally speaking, a single deployment connecting to your own external API is okay, but multiple interconnected deployments making direct calls over HF's internal network is not (the allowed exception, I think, is a Gradio space calling another Gradio space through Gradio's own mechanism). Exposing some kind of API endpoint is a gray area; I'm not sure about that. This space is left on just for reference (and because I'm too lazy to take it down). My guess is that the spirit of the rule is "AI-related research/prototype = okay, general webapp-style service = not okay".
- You can actually check the source code of this space: https://huggingface.co/spaces/hkitsmallpotato/litellm/tree/main
Some further elaboration of the source code:
- The `Dockerfile` reuses the official `litellm` image, but modifies it by injecting our config file, mostly following the official `litellm` deployment guide (https://docs.litellm.ai/docs/proxy/deploy) while also keeping it compatible with HF's own image build pipeline (https://huggingface.co/docs/hub/spaces-sdks-docker-first-demo). I think some of it is HF-specific (e.g. the port number); a hedged sketch follows just below.
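For concreteness, here is a minimal sketch of what such a `Dockerfile` can look like, assuming the image tag and CLI flags from the litellm deploy guide linked above; double-check them against the current docs. The only genuinely HF-specific part is listening on port 7860.

```Dockerfile
# Hedged sketch - based on the official litellm Docker deploy guide, adapted for HF Spaces.
FROM ghcr.io/berriai/litellm:main-latest

# Inject our own proxy config into the image
COPY config.yaml /app/config.yaml

# HF Spaces expects the container to listen on port 7860
EXPOSE 7860

# The image's entrypoint runs the litellm proxy; pass it our config and the HF-mandated port
CMD ["--config", "/app/config.yaml", "--port", "7860"]
```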
- The `litellm` config is based on some trial and error, plus trying my best to read their docs (it was half a year ago, and the docs back then could be quite confusing). Generally speaking, it is divided into a global config and the configuration of your own model connections.
- For the models, I found back then that just manually specifying the litellm-side model name (e.g. `- model_name: cf-sqlcoder-7b-2`) one by one works robustly.
- For the connection params (those under `litellm_params`), while it might be possible to let litellm infer the correct env variable to use for credentials by default (maybe they have that, using the connector type to decide), I personally find that explicitly invoking the env override myself gives better control and assurance that it works. (E.g. `api_key: os.environ/CLOUDFLARE_API_KEY` is `litellm`-specific syntax to inject secrets from env, which is industry standard practice.)
- Many shared LLM inference cloud providers expose an OpenAI-compatible API, so just using `openai/[provider side model ID]` instead of a litellm connector like `openrouter/[model id]` is actually okay (but then you're kind of falling back to the lowest common denominator of what the OpenAI API format supports). Remember to specify `api_base` if you do that, though, otherwise it will connect to the actual OpenAI URL.
- Extra tip: the caveat to "many providers expose an OpenAI-compatible API" is that different providers may differ in their level of support once you go past plain chat completion and want more in-depth features, such as function calling, structured output, and so on. A rough sketch of the config is shown after this list.
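To make the above concrete, here is a rough sketch of the model-connection part of the config. The overall shape follows the litellm proxy config docs as I remember them; the Cloudflare model ID, the second provider entry, and any env variable names not mentioned above are illustrative placeholders, not what this space actually uses.

```yaml
# Hedged sketch of a litellm proxy model_list - placeholder values are marked in comments.
model_list:
  # Explicit litellm-side model name, with the credential injected explicitly from env
  - model_name: cf-sqlcoder-7b-2
    litellm_params:
      model: cloudflare/@cf/defog/sqlcoder-7b-2       # Cloudflare Workers AI connector; verify the current model ID
      api_key: os.environ/CLOUDFLARE_API_KEY          # litellm syntax: read the secret from an env variable
                                                      # (CLOUDFLARE_ACCOUNT_ID is also read from env, see the table below)

  # Falling back to the generic OpenAI-compatible connector (hypothetical provider)
  - model_name: some-hosted-model
    litellm_params:
      model: openai/provider-side-model-id
      api_base: https://api.example-provider.com/v1   # required here, otherwise litellm talks to the real OpenAI URL
      api_key: os.environ/PROVIDER_API_KEY            # hypothetical secret name
```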
You also need a bunch of variables and env secrets:
| Name | Purpose |
|---|---|
| UI_USERNAME | For logging in to the litellm UI |
| LANGFUSE_HOST | Connect to the external langfuse instance |
| CLOUDFLARE_ACCOUNT_ID | Cloudflare needs both an account ID and an API key |
| DATABASE_URL | litellm database, to enable more features |
| LANGFUSE_PUBLIC_KEY | For interacting with langfuse |
| LANGFUSE_SECRET_KEY | For interacting with langfuse |
| LITELLM_MASTER_KEY | Internal master API key for your litellm instance |
| UI_PASSWORD | For logging in to the litellm UI |
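As a side note on how some of these get consumed: my recollection is that the langfuse ones are picked up by litellm's langfuse logging callback, which you enable in the config, while the others are read directly from the environment by the proxy. Roughly something like the following, though verify the setting names against the current litellm docs:

```yaml
# Hedged sketch - enabling the langfuse callback and the master key in the litellm config.
# LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST are read from the environment
# by the callback; DATABASE_URL, UI_USERNAME and UI_PASSWORD are likewise picked up from env.
litellm_settings:
  success_callback: ["langfuse"]                # send request/response traces to langfuse

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY     # the proxy's own admin/master API key
```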
I used a third-party serverless DB provider, and a third-party PaaS to host langfuse.
As for the setup of langfuse itself, I have even less memory of that part lol.
Finally, one more remark:
`litellm` is nice, but it is a typical "sponsored by a company" type of project, and as such it has a mixture of OSS components and locked/paid features. Unfortunately, half a year ago when I deployed this, this meant the UI would error out in a kind of weird way when you tried to access some pages. Still usable, but the UX probably won't be good enough for consumers (as opposed to internal devs). I remember someone recently forked off `litellm` in an attempt to truly clean up the codebase and go full OSS, etc.; you may search for it online.
If you read this far, thanks!