yym68686 committed on
Commit ee08a6f · 1 Parent(s): 212407a

✨ Feature: Added lottery scheduling algorithm and support for random scheduling algorithm.

Files changed (4)
  1. README.md +102 -74
  2. README_CN.md +47 -19
  3. main.py +42 -31
  4. utils.py +2 -1
README.md CHANGED
@@ -13,62 +13,83 @@
 
 ## Introduction
 
- If used personally, one/new-api is too complex and has many commercial features that individuals do not need. If you do not want a complex front-end interface and want to support more models, you can try uni-api. This is a project that unifies the management of large model APIs, allowing multiple backend services to be called through a unified API interface and uniformly converted to the OpenAI format, supporting load balancing. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, etc.
-
- ## Features
-
- - No frontend, pure configuration file setup for API channels. You can run your own API site by just writing one file, with detailed configuration guides in the documentation, beginner-friendly.
- - Unified management of multiple backend services, supporting providers like OpenAI, Deepseek, DeepBricks, OpenRouter, and other APIs in the OpenAI format. Supports OpenAI Dalle-3 image generation.
- - Supports Anthropic, Gemini, Vertex AI, Cohere, Groq, Cloudflare. Vertex supports both Claude and Gemini APIs.
- - Supports OpenAI, Anthropic, Gemini, Vertex native tool use function calls.
- - Supports OpenAI, Anthropic, Gemini, Vertex native image recognition API.
- - Supports four types of load balancing.
-   1. Supports channel-level weighted load balancing, which can allocate requests based on different channel weights. Disabled by default, requires channel weight configuration.
-   2. Supports Vertex regional load balancing, supports Vertex high concurrency, and can increase Gemini, Claude concurrency by up to (number of APIs * number of regions) times. Automatically enabled without additional configuration.
-   3. In addition to Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. Automatically enabled without additional configuration.
    4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
- - Supports automatic retry, when an API channel response fails, automatically retry the next API channel.
- - Supports fine-grained access control. Supports using wildcards to set specific models for API key available channels.
- - Supports rate limiting, can set the maximum number of requests per minute, can be set as an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default is 60/min.
  - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
- - Supports OpenAI moderation for ethical review, allowing for ethical review of user messages. If inappropriate messages are detected, an error message will be returned. This reduces the risk of the backend API being banned by providers.
 
- ## Configuration
 
- Using the api.yaml configuration file, multiple models can be configured, and each model can be configured with multiple backend services, supporting load balancing. Below is an example of the api.yaml configuration file:
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, any name is fine, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
-    model: # At least one model must be filled in
       - gpt-4o # Usable model name, required
-      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, you can use a concise name instead of the original complex name, optional
       - dall-e-3
 
   - provider: anthropic
     base_url: https://api.anthropic.com/v1/messages
-    api: # Supports multiple API Keys, multiple keys automatically enable polling load balancing, at least one key, required
       - sk-ant-api03-bNnAOJyA-xQw_twAA
       - sk-ant-api02-bNnxxxx
     model:
-      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, you can use a concise name instead of the original complex name, optional
-    tools: true # Whether to support tools, such as generating code, generating documents, etc., default is true, optional
 
   - provider: gemini
     base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1, only for Gemini models, required
     api: AIzaSyAN2k6IRdgw
     model:
       - gemini-1.5-pro
-      - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used. If you want to use the original name, you can add the original name in the model, just add the following line to use the original name.
       - gemini-1.5-flash-exp-0827 # Add this line, both gemini-1.5-flash-exp-0827 and gemini-1.5-flash can be requested
     tools: true
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
-    private_key: "-----BEGIN PRIVATE KEY-----\nxxxxx\n-----END PRIVATE" # Description: Private key of the Google Cloud Vertex AI service account. Format: A JSON formatted string containing the private key information of the service account. How to obtain: Create a service account in the Google Cloud Console, generate a JSON formatted key file, and then set its content as the value of this environment variable.
-    client_email: [email protected] # Description: Email address of the Google Cloud Vertex AI service account. Format: Usually a string like "[email protected]". How to obtain: Generated when creating the service account, you can also view the service account details in the "IAM & Admin" section of the Google Cloud Console.
     model:
       - gemini-1.5-pro
       - gemini-1.5-flash
@@ -77,14 +98,14 @@ providers:
       - claude-3-sonnet@20240229: claude-3-sonnet
       - claude-3-haiku@20240307: claude-3-haiku
     tools: true
-    notes: https://xxxxx.com/ # You can put the provider's website, notes, official documentation, optional
 
   - provider: cloudflare
     api: f42b3xxxxxxxxxxq4aoGAh # Cloudflare API Key, required
     cf_account_id: 8ec0xxxxxxxxxxxxe721 # Cloudflare Account ID, required
     model:
-      - '@cf/meta/llama-3.1-8b-instruct': llama-3.1-8b # Rename model, @cf/meta/llama-3.1-8b-instruct is the provider's original model name, the model name must be enclosed in quotes, otherwise yaml syntax error, llama-3.1-8b is the renamed name, you can use a concise name instead of the original complex name, optional
-      - '@cf/meta/llama-3.1-8b-instruct' # The model name must be enclosed in quotes, otherwise yaml syntax error
 
   - provider: other-provider
     base_url: https://api.xxx.com/v1/messages
@@ -93,11 +114,11 @@ providers:
       - causallm-35b-beta2ep-q6k: causallm-35b
       - anthropic/claude-3-5-sonnet
     tools: false
-    engine: openrouter # Force to use a specific message format, currently supports gpt, claude, gemini, openrouter native format, optional
 
 api_keys:
-  - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key, users need an API key to use this service, required
-    model: # Models that this API Key can use, required
       - gpt-4o # Usable model name, can use all gpt-4o models provided by providers
       - claude-3-5-sonnet # Usable model name, can use all claude-3-5-sonnet models provided by providers
       - gemini/* # Usable model name, can only use all models provided by the provider named gemini, where gemini is the provider name, * represents all models
@@ -105,68 +126,76 @@ api_keys:
 
   - api: sk-pkhf60Yf0JGyJxgRmXqFQyTgWUd9GZnmi3KlvowmRWpWqrhy
     model:
-      - anthropic/claude-3-5-sonnet # Usable model name, can only use the claude-3-5-sonnet model provided by the provider named anthropic. Other providers' claude-3-5-sonnet models cannot be used. This way of writing will not match the model named anthropic/claude-3-5-sonnet provided by other-provider.
-      - <anthropic/claude-3-5-sonnet> # By adding angle brackets on both sides of the model name, it will not look for the claude-3-5-sonnet model under the channel named anthropic, but will treat the entire anthropic/claude-3-5-sonnet as the model name. This way of writing can match the model named anthropic/claude-3-5-sonnet provided by other-provider. But it will not match the claude-3-5-sonnet model under anthropic.
-      - openai-test/text-moderation-latest # When message moderation is enabled, you can use the text-moderation-latest model under the channel named openai-test for moderation.
     preferences:
-      USE_ROUND_ROBIN: true # Whether to use polling load balancing, true to use, false to not use, default is true. When polling is enabled, each request will be made in the order configured in the model. It is not related to the original channel order in providers. Therefore, you can set different request orders for each API key.
-      AUTO_RETRY: true # Whether to automatically retry, automatically retry the next provider, true to automatically retry, false to not automatically retry, default is true
-      RATE_LIMIT: 2/min # Supports rate limiting, the maximum number of requests per minute, can be set to an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default is 60/min, optional
-      ENABLE_MODERATION: true # Whether to enable message moderation, true to enable, false to not enable, default is false. When enabled, it will conduct moderation on the user's message, if inappropriate messages are found, it will return an error message.
 
   # Channel-level weighted load balancing configuration example
   - api: sk-KjjI60Yd0JFWtxxxxxxxxxxxxxxwmRWpWpQRo
     model:
-      - gcp1/*: 5 # The number after the colon is the weight, the weight only supports positive integers.
       - gcp2/*: 3 # The larger the number, the greater the probability of the request.
-      - gcp3/*: 2 # In this example, there are a total of 10 weights for all channels, and 5 out of 10 requests will request the gcp1/* model, 2 requests will request the gcp2/* model, and 3 requests will request the gcp3/* model.
 
     preferences:
-      USE_ROUND_ROBIN: true # When USE_ROUND_ROBIN must be true and there is no weight after the above channels, it will request in the original channel order, if there is weight, it will request in the weighted order.
       AUTO_RETRY: true
 ```
 
- If you do not want to set available channels for each `api` one by one in `api_keys`, `uni-api` supports setting the `api key` to be able to use all models. The configuration is as follows:
 
- ```yaml
- # ... providers configuration unchanged ...
- api_keys:
-   - api: sk-LjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key, users need an API key to request uni-api, required
-     model: # The model that can be used with this API Key, required
-       - all # Can use all models in all channels set under providers, no need to add available channels one by one.
- # ... other configurations unchanged ...
 ```
 
- ## Environment Variables
 
- - CONFIG_URL: The download address of the configuration file, it can be a local file or a remote file, optional
- - TIMEOUT: Request timeout, default is 100 seconds, the timeout can control the time needed to switch to the next channel when a channel does not respond. Optional
 
- ## Retrieve Statistical Data
 
- Use `/stats` to get usage statistics for each channel over the last 24 hours. Include your own uni-api admin API key.
 
- The data includes:
 
- 1. Success rate for each model under each channel, sorted from highest to lowest success rate.
- 2. Overall success rate for each channel, sorted from highest to lowest.
- 3. Total number of requests for each model across all channels.
- 4. Number of requests for each endpoint.
- 5. Number of requests from each IP address.
 
- `/stats?hours=48` The `hours` parameter can control how many hours of recent data statistics are returned. If the `hours` parameter is not provided, it defaults to statistics for the last 24 hours.
 
- There are other statistical data that you can query yourself by writing SQL in the database. Other data includes: first token time, total processing time for each request, whether each request was successful, whether each request passed ethical review, text content of each request, API key for each request, input token count, and output token count for each request.
 
- ## Docker Local Deployment
 
 Start the container
 
 ```bash
 docker run --user root -p 8001:8000 --name uni-api -dit \
-   -e CONFIG_URL=http://file_url/api.yaml \ # If the local configuration file is already mounted, you do not need to set CONFIG_URL
-   -v ./api.yaml:/home/api.yaml \ # If CONFIG_URL is already set, you do not need to mount the configuration file
-   -v ./uniapi_db:/home/data \ # If you do not want to save statistical data, you do not need to mount the stats.db file
   yym68686/uni-api:latest
 ```
 
@@ -178,15 +207,15 @@ services:
     container_name: uni-api
     image: yym68686/uni-api:latest
     environment:
-      - CONFIG_URL=http://file_url/api.yaml # If the local configuration file is already mounted, there is no need to set CONFIG_URL
     ports:
       - 8001:8000
     volumes:
       - ./api.yaml:/home/api.yaml # If CONFIG_URL is already set, there is no need to mount the configuration file
-      - ./uniapi_db:/home/data # If you do not want to save statistical data, there is no need to mount the stats.db file
 ```
 
- CONFIG_URL is used to automatically download remote configuration files. For example, if it is inconvenient to modify the configuration file on a certain platform, you can upload the configuration file to a hosting service and provide a direct link for uni-api to download. CONFIG_URL is this direct link. If you are using a locally mounted configuration file, you do not need to set CONFIG_URL. CONFIG_URL is used in situations where it is inconvenient to mount the configuration file.
 
 Run Docker Compose container in the background
 
@@ -226,8 +255,7 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
   -d '{"model": "gpt-4o","messages": [{"role": "user", "content": "Hello"}],"stream": true}'
 ```
 
-
- ## Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
   <img width="500" alt="Star History Chart" src="https://api.star-history.com/svg?repos=yym68686/uni-api&type=Date">
 ## Introduction
 
+ For personal use, one/new-api is overly complex, with many commercial features that individuals do not need. If you do not want a complicated frontend interface and want support for more models, try uni-api. This is a project that unifies the management of large language model APIs, allowing multiple backend services to be called through a single unified API interface, uniformly converted to the OpenAI format, with load balancing support. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, and more.
+
+ ## Features
+
+ - No frontend; API channels are configured purely through a configuration file. You can run your own API site just by writing one file, and the documentation includes a detailed configuration guide. Beginner-friendly.
+ - Unified management of multiple backend services, supporting OpenAI, Deepseek, DeepBricks, OpenRouter, and other providers whose APIs use the OpenAI format. Supports OpenAI Dalle-3 image generation.
+ - Supports Anthropic, Gemini, Vertex AI, Cohere, Groq, and Cloudflare. Vertex supports both the Claude and Gemini APIs.
+ - Supports native tool-use function calls for OpenAI, Anthropic, Gemini, and Vertex.
+ - Supports the native image recognition APIs of OpenAI, Anthropic, Gemini, and Vertex.
+ - Supports four types of load balancing:
+   1. Channel-level weighted load balancing, which distributes requests according to channel weights. Disabled by default; channel weights must be configured.
+   2. Vertex regional load balancing with high concurrency, which can increase Gemini and Claude concurrency by up to (number of APIs * number of regions) times. Automatically enabled without additional configuration.
+   3. Beyond Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. Automatically enabled without additional configuration.
    4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
+ - Supports automatic retry: when an API channel request fails, the next API channel is automatically retried.
+ - Supports fine-grained permission control. Wildcards can be used to set which channels' models are available to an API key.
+ - Supports rate limiting. The maximum request rate can be set as an integer, such as 2/min (2 per minute), 5/hour (5 per hour), 10/day, 10/month, or 10/year. The default is 60/min.
  - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
+ - Supports OpenAI moderation for ethical review of user messages. If inappropriate messages are detected, an error message is returned. This reduces the risk of the backend API being banned by providers.
 
+ ## Usage
 
+ uni-api must be started with a configuration file, which can be provided in two ways:
+
+ 1. Use the `CONFIG_URL` environment variable to point at a configuration file URL, which uni-api downloads automatically at startup.
+ 2. Mount a configuration file named `api.yaml` into the container.
+
+ ### Method 1: Mount the `api.yaml` configuration file to start uni-api
+
+ The configuration file must be written in advance, and it must be named `api.yaml`. You can configure multiple models, each model can be backed by multiple backend services, and load balancing is supported. Below is a minimal runnable `api.yaml` example:
 
 ```yaml
 providers:
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks; any name works, required
+    base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
+    api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required. All available models are fetched automatically through the /v1/models endpoint using base_url and api.
+  # Multiple providers can be configured here; each provider can have multiple API Keys, and each API Key can have multiple models configured.
+ api_keys:
+  - api: sk-Pkj60Yf8JFWxfgRmXQFWyGtWUddGZnmi3KlvowmRWpWpQxx # API Key users need in order to request uni-api, required
+    model: # Models this API Key can use, required. Channel-level round-robin load balancing is enabled by default, and each request follows the order configured under model, independent of the original channel order in providers. You can therefore give each API key a different request order.
+      - all # Grants access to all models of all channels configured under providers; there is no need to add available channels one by one. If you do not want to list available channels for each api in api_keys, uni-api supports setting an api key to use all models of all channels under providers.
+ ```
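
Once uni-api is running, you can sanity-check this minimal configuration by asking uni-api itself which models the key can see, via its `/v1/models` interface. A minimal sketch using only the Python standard library; it assumes uni-api is reachable on host port 8001 as in the Docker commands below, and reuses the example API key from the snippet above:

```python
import json
import urllib.request

# Example API key from the minimal api.yaml above; replace with your own.
req = urllib.request.Request(
    "http://127.0.0.1:8001/v1/models",
    headers={"Authorization": "Bearer sk-Pkj60Yf8JFWxfgRmXQFWyGtWUddGZnmi3KlvowmRWpWpQxx"},
)
with urllib.request.urlopen(req) as resp:
    # Prints the OpenAI-format model list that this key is allowed to use.
    print(json.dumps(json.load(resp), indent=2))
```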
+
+ Detailed advanced configuration of `api.yaml`:
+
+ ```yaml
+ providers:
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks; any name works, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
+    model: # Optional; if model is not configured, all available models are fetched automatically through the /v1/models endpoint using base_url and api.
       - gpt-4o # Usable model name, required
+      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model: claude-3-5-sonnet-20240620 is the provider's model name and claude-3-5-sonnet is the new name; you can use a simpler name in place of the original complex one, optional
       - dall-e-3
 
   - provider: anthropic
     base_url: https://api.anthropic.com/v1/messages
+    api: # Supports multiple API Keys; multiple keys automatically enable round-robin load balancing, at least one key, required
       - sk-ant-api03-bNnAOJyA-xQw_twAA
       - sk-ant-api02-bNnxxxx
     model:
+      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model: claude-3-5-sonnet-20240620 is the provider's model name and claude-3-5-sonnet is the new name; you can use a simpler name in place of the original complex one, optional
+    tools: true # Whether to support tools such as code generation, document generation, etc.; default is true, optional
 
   - provider: gemini
     base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1, only for Gemini models, required
     api: AIzaSyAN2k6IRdgw
     model:
       - gemini-1.5-pro
+      - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used; to keep the original name available, add it to model as well, as in the next line
       - gemini-1.5-flash-exp-0827 # Add this line, both gemini-1.5-flash-exp-0827 and gemini-1.5-flash can be requested
     tools: true
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
+    private_key: "-----BEGIN PRIVATE KEY-----\nxxxxx\n-----END PRIVATE" # Description: The private key of the Google Cloud Vertex AI service account. Format: a JSON-formatted string containing the service account's private key information. How to obtain: create a service account in the Google Cloud Console, generate a JSON-formatted key file, and set its content as the value of this field.
+    client_email: [email protected] # Description: The email address of the Google Cloud Vertex AI service account. Format: usually a string like "[email protected]". How to obtain: generated when creating the service account, or viewable in the service account details in the "IAM & Admin" section of the Google Cloud Console.
     model:
       - gemini-1.5-pro
       - gemini-1.5-flash
 
       - claude-3-sonnet@20240229: claude-3-sonnet
       - claude-3-haiku@20240307: claude-3-haiku
     tools: true
+    notes: https://xxxxx.com/ # Can hold the provider's website, notes, or official documentation, optional
 
   - provider: cloudflare
     api: f42b3xxxxxxxxxxq4aoGAh # Cloudflare API Key, required
     cf_account_id: 8ec0xxxxxxxxxxxxe721 # Cloudflare Account ID, required
     model:
+      - '@cf/meta/llama-3.1-8b-instruct': llama-3.1-8b # Rename model: @cf/meta/llama-3.1-8b-instruct is the provider's original model name and must be enclosed in quotes to avoid a YAML syntax error; llama-3.1-8b is the new name, and you can use a simpler name in place of the original complex one, optional
+      - '@cf/meta/llama-3.1-8b-instruct' # Must be enclosed in quotes to avoid a YAML syntax error
 
   - provider: other-provider
     base_url: https://api.xxx.com/v1/messages
 
       - causallm-35b-beta2ep-q6k: causallm-35b
       - anthropic/claude-3-5-sonnet
     tools: false
+    engine: openrouter # Force a specific message format; currently supports the gpt, claude, gemini, and openrouter native formats, optional
 
 api_keys:
+  - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key users need in order to use this service, required
+    model: # Models this API Key can use, required. Channel-level round-robin load balancing is enabled by default, and each request follows the order configured under model, independent of the original channel order in providers. You can therefore give each API key a different request order.
       - gpt-4o # Usable model name; all gpt-4o models offered by providers can be used
       - claude-3-5-sonnet # Usable model name; all claude-3-5-sonnet models offered by providers can be used
       - gemini/* # Usable model name; only models offered by the provider named gemini can be used, where gemini is the provider name and * stands for all models
 
   - api: sk-pkhf60Yf0JGyJxgRmXqFQyTgWUd9GZnmi3KlvowmRWpWqrhy
     model:
+      - anthropic/claude-3-5-sonnet # Usable model name; only the claude-3-5-sonnet model offered by the provider named anthropic can be used. claude-3-5-sonnet models from other providers cannot be used. This notation does not match the model literally named anthropic/claude-3-5-sonnet offered by other-provider.
+      - <anthropic/claude-3-5-sonnet> # Wrapping the model name in angle brackets means uni-api will not look for a claude-3-5-sonnet model under the channel named anthropic, but instead treats the whole string anthropic/claude-3-5-sonnet as the model name. This notation matches the model literally named anthropic/claude-3-5-sonnet offered by other-provider, but not the claude-3-5-sonnet model under anthropic.
+      - openai-test/text-moderation-latest # When message moderation is enabled, the text-moderation-latest model under the channel named openai-test can be used for moderation.
     preferences:
+      SCHEDULING_ALGORITHM: fixed_priority # fixed_priority uses fixed-priority scheduling: the first channel that serves the requested model is always used, replacing the channel-level round-robin load balancing that is enabled by default. Options: fixed_priority, weighted_round_robin, lottery, random.
+      # When SCHEDULING_ALGORITHM is random, random load balancing is used: a channel that serves the requested model is chosen at random.
+      AUTO_RETRY: true # Whether to automatically retry the next provider on failure; true to retry automatically, false not to, default is true
+      RATE_LIMIT: 2/min # Rate limiting: the maximum request rate as an integer, such as 2/min (2 per minute), 5/hour, 10/day, 10/month, or 10/year. Default is 60/min, optional
+      ENABLE_MODERATION: true # Whether to enable message moderation; true to enable, false to disable, default is false. When enabled, user messages are moderated, and an error message is returned if inappropriate content is found.
 
   # Channel-level weighted load balancing configuration example
   - api: sk-KjjI60Yd0JFWtxxxxxxxxxxxxxxwmRWpWpQRo
     model:
+      - gcp1/*: 5 # The number after the colon is the weight; only positive integers are supported.
       - gcp2/*: 3 # The larger the number, the greater the probability of the request.
+      - gcp3/*: 2 # In this example the channels have 10 weight in total, so out of 10 requests, 5 go to the gcp1/* model, 3 to gcp2/*, and 2 to gcp3/*.
 
     preferences:
+      SCHEDULING_ALGORITHM: weighted_round_robin # weighted_round_robin requests the channels serving the requested model in weighted order; the channels above must have weights configured. When SCHEDULING_ALGORITHM is lottery, channels are instead drawn at random with probability proportional to their weights (see the sketch after this example).
       AUTO_RETRY: true
 ```
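
To make the weight-based options concrete, below is a small self-contained Python sketch of the lottery draw over the example weights above. The helper name `lottery_pick` is ours, for illustration only; uni-api's actual implementation is the `lottery_scheduling` function added to `main.py` in this commit.

```python
import random

# Example weights, matching the api.yaml snippet above (10 tickets in total).
weights = {"gcp1/*": 5, "gcp2/*": 3, "gcp3/*": 2}

def lottery_pick(weights):
    """Draw one channel, with probability proportional to its weight."""
    ticket = random.randint(1, sum(weights.values()))
    cumulative = 0
    for channel, weight in weights.items():
        cumulative += weight
        if ticket <= cumulative:
            return channel

# Over many draws the selection frequencies approach the 5:3:2 weight ratio,
# whereas weighted_round_robin visits the channels deterministically in that ratio.
counts = {channel: 0 for channel in weights}
for _ in range(10_000):
    counts[lottery_pick(weights)] += 1
print(counts)  # e.g. {'gcp1/*': 5012, 'gcp2/*': 2994, 'gcp3/*': 1994}
```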
 
+ Mount the configuration file and start the uni-api Docker container:
 
+ ```bash
+ docker run --user root -p 8001:8000 --name uni-api -dit \
+   -v ./api.yaml:/home/api.yaml \
+   yym68686/uni-api:latest
+ ```
+
+ ### Method 2: Start uni-api using the `CONFIG_URL` environment variable
+
+ After writing the configuration file as in Method 1, upload it to a cloud drive, get a direct link to the file, and then start the uni-api Docker container with the `CONFIG_URL` environment variable:
+
+ ```bash
+ docker run --user root -p 8001:8000 --name uni-api -dit \
+   -e CONFIG_URL=http://file_url/api.yaml \
+   yym68686/uni-api:latest
 ```
 
+ ## Environment variables
 
+ - CONFIG_URL: The download address of the configuration file; it can be a local or a remote file, optional
+ - TIMEOUT: Request timeout, default 100 seconds. The timeout controls how long uni-api waits before switching to the next channel when a channel does not respond. Optional
 
+ ## Get statistical data
 
+ Use `/stats` to get usage statistics for each channel over the past 24 hours. Include your uni-api admin API key in the request.
 
+ The data includes:
 
+ 1. The success rate of each model under each channel, sorted from high to low.
+ 2. The overall success rate of each channel, sorted from high to low.
+ 3. The total number of requests for each model across all channels.
+ 4. The number of requests for each endpoint.
+ 5. The number of requests per IP.
 
+ The `hours` parameter in `/stats?hours=48` controls how many hours of recent data the statistics cover. If the `hours` parameter is omitted, it defaults to the last 24 hours.
 
+ There are further statistics that you can query yourself by writing SQL against the database, including: time to first token, total processing time of each request, whether each request succeeded, whether each request passed moderation, the text content of each request, the API key of each request, and the input and output token counts of each request.
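
For example, a minimal sketch of querying the endpoint with the Python standard library. It assumes the admin API key is sent as a Bearer token, the same way the other endpoints are authenticated, and that uni-api is reachable on host port 8001 as in the Docker examples:

```python
import json
import urllib.request

# Replace with your own uni-api admin API key.
req = urllib.request.Request(
    "http://127.0.0.1:8001/stats?hours=48",
    headers={"Authorization": "Bearer sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo"},
)
with urllib.request.urlopen(req) as resp:
    # Per-channel success rates, request counts per model/endpoint/IP, etc.
    print(json.dumps(json.load(resp), indent=2, ensure_ascii=False))
```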
 
+ ## Docker local deployment
 
 Start the container
 
 ```bash
 docker run --user root -p 8001:8000 --name uni-api -dit \
+   -e CONFIG_URL=http://file_url/api.yaml \ # If the local configuration file is already mounted, there is no need to set CONFIG_URL
+   -v ./api.yaml:/home/api.yaml \ # If CONFIG_URL is already set, there is no need to mount the configuration file
+   -v ./uniapi_db:/home/data \ # If you do not want to save statistical data, there is no need to mount this folder
   yym68686/uni-api:latest
 ```
 
     container_name: uni-api
     image: yym68686/uni-api:latest
     environment:
+      - CONFIG_URL=http://file_url/api.yaml # If a local configuration file is already mounted, there is no need to set CONFIG_URL
     ports:
       - 8001:8000
     volumes:
       - ./api.yaml:/home/api.yaml # If CONFIG_URL is already set, there is no need to mount the configuration file
+      - ./uniapi_db:/home/data # If you do not want to save statistical data, there is no need to mount this folder
 ```
 
+ CONFIG_URL is a direct link to a remote configuration file that uni-api downloads automatically. For example, if it is inconvenient to modify the configuration file on some platform, you can upload it to a hosting service, get a direct link, and set CONFIG_URL to that link. If you use a locally mounted configuration file, there is no need to set CONFIG_URL; it is intended for situations where mounting the file is inconvenient.
 
 Run Docker Compose container in the background
 
   -d '{"model": "gpt-4o","messages": [{"role": "user", "content": "Hello"}],"stream": true}'
 ```
 
+ ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
   <img width="500" alt="Star History Chart" src="https://api.star-history.com/svg?repos=yym68686/uni-api&type=Date">
README_CN.md CHANGED
@@ -11,11 +11,11 @@
 
 [English](./README.md) | [中文](./README_CN.md)
 
- ## Introduction
 
 For personal use, one/new-api is overly complex, with many commercial features that individuals do not need. If you do not want a complicated frontend interface and want support for more models, try uni-api. This is a project that unifies the management of large language model APIs: multiple backend services can be called through a single API interface, uniformly converted to the OpenAI format, with load balancing support. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, and more.
 
- ## Features
 
 - No frontend; API channels are configured purely through a configuration file. You can run your own API site just by writing one file, and the documentation includes a detailed configuration guide. Beginner-friendly.
 - Unified management of multiple backend services, supporting OpenAI, Deepseek, DeepBricks, OpenRouter, and other providers whose APIs use the OpenAI format. Supports OpenAI Dalle-3 image generation.
@@ -33,16 +33,37 @@
 - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
 - Supports OpenAI moderation for ethical review of user messages; if inappropriate messages are found, an error message is returned. This reduces the risk of the backend API being banned by providers.
 
- ## Configuration
 
- Using the api.yaml configuration file, you can configure multiple models, and each model can be backed by multiple backend services, with load balancing support. Below is an example api.yaml configuration file:
 
 ```yaml
 providers:
   - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks; any name works, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
-    model: # At least one model must be filled in
       - gpt-4o # Usable model name, required
       - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model: claude-3-5-sonnet-20240620 is the provider's model name and claude-3-5-sonnet is the new name; you can use a simpler name in place of the original complex one, optional
       - dall-e-3
@@ -97,7 +118,7 @@ providers:
 
 api_keys:
   - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key users need in order to use this service, required
-    model: # Models this API Key can use, required
       - gpt-4o # Usable model name; all gpt-4o models offered by providers can be used
       - claude-3-5-sonnet # Usable model name; all claude-3-5-sonnet models offered by providers can be used
       - gemini/* # Usable model name; only models offered by the provider named gemini can be used, where gemini is the provider name and * stands for all models
@@ -109,7 +130,8 @@ api_keys:
       - <anthropic/claude-3-5-sonnet> # Wrapping the model name in angle brackets means uni-api will not look for a claude-3-5-sonnet model under the channel named anthropic, but instead treats the whole string anthropic/claude-3-5-sonnet as the model name. This notation matches the model literally named anthropic/claude-3-5-sonnet offered by other-provider, but not the claude-3-5-sonnet model under anthropic.
       - openai-test/text-moderation-latest # When message moderation is enabled, the text-moderation-latest model under the channel named openai-test can be used for moderation.
     preferences:
-      USE_ROUND_ROBIN: true # Whether to use round-robin load balancing; true to use, false not to, default is true. With round-robin enabled, each request follows the order configured under model, independent of the original channel order in providers. You can therefore give each API key a different request order.
       AUTO_RETRY: true # Whether to automatically retry the next provider on failure; true to retry automatically, false not to, default is true
       RATE_LIMIT: 2/min # Rate limiting: the maximum request rate as an integer, such as 2/min (2 per minute), 5/hour, 10/day, 10/month, or 10/year. Default is 60/min, optional
       ENABLE_MODERATION: true # Whether to enable message moderation; true to enable, false to disable, default is false. When enabled, user messages are moderated, and an error message is returned if inappropriate content is found.
@@ -122,19 +144,26 @@ api_keys:
       - gcp3/*: 2 # In this example the channels have 10 weight in total, so out of 10 requests, 5 go to the gcp1/* model, 3 to gcp2/*, and 2 to gcp3/*.
 
     preferences:
-      USE_ROUND_ROBIN: true # USE_ROUND_ROBIN must be true; if the channels above have no weights, requests follow the original channel order, and if weights are set, requests follow the weighted order.
       AUTO_RETRY: true
 ```
 
- If you do not want to set available channels for each `api` in `api_keys` one by one, `uni-api` supports setting an `api key` to use all models. The configuration is as follows:
 
- ```yaml
- # ... providers configuration unchanged ...
- api_keys:
-   - api: sk-LjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key users need in order to request uni-api, required
-     model: # Models this API Key can use, required
-       - all # Grants access to all models of all channels configured under providers; no need to add available channels one by one.
- # ... other configurations unchanged ...
 ```
 
 ## Environment variables
@@ -158,7 +187,7 @@ api_keys:
 
 There are further statistics that you can query yourself by writing SQL against the database, including: time to first token, total processing time of each request, whether each request succeeded, whether each request passed moderation, the text content of each request, the API key of each request, and the input and output token counts of each request.
 
- ## Docker Local Deployment
 
 Start the container
 
@@ -226,8 +255,7 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
   -d '{"model": "gpt-4o","messages": [{"role": "user", "content": "Hello"}],"stream": true}'
 ```
 
-
- ## Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
   <img width="500" alt="Star History Chart" src="https://api.star-history.com/svg?repos=yym68686/uni-api&type=Date">
 
 
 [English](./README.md) | [中文](./README_CN.md)
 
+ ## Introduction
 
 For personal use, one/new-api is overly complex, with many commercial features that individuals do not need. If you do not want a complicated frontend interface and want support for more models, try uni-api. This is a project that unifies the management of large language model APIs: multiple backend services can be called through a single API interface, uniformly converted to the OpenAI format, with load balancing support. Currently supported backend services include: OpenAI, Anthropic, Gemini, Vertex, Cohere, Groq, Cloudflare, DeepBricks, OpenRouter, and more.
 
+ ## ✨ Features
 
 - No frontend; API channels are configured purely through a configuration file. You can run your own API site just by writing one file, and the documentation includes a detailed configuration guide. Beginner-friendly.
 - Unified management of multiple backend services, supporting OpenAI, Deepseek, DeepBricks, OpenRouter, and other providers whose APIs use the OpenAI format. Supports OpenAI Dalle-3 image generation.
 
 - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
 - Supports OpenAI moderation for ethical review of user messages; if inappropriate messages are found, an error message is returned. This reduces the risk of the backend API being banned by providers.
 
+ ## Usage
 
+ uni-api must be started with a configuration file, which can be provided in two ways:
+
+ 1. Use the `CONFIG_URL` environment variable to point at a configuration file URL, which uni-api downloads automatically at startup.
+ 2. Mount a configuration file named `api.yaml` into the container.
+
+ ### Method 1: Mount the `api.yaml` configuration file to start uni-api
+
+ The configuration file must be written in advance, and it must be named `api.yaml`. You can configure multiple models, each model can be backed by multiple backend services, and load balancing is supported. Below is a minimal runnable `api.yaml` example:
+
+ ```yaml
+ providers:
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks; any name works, required
+    base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
+    api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required. All available models are fetched automatically through the /v1/models endpoint using base_url and api.
+  # Multiple providers can be configured here; each provider can have multiple API Keys, and each API Key can have multiple models configured.
+ api_keys:
+  - api: sk-Pkj60Yf8JFWxfgRmXQFWyGtWUddGZnmi3KlvowmRWpWpQxx # API Key users need in order to request uni-api, required
+    model: # Models this API Key can use, required. Channel-level round-robin load balancing is enabled by default, and each request follows the order configured under model, independent of the original channel order in providers. You can therefore give each API key a different request order.
+      - all # Grants access to all models of all channels configured under providers; there is no need to add available channels one by one. If you do not want to list available channels for each api in api_keys, uni-api supports setting an api key to use all models of all channels under providers.
+ ```
+
+ Detailed advanced configuration of `api.yaml`:
 
 ```yaml
 providers:
   - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks; any name works, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
+    model: # Optional; if model is not configured, all available models are fetched automatically through the /v1/models endpoint using base_url and api.
       - gpt-4o # Usable model name, required
       - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model: claude-3-5-sonnet-20240620 is the provider's model name and claude-3-5-sonnet is the new name; you can use a simpler name in place of the original complex one, optional
       - dall-e-3
 
 api_keys:
   - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key users need in order to use this service, required
+    model: # Models this API Key can use, required. Channel-level round-robin load balancing is enabled by default, and each request follows the order configured under model, independent of the original channel order in providers. You can therefore give each API key a different request order.
       - gpt-4o # Usable model name; all gpt-4o models offered by providers can be used
       - claude-3-5-sonnet # Usable model name; all claude-3-5-sonnet models offered by providers can be used
       - gemini/* # Usable model name; only models offered by the provider named gemini can be used, where gemini is the provider name and * stands for all models
 
       - <anthropic/claude-3-5-sonnet> # Wrapping the model name in angle brackets means uni-api will not look for a claude-3-5-sonnet model under the channel named anthropic, but instead treats the whole string anthropic/claude-3-5-sonnet as the model name. This notation matches the model literally named anthropic/claude-3-5-sonnet offered by other-provider, but not the claude-3-5-sonnet model under anthropic.
       - openai-test/text-moderation-latest # When message moderation is enabled, the text-moderation-latest model under the channel named openai-test can be used for moderation.
     preferences:
+      SCHEDULING_ALGORITHM: fixed_priority # fixed_priority uses fixed-priority scheduling: the first channel that serves the requested model is always used, replacing the channel-level round-robin load balancing that is enabled by default. Options: fixed_priority, weighted_round_robin, lottery, random.
+      # When SCHEDULING_ALGORITHM is random, random load balancing is used: a channel that serves the requested model is chosen at random.
       AUTO_RETRY: true # Whether to automatically retry the next provider on failure; true to retry automatically, false not to, default is true
       RATE_LIMIT: 2/min # Rate limiting: the maximum request rate as an integer, such as 2/min (2 per minute), 5/hour, 10/day, 10/month, or 10/year. Default is 60/min, optional
       ENABLE_MODERATION: true # Whether to enable message moderation; true to enable, false to disable, default is false. When enabled, user messages are moderated, and an error message is returned if inappropriate content is found.
 
       - gcp3/*: 2 # In this example the channels have 10 weight in total, so out of 10 requests, 5 go to the gcp1/* model, 3 to gcp2/*, and 2 to gcp3/*.
 
     preferences:
+      SCHEDULING_ALGORITHM: weighted_round_robin # weighted_round_robin requests the channels serving the requested model in weighted order; the channels above must have weights configured. When SCHEDULING_ALGORITHM is lottery, channels are instead drawn at random with probability proportional to their weights.
       AUTO_RETRY: true
 ```
 
+ Mount the configuration file and start the uni-api Docker container:
 
+ ```bash
+ docker run --user root -p 8001:8000 --name uni-api -dit \
+   -v ./api.yaml:/home/api.yaml \
+   yym68686/uni-api:latest
+ ```
+
+ ### Method 2: Start uni-api using the `CONFIG_URL` environment variable
+
+ After writing the configuration file as in Method 1, upload it to a cloud drive, get a direct link to the file, and then start the uni-api Docker container with the `CONFIG_URL` environment variable:
+
+ ```bash
+ docker run --user root -p 8001:8000 --name uni-api -dit \
+   -e CONFIG_URL=http://file_url/api.yaml \
+   yym68686/uni-api:latest
 ```
 
 ## Environment variables
 
 There are further statistics that you can query yourself by writing SQL against the database, including: time to first token, total processing time of each request, whether each request succeeded, whether each request passed moderation, the text content of each request, the API key of each request, and the input and output token counts of each request.
 
+ ## Docker local deployment
 
 Start the container
 
   -d '{"model": "gpt-4o","messages": [{"role": "user", "content": "Hello"}],"stream": true}'
 ```
 
+ ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
   <img width="500" alt="Star History Chart" src="https://api.star-history.com/svg?repos=yym68686/uni-api&type=Date">
main.py CHANGED
@@ -128,7 +128,6 @@ async def http_exception_handler(request: Request, exc: HTTPException):
     )
 
 import uuid
-import json
 import asyncio
 import contextvars
 request_info = contextvars.ContextVar('request_info', default={})
@@ -602,6 +601,21 @@ def weighted_round_robin(weights):
 
     return weighted_provider_list
 
 import asyncio
 class ModelRequestHandler:
     def __init__(self):
@@ -683,25 +697,25 @@ class ModelRequestHandler:
         # model_dict = get_model_dict(provider)
         # if model_name in model_dict.keys():
         #     provider_list.append(provider)
-        if is_debug:
-            for provider in provider_list:
-                logger.info("available provider: %s", json.dumps(provider, indent=4, ensure_ascii=False, default=circular_list_encoder))
         return provider_list
 
     async def request_model(self, request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest], token: str, endpoint=None):
         config = app.state.config
-        # api_keys_db = app.state.api_keys_db
         api_list = app.state.api_list
 
         model_name = request.model
         matching_providers = self.get_matching_providers(model_name, token)
-        # import json
-        # print("matching_providers", json.dumps(matching_providers, indent=4, ensure_ascii=False))
         if not matching_providers:
             raise HTTPException(status_code=404, detail="No matching model found")
-        # exit(0)
         # Check whether round-robin is enabled
-        api_index = api_list.index(token)
         weights = safe_get(config, 'api_keys', api_index, "weights")
         if weights:
             # Step 1: extract all provider values from matching_providers
@@ -711,7 +725,14 @@ class ModelRequestHandler:
             # Step 3: compute the intersection
             intersection = providers.intersection(weight_keys)
             weights = dict(filter(lambda item: item[0] in intersection, weights.items()))
-            weighted_provider_name_list = weighted_round_robin(weights)
             new_matching_providers = []
             for provider_name in weighted_provider_name_list:
                 for provider in matching_providers:
@@ -719,34 +740,24 @@ class ModelRequestHandler:
                     new_matching_providers.append(provider)
             matching_providers = new_matching_providers
 
-        # import json
-        # print("matching_providers", json.dumps(matching_providers, indent=4, ensure_ascii=False, default=circular_list_encoder))
-        use_round_robin = True
-        auto_retry = True
-        if safe_get(config, 'api_keys', api_index, "preferences", "USE_ROUND_ROBIN") == False:
-            use_round_robin = False
-        if safe_get(config, 'api_keys', api_index, "preferences", "AUTO_RETRY") == False:
-            auto_retry = False
-
-        return await self.try_all_providers(request, matching_providers, use_round_robin, auto_retry, endpoint, token)
 
-    # Handle the failure cases in the try_all_providers function
-    async def try_all_providers(self, request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest], providers: List[Dict], use_round_robin: bool, auto_retry: bool, endpoint: str = None, token: str = None):
         status_code = 500
         error_message = None
-        num_providers = len(providers)
-        model_name = request.model
 
-        if use_round_robin:
             async with self.locks[model_name]:
-                self.last_provider_indices[model_name] = (self.last_provider_indices[model_name] + 1) % num_providers
                 start_index = self.last_provider_indices[model_name]
-        else:
-            start_index = 0
 
-        for i in range(num_providers + 1):
-            current_index = (start_index + i) % num_providers
-            provider = providers[current_index]
             try:
                 response = await process_request(request, provider, endpoint, token)
                 return response
 
     )
 
 import uuid
 import asyncio
 import contextvars
 request_info = contextvars.ContextVar('request_info', default={})
 
     return weighted_provider_list
 
+import random
+
+def lottery_scheduling(weights):
+    total_tickets = sum(weights.values())
+    selections = []
+    for _ in range(total_tickets):
+        ticket = random.randint(1, total_tickets)
+        cumulative = 0
+        for provider, weight in weights.items():
+            cumulative += weight
+            if ticket <= cumulative:
+                selections.append(provider)
+                break
+    return selections
+
 import asyncio
 class ModelRequestHandler:
     def __init__(self):
 
         # model_dict = get_model_dict(provider)
         # if model_name in model_dict.keys():
         #     provider_list.append(provider)
         return provider_list
 
     async def request_model(self, request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest], token: str, endpoint=None):
         config = app.state.config
         api_list = app.state.api_list
+        api_index = api_list.index(token)
 
         model_name = request.model
         matching_providers = self.get_matching_providers(model_name, token)
+        num_matching_providers = len(matching_providers)
+
         if not matching_providers:
             raise HTTPException(status_code=404, detail="No matching model found")
+
         # Check whether round-robin is enabled
+        scheduling_algorithm = safe_get(config, 'api_keys', api_index, "preferences", "SCHEDULING_ALGORITHM", default="fixed_priority")
+        if scheduling_algorithm == "random":
+            matching_providers = random.sample(matching_providers, num_matching_providers)
+
         weights = safe_get(config, 'api_keys', api_index, "weights")
         if weights:
             # Step 1: extract all provider values from matching_providers
 
             # Step 3: compute the intersection
             intersection = providers.intersection(weight_keys)
             weights = dict(filter(lambda item: item[0] in intersection, weights.items()))
+
+            if scheduling_algorithm == "weighted_round_robin":
+                weighted_provider_name_list = weighted_round_robin(weights)
+            elif scheduling_algorithm == "lottery":
+                weighted_provider_name_list = lottery_scheduling(weights)
+            else:
+                weighted_provider_name_list = list(weights.keys())
+
             new_matching_providers = []
             for provider_name in weighted_provider_name_list:
                 for provider in matching_providers:
 
                     new_matching_providers.append(provider)
             matching_providers = new_matching_providers
 
+        if is_debug:
+            for provider in matching_providers:
+                logger.info("available provider: %s", json.dumps(provider, indent=4, ensure_ascii=False, default=circular_list_encoder))
 
         status_code = 500
         error_message = None
 
+        start_index = 0
+        if scheduling_algorithm != "fixed_priority":
             async with self.locks[model_name]:
+                self.last_provider_indices[model_name] = (self.last_provider_indices[model_name] + 1) % num_matching_providers
                 start_index = self.last_provider_indices[model_name]
 
+        auto_retry = safe_get(config, 'api_keys', api_index, "preferences", "AUTO_RETRY", default=True)
+
+        for i in range(num_matching_providers + 1):
+            current_index = (start_index + i) % num_matching_providers
+            provider = matching_providers[current_index]
             try:
                 response = await process_request(request, provider, endpoint, token)
                 return response
utils.py CHANGED
@@ -100,7 +100,8 @@ def update_config(config_data):
                 models.append(key)
             if isinstance(model, str):
                 models.append(model)
-        config_data['api_keys'][index]['weights'] = weights_dict
         config_data['api_keys'][index]['model'] = models
         api_keys_db[index]['model'] = models
 
                 models.append(key)
             if isinstance(model, str):
                 models.append(model)
+        if weights_dict:
+            config_data['api_keys'][index]['weights'] = weights_dict
         config_data['api_keys'][index]['model'] = models
         api_keys_db[index]['model'] = models