yym68686 committed
Commit fc4b826 · 1 Parent(s): c5cf73f

✨ Feature: Support channel cooldown. When an API channel response fails, the channel is automatically excluded and cooled down for a period of time, during which no requests are made to it. After the cooldown period ends, the channel is automatically restored until it fails again, triggering another cooldown.

Files changed (3):
  1. README.md +36 -33
  2. README_CN.md +3 -0
  3. main.py +54 -3
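
Before the per-file diffs: the core of the change can be stated as a single timing rule — a channel excluded at time t becomes available again only after t plus cooldown_period. A minimal, self-contained illustration of that rule (the names here are illustrative, not code from this commit):

```python
from datetime import datetime, timedelta

# The cooldown rule this commit introduces, in isolation: a channel
# excluded at `excluded_at` stays unavailable until the period elapses.
cooldown_period = timedelta(seconds=300)  # default of the new cooldown_period option
excluded_at = datetime.now()

def channel_available(now: datetime) -> bool:
    return now - excluded_at > cooldown_period

print(channel_available(datetime.now()))                           # False: still cooling down
print(channel_available(datetime.now() + timedelta(seconds=301)))  # True: automatically restored
```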
README.md CHANGED
@@ -28,6 +28,8 @@ For personal use, one/new-api is too complex with many commercial features that
 3. Except for Vertex region-level load balancing, all APIs support channel-level sequential load balancing, enhancing the immersive translation experience. It is not enabled by default and requires configuring `SCHEDULING_ALGORITHM` as `round_robin`.
 4. Support automatic API key-level round-robin load balancing for multiple API Keys in a single channel.
 - Support automatic retry, when an API channel response fails, automatically retry the next API channel.
+- Support channel cooldown: when an API channel response fails, the channel is automatically excluded and cooled down for a period of time, during which no requests are sent to it. After the cooldown period ends, the channel is automatically restored until it fails again, which triggers another cooldown.
+- Support fine-grained model timeout settings, allowing a different timeout duration to be set for each model.
 - Support fine-grained permission control. Support using wildcards to set specific models available for API key channels.
 - Support rate limiting, you can set the maximum number of requests per minute as an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default is 60/min.
 - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
@@ -59,36 +61,36 @@ Detailed advanced configuration of `api.yaml`:
 
 ```yaml
 providers:
-  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, any name is fine, required
+  - provider: provider_name # Service provider name, such as openai, anthropic, gemini, openrouter, deepbricks, can be any name, required
     base_url: https://api.your.com/v1/chat/completions # Backend service API address, required
     api: sk-YgS6GTi0b4bEabc4C # Provider's API Key, required
-    model: # Optional, if the model is not configured, all available models will be automatically obtained through the /v1/models endpoint via base_url and api.
+    model: # Optional, if model is not configured, all available models will be automatically obtained through base_url and api via the /v1/models endpoint.
       - gpt-4o # Usable model name, required
-      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, a simpler name can replace the original complex name, optional
+      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, you can use a simple name to replace the original complex name, optional
       - dall-e-3
 
   - provider: anthropic
     base_url: https://api.anthropic.com/v1/messages
-    api: # Supports multiple API Keys, multiple keys automatically enable round-robin load balancing, at least one key, required
+    api: # Supports multiple API Keys, multiple keys automatically enable polling load balancing, at least one key, required
       - sk-ant-api03-bNnAOJyA-xQw_twAA
       - sk-ant-api02-bNnxxxx
     model:
-      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, a simpler name can replace the original complex name, optional
+      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename model, claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the renamed name, you can use a simple name to replace the original complex name, optional
-    tools: true # Whether to support tools, such as code generation, document generation, etc., default is true, optional
+    tools: true # Whether to support tools, such as generating code, generating documents, etc., default is true, optional
 
   - provider: gemini
-    base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1, only for Gemini models, required
+    base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1, only for Gemini model use, required
     api: AIzaSyAN2k6IRdgw
     model:
       - gemini-1.5-pro
-      - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used. If you want to use the original name, you can add the original name in the model, just add the following line to use the original name.
-      - gemini-1.5-flash-exp-0827 # Adding this line allows both gemini-1.5-flash-exp-0827 and gemini-1.5-flash to be requested
+      - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used; if you want to use the original name, you can add it to the model list, just add the line below
+      - gemini-1.5-flash-exp-0827 # Add this line, and both gemini-1.5-flash-exp-0827 and gemini-1.5-flash can be requested
     tools: true
 
   - provider: vertex
-    project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: A string usually consisting of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
-    private_key: "-----BEGIN PRIVATE KEY-----\nxxxxx\n-----END PRIVATE" # Description: The private key of the Google Cloud Vertex AI service account. Format: A JSON formatted string containing the service account's private key information. How to obtain: Create a service account in the Google Cloud Console, generate a JSON formatted key file, and then set its content as the value of this environment variable.
-    client_email: [email protected] # Description: The email address of the Google Cloud Vertex AI service account. Format: Usually a string like "[email protected]". How to obtain: Generated when creating the service account, or can be found in the "IAM & Admin" section of the Google Cloud Console to view service account details.
+    project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
+    private_key: "-----BEGIN PRIVATE KEY-----\nxxxxx\n-----END PRIVATE" # Description: Private key of the Google Cloud Vertex AI service account. Format: A JSON-formatted string containing the private key information of the service account. How to obtain: Create a service account in the Google Cloud Console, generate a JSON-formatted key file, and then set its content as the value of this environment variable.
+    client_email: [email protected] # Description: Email address of the Google Cloud Vertex AI service account. Format: Usually a string like "[email protected]". How to obtain: Generated when creating the service account, or you can view the service account details in the "IAM & Admin" section of the Google Cloud Console.
     model:
       - gemini-1.5-pro
       - gemini-1.5-flash
@@ -97,14 +99,14 @@ providers:
       - claude-3-sonnet@20240229: claude-3-sonnet
       - claude-3-haiku@20240307: claude-3-haiku
     tools: true
-    notes: https://xxxxx.com/ # Can include the provider's website, remarks, official documentation, optional
+    notes: https://xxxxx.com/ # You can put the provider's website, notes, and official documentation here, optional
 
   - provider: cloudflare
     api: f42b3xxxxxxxxxxq4aoGAh # Cloudflare API Key, required
     cf_account_id: 8ec0xxxxxxxxxxxxe721 # Cloudflare Account ID, required
     model:
-      - '@cf/meta/llama-3.1-8b-instruct': llama-3.1-8b # Rename model, @cf/meta/llama-3.1-8b-instruct is the provider's original model name, must be enclosed in quotes to avoid YAML syntax error, llama-3.1-8b is the renamed name, a simpler name can replace the original complex name, optional
-      - '@cf/meta/llama-3.1-8b-instruct' # Must be enclosed in quotes to avoid YAML syntax error
+      - '@cf/meta/llama-3.1-8b-instruct': llama-3.1-8b # Rename model, @cf/meta/llama-3.1-8b-instruct is the provider's original model name, must be enclosed in quotes, otherwise a yaml syntax error occurs, llama-3.1-8b is the renamed name, you can use a simple name to replace the original complex name, optional
+      - '@cf/meta/llama-3.1-8b-instruct' # Must be enclosed in quotes, otherwise a yaml syntax error occurs
 
   - provider: other-provider
     base_url: https://api.xxx.com/v1/messages
@@ -113,47 +115,48 @@ providers:
       - causallm-35b-beta2ep-q6k: causallm-35b
       - anthropic/claude-3-5-sonnet
     tools: false
-    engine: openrouter # Force using a specific message format, currently supports gpt, claude, gemini, openrouter native format, optional
+    engine: openrouter # Force the use of a specific message format, currently supports gpt, claude, gemini, openrouter native formats, optional
 
 api_keys:
-  - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key, users need an API key to use this service, required
-    model: # Models that can be used with this API Key, required. By default, channel-level round-robin load balancing is enabled, and each request is made in the order configured in the model. It is not related to the original channel order in providers. Therefore, you can set different request orders for each API key.
+  - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API Key, required for users to use this service
+    model: # Models that can be used by this API Key, required. Channel-level round-robin load balancing is enabled by default, and each request is made in the order configured under model. It is not related to the original channel order in providers. Therefore, you can set a different request order for each API key.
      - gpt-4o # Usable model name, can use all gpt-4o models provided by providers
      - claude-3-5-sonnet # Usable model name, can use all claude-3-5-sonnet models provided by providers
-     - gemini/* # Usable model name, can only use all models provided by the provider named gemini, where gemini is the provider name, * represents all models
+     - gemini/* # Usable model name, can only use all models provided by providers named gemini, where gemini is the provider name, * represents all models
    role: admin
 
   - api: sk-pkhf60Yf0JGyJxgRmXqFQyTgWUd9GZnmi3KlvowmRWpWqrhy
    model:
-     - anthropic/claude-3-5-sonnet # Usable model name, can only use the claude-3-5-sonnet model provided by the provider named anthropic. Models named claude-3-5-sonnet from other providers cannot be used. This syntax will not match the model named anthropic/claude-3-5-sonnet provided by other-provider.
-     - <anthropic/claude-3-5-sonnet> # By adding angle brackets around the model name, it will not search for the claude-3-5-sonnet model under the channel named anthropic, but will treat the entire anthropic/claude-3-5-sonnet as the model name. This syntax can match the model named anthropic/claude-3-5-sonnet provided by other-provider. But it will not match the claude-3-5-sonnet model under anthropic.
+     - anthropic/claude-3-5-sonnet # Usable model name, can only use the claude-3-5-sonnet model provided by the provider named anthropic. Models with the same name from other providers cannot be used. This syntax will not match the model named anthropic/claude-3-5-sonnet provided by other-provider.
+     - <anthropic/claude-3-5-sonnet> # By adding angle brackets on both sides of the model name, it will not search for the claude-3-5-sonnet model under the channel named anthropic, but will take the entire anthropic/claude-3-5-sonnet as the model name. This syntax can match the model named anthropic/claude-3-5-sonnet provided by other-provider. But it will not match the claude-3-5-sonnet model under anthropic.
      - openai-test/text-moderation-latest # When message moderation is enabled, the text-moderation-latest model under the channel named openai-test can be used for moderation.
    preferences:
-     SCHEDULING_ALGORITHM: fixed_priority # When SCHEDULING_ALGORITHM is fixed_priority, fixed priority scheduling is used, always executing the channel of the first requested model. Enabled by default, the default value of SCHEDULING_ALGORITHM is fixed_priority. Optional values for SCHEDULING_ALGORITHM are: fixed_priority, round_robin, weighted_round_robin, lottery, random.
-     # When SCHEDULING_ALGORITHM is random, random round-robin load balancing is used, randomly requesting the channel of the requested model.
-     # When SCHEDULING_ALGORITHM is round_robin, round-robin load balancing is used, requesting the user's model channels in order.
+     SCHEDULING_ALGORITHM: fixed_priority # When SCHEDULING_ALGORITHM is fixed_priority, use fixed priority scheduling, always executing the channel of the first requested model. Enabled by default; the default value of SCHEDULING_ALGORITHM is fixed_priority. Optional values for SCHEDULING_ALGORITHM are: fixed_priority, round_robin, weighted_round_robin, lottery, random.
+     # When SCHEDULING_ALGORITHM is random, use random polling load balancing, randomly requesting the channel of the requested model.
+     # When SCHEDULING_ALGORITHM is round_robin, use polling load balancing, requesting the channels of the user's model in order.
      AUTO_RETRY: true # Whether to automatically retry, automatically retry the next provider, true for automatic retry, false for no automatic retry, default is true
-     RATE_LIMIT: 2/min # Supports rate limiting, maximum number of requests per minute, can be set to an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default 60/min, optional
-     ENABLE_MODERATION: true # Whether to enable message moderation, true for enable, false for disable, default is false, when enabled, messages will be moderated, and inappropriate messages will return an error.
+     RATE_LIMIT: 2/min # Supports rate limiting, maximum number of requests per minute, can be set to an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default is 60/min, optional
+     ENABLE_MODERATION: true # Whether to enable message moderation, true for enable, false for disable, default is false; when enabled, the user's messages will be moderated, and if inappropriate messages are found, an error message will be returned.
 
 # Channel-level weighted load balancing configuration example
   - api: sk-KjjI60Yd0JFWtxxxxxxxxxxxxxxwmRWpWpQRo
    model:
-     - gcp1/*: 5 # The number after the colon is the weight, weights only support positive integers.
-     - gcp2/*: 3 # The size of the number represents the weight, the larger the number, the greater the probability of request.
-     - gcp3/*: 2 # In this example, there are a total of 10 weights across all channels, and 5 out of 10 requests will request the gcp1/* model, 2 requests will request the gcp2/* model, and 3 requests will request the gcp3/* model.
+     - gcp1/*: 5 # The number after the colon is the weight; weights only support positive integers.
+     - gcp2/*: 3 # The size of the number represents the weight; the larger the number, the greater the probability of the request.
+     - gcp3/*: 2 # In this example, there are a total of 10 weights across all channels, and out of 10 requests, 5 will request the gcp1/* model, 3 will request the gcp2/* model, and 2 will request the gcp3/* model.
 
    preferences:
-     SCHEDULING_ALGORITHM: weighted_round_robin # Only when SCHEDULING_ALGORITHM is weighted_round_robin and the above channels have weights, requests will be made in the weighted order. Using weighted round-robin load balancing, requests are made in the order of weight for the channel of the requested model. When SCHEDULING_ALGORITHM is lottery, lottery round-robin load balancing is used, randomly requesting the channel of the requested model according to weight. Channels without weights automatically fall back to round_robin round-robin load balancing.
+     SCHEDULING_ALGORITHM: weighted_round_robin # Only when SCHEDULING_ALGORITHM is weighted_round_robin and the above channels have weights will requests be made in the weighted order, using weighted round-robin load balancing to request the channels of the requested model in weight order. When SCHEDULING_ALGORITHM is lottery, use lottery load balancing, randomly requesting the channel of the requested model according to weight. Channels without weights automatically fall back to round_robin load balancing.
      AUTO_RETRY: true
 
 preferences: # Global configuration
   model_timeout: # Model timeout, in seconds, default 100 seconds, optional
     gpt-4o: 10 # Model gpt-4o timeout is 10 seconds, gpt-4o is the model name, when requesting models like gpt-4o-2024-08-06, the timeout is also 10 seconds
     claude-3-5-sonnet: 10 # Model claude-3-5-sonnet timeout is 10 seconds, when requesting models like claude-3-5-sonnet-20240620, the timeout is also 10 seconds
-    default: 10 # If the model does not have a timeout set, the default timeout of 10 seconds is used, when requesting models not in model_timeout, the default timeout is 10 seconds, if default is not set, uni-api will use the default timeout set by the environment variable TIMEOUT, which is 100 seconds
-    o1-mini: 30 # Model o1-mini timeout is 30 seconds, when requesting models with names starting with o1-mini, the timeout is 30 seconds
-    o1-preview: 100 # Model o1-preview timeout is 100 seconds, when requesting models with names starting with o1-preview, the timeout is 100 seconds
+    default: 10 # If a model has no timeout set, the default timeout of 10 seconds is used; when requesting a model not in model_timeout, the default timeout is 10 seconds; if default is not set, uni-api will use the default timeout set by the environment variable TIMEOUT, which defaults to 100 seconds
+    o1-mini: 30 # Model o1-mini timeout is 30 seconds, when requesting models whose names start with o1-mini, the timeout is 30 seconds
+    o1-preview: 100 # Model o1-preview timeout is 100 seconds, when requesting models whose names start with o1-preview, the timeout is 100 seconds
+  cooldown_period: 300 # Channel cooldown time, in seconds, default 300 seconds, optional. When a model request fails, the channel is automatically excluded and cooled down for a period of time, during which it is not requested. After the cooldown ends, the channel is automatically restored until the next failure, which triggers another cooldown. When cooldown_period is set to 0, the cooldown mechanism is not enabled.
 ```
 
 Mount the configuration file and start the uni-api docker container:
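
A side note on the model_timeout semantics documented above: names act as prefixes (gpt-4o also covers gpt-4o-2024-08-06), with default and then the TIMEOUT environment variable as fallbacks. Below is a hypothetical helper sketching that lookup order; resolve_timeout and the longest-prefix-wins tie-break are assumptions for illustration, not uni-api's actual code:

```python
import os

# Hypothetical helper illustrating the documented lookup order for model_timeout:
# matching name prefix -> the "default" key -> env var TIMEOUT -> 100 seconds.
def resolve_timeout(model: str, model_timeout: dict) -> float:
    candidates = [name for name in model_timeout if name != "default" and model.startswith(name)]
    if candidates:
        return model_timeout[max(candidates, key=len)]  # assume the most specific prefix wins
    return model_timeout.get("default", float(os.environ.get("TIMEOUT", 100)))

timeouts = {"gpt-4o": 10, "claude-3-5-sonnet": 10, "o1-mini": 30, "o1-preview": 100, "default": 10}
print(resolve_timeout("gpt-4o-2024-08-06", timeouts))  # 10, matched by the gpt-4o prefix
print(resolve_timeout("some-other-model", timeouts))   # 10, falls back to default
```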
README_CN.md CHANGED
@@ -28,6 +28,8 @@
 3. Except for Vertex region-level load balancing, all APIs support channel-level sequential load balancing, improving the immersive translation experience. It is not enabled by default and requires configuring `SCHEDULING_ALGORITHM` as `round_robin`.
 4. Support automatically enabling API key-level round-robin load balancing for multiple API Keys in a single channel.
 - Support automatic retry, when an API channel response fails, automatically retry the next API channel.
+- Support channel cooldown: when an API channel response fails, the channel is automatically excluded and cooled down for a period of time, during which it is no longer requested. After the cooldown ends, the channel is automatically restored until it fails again, which triggers another cooldown.
+- Support fine-grained model timeout settings, allowing a different timeout duration to be set for each model.
 - Support fine-grained permission control. Support using wildcards to set specific models available for API key channels.
 - Support rate limiting, you can set the maximum number of requests per minute as an integer, such as 2/min, 2 times per minute, 5/hour, 5 times per hour, 10/day, 10 times per day, 10/month, 10 times per month, 10/year, 10 times per year. Default is 60/min.
 - Supports multiple standard OpenAI format interfaces: `/v1/chat/completions`, `/v1/images/generations`, `/v1/audio/transcriptions`, `/v1/moderations`, `/v1/models`.
@@ -154,6 +156,7 @@ preferences: # Global configuration
     default: 10 # If a model has no timeout set, the default timeout of 10 seconds is used; when requesting a model not in model_timeout, the default timeout is 10 seconds; if default is not set, uni-api will use the default timeout set by the environment variable TIMEOUT, which defaults to 100 seconds
     o1-mini: 30 # Model o1-mini timeout is 30 seconds, when requesting models whose names start with o1-mini, the timeout is 30 seconds
     o1-preview: 100 # Model o1-preview timeout is 100 seconds, when requesting models whose names start with o1-preview, the timeout is 100 seconds
+  cooldown_period: 300 # Channel cooldown time, in seconds, default 300 seconds, optional. When a model request fails, the channel is automatically excluded and cooled down for a period of time, during which it is not requested; after the cooldown ends, the channel is automatically restored until the next failure, which triggers another cooldown. When cooldown_period is set to 0, the cooldown mechanism is not enabled.
 ```
 
 Mount the configuration file and start the uni-api docker container:
main.py CHANGED
@@ -159,6 +159,39 @@ async def parse_request_body(request: Request):
         return None
     return None
 
+class ChannelManager:
+    def __init__(self, cooldown_period: int = 300): # Default cooldown period: 5 minutes
+        self._excluded_channels: Dict[str, datetime] = {}
+        self._lock = asyncio.Lock()
+        self.cooldown_period = cooldown_period
+
+    async def exclude_channel(self, channel_id: str):
+        """Add a channel to the exclusion list"""
+        async with self._lock:
+            self._excluded_channels[channel_id] = datetime.now()
+
+    async def is_channel_excluded(self, channel_id: str) -> bool:
+        """Check whether a channel is currently excluded"""
+        async with self._lock:
+            if channel_id not in self._excluded_channels:
+                return False
+
+            excluded_time = self._excluded_channels[channel_id]
+            if datetime.now() - excluded_time > timedelta(seconds=self.cooldown_period):
+                # The cooldown period has elapsed; lift the restriction
+                del self._excluded_channels[channel_id]
+                return False
+            return True
+
+    async def get_available_providers(self, providers: list) -> list:
+        """Filter out the providers that are currently available"""
+        available_providers = []
+        for provider in providers:
+            channel_id = f"{provider['provider']}"
+            if not await self.is_channel_excluded(channel_id):
+                available_providers.append(provider)
+        return available_providers
+
 from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
 from sqlalchemy.orm import declarative_base, sessionmaker
 from sqlalchemy import Column, Integer, String, Float, DateTime, select, Boolean, Text
@@ -541,7 +574,7 @@ class ClientManager:
 
 @app.middleware("http")
 async def ensure_config(request: Request, call_next):
-    if not hasattr(app.state, 'config'):
+    if app and not hasattr(app.state, 'config'):
         logger.warning("Config not found, attempting to reload")
         app.state.config, app.state.api_keys_db, app.state.api_list = await load_config(app)
 
@@ -580,6 +613,14 @@
 
         print("app.state.timeouts", app.state.timeouts)
 
+    if app and not hasattr(app.state, "channel_manager"):
+        if app.state.config and 'preferences' in app.state.config:
+            COOLDOWN_PERIOD = app.state.config['preferences'].get('cooldown_period', 300)
+        else:
+            COOLDOWN_PERIOD = 300
+
+        app.state.channel_manager = ChannelManager(cooldown_period=COOLDOWN_PERIOD)
+
     return await call_next(request)
 
 # Update success and failure counts in the process_request function
@@ -841,11 +882,18 @@ class ModelRequestHandler:
 
         request_model = request.model
         matching_providers = get_matching_providers(request_model, config, api_index)
-        num_matching_providers = len(matching_providers)
 
         if not matching_providers:
             raise HTTPException(status_code=404, detail="No matching model found")
 
+        if app.state.channel_manager.cooldown_period > 0:
+            matching_providers = await app.state.channel_manager.get_available_providers(matching_providers)
+            if not matching_providers:
+                raise HTTPException(status_code=503, detail="No available providers at the moment")
+
+        num_matching_providers = len(matching_providers)
+
+
         # Check whether round-robin scheduling is enabled
         scheduling_algorithm = safe_get(config, 'api_keys', api_index, "preferences", "SCHEDULING_ALGORITHM", default="fixed_priority")
         if scheduling_algorithm == "random":
@@ -936,7 +984,10 @@
             status_code = 500 # Internal Server Error
             error_message = str(e) or f"Unknown error: {e.__class__.__name__}"
 
-            logger.error(f"Error {status_code} with provider {provider['provider']}: {error_message}")
+            channel_id = f"{provider['provider']}"
+            if app.state.channel_manager.cooldown_period > 0:
+                await app.state.channel_manager.exclude_channel(channel_id)
+            logger.error(f"Error {status_code} with provider {channel_id}: {error_message}")
             if is_debug:
                 import traceback
                 traceback.print_exc()
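
Taken together, the pieces above implement a simple lifecycle: exclude a channel on failure, filter it out while it cools down, and restore it automatically once the period elapses. Below is a condensed, runnable sketch of that lifecycle using a trimmed copy of the ChannelManager added in this commit; the 1-second cooldown is only to keep the demo fast:

```python
import asyncio
from datetime import datetime, timedelta
from typing import Dict

class ChannelManager:
    """Condensed copy of the class added in main.py, for demonstration only."""
    def __init__(self, cooldown_period: int = 300):
        self._excluded_channels: Dict[str, datetime] = {}
        self._lock = asyncio.Lock()
        self.cooldown_period = cooldown_period

    async def exclude_channel(self, channel_id: str):
        async with self._lock:
            self._excluded_channels[channel_id] = datetime.now()

    async def is_channel_excluded(self, channel_id: str) -> bool:
        async with self._lock:
            if channel_id not in self._excluded_channels:
                return False
            if datetime.now() - self._excluded_channels[channel_id] > timedelta(seconds=self.cooldown_period):
                del self._excluded_channels[channel_id]  # cooldown over: restore the channel
                return False
            return True

    async def get_available_providers(self, providers: list) -> list:
        return [p for p in providers if not await self.is_channel_excluded(f"{p['provider']}")]

async def main():
    manager = ChannelManager(cooldown_period=1)  # 1-second cooldown, for the demo only
    providers = [{"provider": "openai"}, {"provider": "anthropic"}]

    await manager.exclude_channel("openai")  # what the error handler above does on a failed response
    print(await manager.get_available_providers(providers))  # [{'provider': 'anthropic'}]

    await asyncio.sleep(1.1)  # wait out the cooldown
    print(await manager.get_available_providers(providers))  # both channels available again

asyncio.run(main())
```

Note that expiry is evaluated lazily inside is_channel_excluded on the next availability check, so no background timer task is needed.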