✨ Feature: Support setting the model timeout at the channel level

Changed files:
- README.md (+8 -0)
- README_CN.md (+8 -0)
- main.py (+37 -15)
README.md

@@ -97,6 +97,10 @@ providers:
     # default: 4/min # If a model has no rate limit set, use the default rate limit
     api_key_cooldown_period: 60 # Cooldown time in seconds for each API key after it hits a 429 error. Optional; defaults to 0 (cooldown disabled). Takes effect only when there are multiple API keys.
     api_key_schedule_algorithm: round_robin # Request order across multiple API keys. Optional; defaults to round_robin. Allowed values: round_robin (round-robin load balancing) and random (random load balancing). Takes effect only when there are multiple API keys.
+    model_timeout: # Model timeout in seconds; defaults to 100 seconds. Optional.
+      gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+      gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+      default: 10 # Default timeout of 10 seconds for any model not listed in model_timeout, including requested models that are missing from model_timeout. If default is not set, uni-api uses the timeout from the environment variable TIMEOUT, which defaults to 100 seconds.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.

@@ -389,6 +393,10 @@ All scheduling algorithms need to be enabled by setting api_keys.(api).preferenc
 
 Except for the special channels shown in the advanced configuration, all OpenAI-format providers must fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+- How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+  The channel-level timeout setting has higher priority than the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
 ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
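The FAQ entry above spells out the timeout priority chain. As a rough sketch of that resolution order (a hypothetical standalone helper with illustrative dict shapes, not code from uni-api itself):

```python
import os

def resolve_timeout(channel_timeouts: dict, global_timeouts: dict, model: str) -> int:
    """Resolve a timeout using the documented priority chain:
    channel model > channel default > global model > global default > env TIMEOUT."""
    # 1-2. Channel-level model timeout, then channel-level default
    if model in channel_timeouts:
        return channel_timeouts[model]
    if "default" in channel_timeouts:
        return channel_timeouts["default"]
    # 3-4. Global model timeout, then global default
    if model in global_timeouts:
        return global_timeouts[model]
    if "default" in global_timeouts:
        return global_timeouts["default"]
    # 5. The TIMEOUT environment variable (100 seconds if unset)
    return int(os.getenv("TIMEOUT", 100))

channel = {"gemini-1.5-pro": 10, "default": 20}  # illustrative values
global_cfg = {"gpt-4o": 30, "default": 40}
print(resolve_timeout(channel, global_cfg, "gemini-1.5-pro"))  # 10: channel model timeout
print(resolve_timeout(channel, global_cfg, "gpt-4o"))          # 20: channel default wins over the global model timeout
print(resolve_timeout({}, global_cfg, "gpt-4o"))               # 30: global model timeout
```

Note that under this chain a channel-level `default` shadows a global per-model timeout, which matches the stated ordering.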
README_CN.md (Chinese README; content translated here)

@@ -97,6 +97,10 @@ providers:
     # default: 4/min # If a model has no rate limit set, use the default rate limit
     api_key_cooldown_period: 60 # Cooldown time in seconds for each API key after it hits a 429 error. Optional; defaults to 0 (cooldown disabled). Takes effect only when there are multiple API keys.
     api_key_schedule_algorithm: round_robin # Request order across multiple API keys. Optional; defaults to round_robin. Allowed values: round_robin (round-robin load balancing) and random (random load balancing). Takes effect only when there are multiple API keys.
+    model_timeout: # Model timeout in seconds; defaults to 100 seconds. Optional.
+      gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+      gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+      default: 10 # Default timeout of 10 seconds for any model not listed in model_timeout. If default is not set, uni-api uses the globally configured model timeout.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.

@@ -389,6 +393,10 @@ api_keys:
 
 Except for the special channels shown in the advanced configuration, all OpenAI-format providers must fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+- How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+  The channel-level timeout setting has higher priority than the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
 ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
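The main.py changes below build a per-channel timeout table as a nested defaultdict keyed by provider name, then model name, falling back to DEFAULT_TIMEOUT. A minimal, self-contained sketch of that pattern (the provider entries here are hypothetical sample data shaped like the README config, not the project's code):

```python
from collections import defaultdict

DEFAULT_TIMEOUT = 100  # seconds; mirrors the documented fallback

# Hypothetical provider config, shaped like the README example
providers = [
    {"provider": "gemini-channel",
     "preferences": {"model_timeout": {"gemini-1.5-pro": 10, "default": 10}}},
    {"provider": "vertex", "preferences": {}},  # no per-channel timeouts
]

# Nested defaultdict: unknown providers/models resolve to DEFAULT_TIMEOUT
provider_timeouts = defaultdict(lambda: defaultdict(lambda: DEFAULT_TIMEOUT))
for p in providers:
    for model_name, t in p.get("preferences", {}).get("model_timeout", {}).items():
        provider_timeouts[p["provider"]][model_name] = t

print(provider_timeouts["gemini-channel"]["gemini-1.5-pro"])  # 10
print(provider_timeouts["vertex"]["any-model"])               # 100 (DEFAULT_TIMEOUT)
```

The inner defaultdict means a channel without timeout settings still yields a usable value on lookup, so no existence checks are needed at request time.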
main.py

@@ -39,7 +39,7 @@ import os
 import string
 import json
 
-DEFAULT_TIMEOUT =
+DEFAULT_TIMEOUT = int(os.getenv("TIMEOUT", 100))
 is_debug = bool(os.getenv("DEBUG", False))
 # is_debug = False
 
@@ -643,7 +643,22 @@ async def ensure_config(request: Request, call_next):
     if "default" not in app.state.config['preferences'].get('model_timeout', {}):
         app.state.timeouts["default"] = DEFAULT_TIMEOUT
 
-
+    app.state.provider_timeouts = defaultdict(lambda: defaultdict(lambda: DEFAULT_TIMEOUT))
+    for provider in app.state.config["providers"]:
+        provider_timeout_settings = safe_get(provider, "preferences", "model_timeout", default={})
+        if provider_timeout_settings:
+            for model_name, timeout_value in provider_timeout_settings.items():
+                app.state.provider_timeouts[provider['provider']][model_name] = timeout_value
+
+    # Register the global model timeouts under a reserved key so the
+    # per-request lookup in process_request can fall back to them.
+    app.state.provider_timeouts["global_time_out"] = app.state.timeouts
 
     if app and not hasattr(app.state, "channel_manager"):
         if app.state.config and 'preferences' in app.state.config:
@@ -655,6 +670,21 @@ async def ensure_config(request: Request, call_next):
 
     return await call_next(request)
 
+def get_timeout_value(provider_timeouts, original_model):
+    timeout_value = None
+    if original_model in provider_timeouts:
+        timeout_value = provider_timeouts[original_model]
+    else:
+        # Try fuzzy-matching the model name
+        for timeout_model in provider_timeouts:
+            if timeout_model != "default" and timeout_model in original_model:
+                timeout_value = provider_timeouts[timeout_model]
+                break
+        else:
+            # If fuzzy matching fails, use the channel's default value
+            timeout_value = provider_timeouts.get("default")
+    return timeout_value
+
 # Update the success and failure counts in process_request
 async def process_request(request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest, EmbeddingRequest], provider: Dict, endpoint=None):
     url = provider['base_url']
@@ -729,21 +759,13 @@
 
     current_info = request_info.get()
 
-    timeout_value = None
-    if original_model in app.state.timeouts:
-        timeout_value = app.state.timeouts[original_model]
-    else:
-        # If there is no exact match, try fuzzy matching
-        for timeout_model in app.state.timeouts:
-            if timeout_model in original_model:
-                timeout_value = app.state.timeouts[timeout_model]
-                break
-
-    # If nothing matched, use the default value
+    provider_timeouts = safe_get(app.state.provider_timeouts, channel_id, default=app.state.provider_timeouts["global_time_out"])
+    timeout_value = get_timeout_value(provider_timeouts, original_model)
+    if timeout_value is None:
+        timeout_value = get_timeout_value(app.state.provider_timeouts["global_time_out"], original_model)
     if timeout_value is None:
         timeout_value = app.state.timeouts.get("default", DEFAULT_TIMEOUT)
+    print("timeout_value", timeout_value)
 
     try:
         async with app.state.client_manager.get_client(timeout_value) as client: