✨ Feature: Support setting the model timeout at the channel level

Changed files:
- README.md (+8 -0)
- README_CN.md (+8 -0)
- main.py (+37 -15)
README.md

@@ -97,6 +97,10 @@ providers:
     # default: 4/min # If a model has no rate limit set, use the default rate limit
     api_key_cooldown_period: 60 # Cooldown time in seconds for each API key after it hits a 429 error. Optional; defaults to 0 (cooldown disabled). Takes effect only when there are multiple API keys.
     api_key_schedule_algorithm: round_robin # Request order across multiple API keys. Optional; defaults to round_robin. Allowed values: round_robin (round-robin load balancing) and random (random load balancing). Takes effect only when there are multiple API keys.
+    model_timeout: # Model timeout in seconds; defaults to 100 seconds. Optional.
+      gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+      gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+      default: 10 # Default timeout of 10 seconds for any model not listed in model_timeout, including requested models that are missing from model_timeout. If default is not set, uni-api uses the timeout from the environment variable TIMEOUT, which defaults to 100 seconds.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.

@@ -389,6 +393,10 @@ All scheduling algorithms need to be enabled by setting api_keys.(api).preferenc
 
 Except for the special channels shown in the advanced configuration, all OpenAI-format providers must fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+- How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+  The channel-level timeout setting has higher priority than the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
 ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
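The FAQ entry above spells out the timeout priority chain. As a rough sketch of that resolution order (a hypothetical standalone helper with illustrative dict shapes, not code from uni-api itself):

```python
import os

def resolve_timeout(channel_timeouts: dict, global_timeouts: dict, model: str) -> int:
    """Resolve a timeout using the documented priority chain:
    channel model > channel default > global model > global default > env TIMEOUT."""
    # 1-2. Channel-level model timeout, then channel-level default
    if model in channel_timeouts:
        return channel_timeouts[model]
    if "default" in channel_timeouts:
        return channel_timeouts["default"]
    # 3-4. Global model timeout, then global default
    if model in global_timeouts:
        return global_timeouts[model]
    if "default" in global_timeouts:
        return global_timeouts["default"]
    # 5. The TIMEOUT environment variable (100 seconds if unset)
    return int(os.getenv("TIMEOUT", 100))

channel = {"gemini-1.5-pro": 10, "default": 20}  # illustrative values
global_cfg = {"gpt-4o": 30, "default": 40}
print(resolve_timeout(channel, global_cfg, "gemini-1.5-pro"))  # 10: channel model timeout
print(resolve_timeout(channel, global_cfg, "gpt-4o"))          # 20: channel default wins over the global model timeout
print(resolve_timeout({}, global_cfg, "gpt-4o"))               # 30: global model timeout
```

Note that under this chain a channel-level `default` shadows a global per-model timeout, which matches the stated ordering.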
README_CN.md (Chinese README; content translated here)

@@ -97,6 +97,10 @@ providers:
     # default: 4/min # If a model has no rate limit set, use the default rate limit
     api_key_cooldown_period: 60 # Cooldown time in seconds for each API key after it hits a 429 error. Optional; defaults to 0 (cooldown disabled). Takes effect only when there are multiple API keys.
     api_key_schedule_algorithm: round_robin # Request order across multiple API keys. Optional; defaults to round_robin. Allowed values: round_robin (round-robin load balancing) and random (random load balancing). Takes effect only when there are multiple API keys.
+    model_timeout: # Model timeout in seconds; defaults to 100 seconds. Optional.
+      gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+      gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+      default: 10 # Default timeout of 10 seconds for any model not listed in model_timeout. If default is not set, uni-api uses the globally configured model timeout.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.

@@ -389,6 +393,10 @@ api_keys:
 
 Except for the special channels shown in the advanced configuration, all OpenAI-format providers must fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+- How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+  The channel-level timeout setting has higher priority than the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
 ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
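The main.py changes below build a per-channel timeout table as a nested defaultdict keyed by provider name, then model name, falling back to DEFAULT_TIMEOUT. A minimal, self-contained sketch of that pattern (the provider entries here are hypothetical sample data shaped like the README config, not the project's code):

```python
from collections import defaultdict

DEFAULT_TIMEOUT = 100  # seconds; mirrors the documented fallback

# Hypothetical provider config, shaped like the README example
providers = [
    {"provider": "gemini-channel",
     "preferences": {"model_timeout": {"gemini-1.5-pro": 10, "default": 10}}},
    {"provider": "vertex", "preferences": {}},  # no per-channel timeouts
]

# Nested defaultdict: unknown providers/models resolve to DEFAULT_TIMEOUT
provider_timeouts = defaultdict(lambda: defaultdict(lambda: DEFAULT_TIMEOUT))
for p in providers:
    for model_name, t in p.get("preferences", {}).get("model_timeout", {}).items():
        provider_timeouts[p["provider"]][model_name] = t

print(provider_timeouts["gemini-channel"]["gemini-1.5-pro"])  # 10
print(provider_timeouts["vertex"]["any-model"])               # 100 (DEFAULT_TIMEOUT)
```

The inner defaultdict means a channel without timeout settings still yields a usable value on lookup, so no existence checks are needed at request time.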
main.py

@@ -39,7 +39,7 @@ import os
 import string
 import json
 
-DEFAULT_TIMEOUT =
+DEFAULT_TIMEOUT = int(os.getenv("TIMEOUT", 100))
 is_debug = bool(os.getenv("DEBUG", False))
 # is_debug = False
 
@@ -643,7 +643,22 @@ async def ensure_config(request: Request, call_next):
     if "default" not in app.state.config['preferences'].get('model_timeout', {}):
         app.state.timeouts["default"] = DEFAULT_TIMEOUT
 
-
+    app.state.provider_timeouts = defaultdict(lambda: defaultdict(lambda: DEFAULT_TIMEOUT))
+    for provider in app.state.config["providers"]:
+        provider_timeout_settings = safe_get(provider, "preferences", "model_timeout", default={})
+        if provider_timeout_settings:
+            for model_name, timeout_value in provider_timeout_settings.items():
+                app.state.provider_timeouts[provider['provider']][model_name] = timeout_value
+
+    # Register the global model timeouts under a reserved key so the
+    # per-request lookup in process_request can fall back to them.
+    app.state.provider_timeouts["global_time_out"] = app.state.timeouts
 
     if app and not hasattr(app.state, "channel_manager"):
         if app.state.config and 'preferences' in app.state.config:
@@ -655,6 +670,21 @@ async def ensure_config(request: Request, call_next):
 
     return await call_next(request)
 
+def get_timeout_value(provider_timeouts, original_model):
+    timeout_value = None
+    if original_model in provider_timeouts:
+        timeout_value = provider_timeouts[original_model]
+    else:
+        # Try fuzzy-matching the model name
+        for timeout_model in provider_timeouts:
+            if timeout_model != "default" and timeout_model in original_model:
+                timeout_value = provider_timeouts[timeout_model]
+                break
+        else:
+            # If fuzzy matching fails, use the channel's default value
+            timeout_value = provider_timeouts.get("default")
+    return timeout_value
+
 # Update the success and failure counts in process_request
 async def process_request(request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest, EmbeddingRequest], provider: Dict, endpoint=None):
     url = provider['base_url']
@@ -729,21 +759,13 @@
 
     current_info = request_info.get()
 
-    timeout_value = None
-    if original_model in app.state.timeouts:
-        timeout_value = app.state.timeouts[original_model]
-    else:
-        # If there is no exact match, try fuzzy matching
-        for timeout_model in app.state.timeouts:
-            if timeout_model in original_model:
-                timeout_value = app.state.timeouts[timeout_model]
-                break
-
-    # If nothing matched, use the default value
+    provider_timeouts = safe_get(app.state.provider_timeouts, channel_id, default=app.state.provider_timeouts["global_time_out"])
+    timeout_value = get_timeout_value(provider_timeouts, original_model)
+    if timeout_value is None:
+        timeout_value = get_timeout_value(app.state.provider_timeouts["global_time_out"], original_model)
     if timeout_value is None:
         timeout_value = app.state.timeouts.get("default", DEFAULT_TIMEOUT)
+    print("timeout_value", timeout_value)
 
     try:
         async with app.state.client_manager.get_client(timeout_value) as client: