yym68686 committed
Commit 0a213f9 · Parent: 7775638

✨ Feature: Support setting model timeout at the channel level

Files changed (3):
  1. README.md +8 -0
  2. README_CN.md +8 -0
  3. main.py +37 -15
README.md CHANGED
@@ -97,6 +97,10 @@ providers:
   # default: 4/min # If a model has no rate limit set, the rate limit of default is used
   api_key_cooldown_period: 60 # Cooldown period, in seconds, applied to an API key after it encounters a 429 error. Optional; the default is 0 seconds, which disables the cooldown mechanism. Takes effect only when there are multiple API keys.
   api_key_schedule_algorithm: round_robin # Sets the request order across multiple API keys. Optional; the default is round_robin, and the allowed values are round_robin and random. Takes effect only when there are multiple API keys. round_robin is round-robin load balancing; random is random load balancing.
+  model_timeout: # Model timeout, in seconds. Optional; the default is 100 seconds.
+    gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+    gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+    default: 10 # If a requested model is not listed in model_timeout, its timeout is the default of 10 seconds. If default is not set, uni-api falls back to the timeout from the TIMEOUT environment variable, which defaults to 100 seconds.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.
@@ -389,6 +393,10 @@ All scheduling algorithms need to be enabled by setting api_keys.(api).preferenc
 
   Except for the special channels shown in the advanced configuration, all OpenAI-format providers need to fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+ - How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+   The channel-level timeout setting takes priority over the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
  ## ⭐ Star History
 
  <a href="https://github.com/yym68686/uni-api/stargazers">
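The priority chain described in the FAQ entry above can be sketched as a small resolver. The function and variable names below are illustrative, not uni-api's actual internals (which also do fuzzy model-name matching):

```python
import os

# Final fallback mirrors the documented TIMEOUT environment variable.
DEFAULT_TIMEOUT = int(os.getenv("TIMEOUT", 100))

def resolve_timeout(model, channel_timeouts, global_timeouts):
    """Resolve a timeout in the documented priority order:
    channel-level model > channel-level default > global model
    > global default > TIMEOUT environment variable."""
    for table in (channel_timeouts, global_timeouts):
        if model in table:
            return table[model]      # exact per-model entry wins
        if "default" in table:
            return table["default"]  # then this level's default
    return DEFAULT_TIMEOUT

channel_cfg = {"gemini-1.5-pro": 10, "default": 20}
global_cfg = {"gpt-4o": 30, "default": 40}

resolve_timeout("gemini-1.5-pro", channel_cfg, global_cfg)  # 10: channel-level model
resolve_timeout("claude-3-5-sonnet", channel_cfg, global_cfg)  # 20: channel-level default
resolve_timeout("gpt-4o", {}, global_cfg)  # 30: global model
```

Note that a channel-level `default` shadows every global entry, which is exactly why the FAQ ranks it above the global per-model timeouts.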
README_CN.md CHANGED
@@ -97,6 +97,10 @@ providers:
   # default: 4/min # If a model has no rate limit set, the rate limit of default is used
   api_key_cooldown_period: 60 # Cooldown period, in seconds, applied to an API key after it encounters a 429 error. Optional; the default is 0 seconds, which disables the cooldown mechanism. Takes effect only when there are multiple API keys.
   api_key_schedule_algorithm: round_robin # Sets the request order across multiple API keys. Optional; the default is round_robin, and the allowed values are round_robin and random. Takes effect only when there are multiple API keys. round_robin is round-robin load balancing; random is random load balancing.
+  model_timeout: # Model timeout, in seconds. Optional; the default is 100 seconds.
+    gemini-1.5-pro: 10 # Timeout for the model gemini-1.5-pro is 10 seconds
+    gemini-1.5-flash: 10 # Timeout for the model gemini-1.5-flash is 10 seconds
+    default: 10 # If a requested model is not listed in model_timeout, its timeout is the default of 10 seconds. If default is not set, uni-api uses the globally configured model timeout.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually composed of lowercase letters, numbers, and hyphens. How to obtain: you can find your project ID in the project selector of the Google Cloud Console.
@@ -389,6 +393,10 @@ api_keys:
 
   Except for the special channels shown in the advanced configuration, all OpenAI-format providers need to fill in the base_url completely, meaning the base_url must end with /v1/chat/completions. If you are using GitHub models, the base_url should be https://models.inference.ai.azure.com/chat/completion, not Azure's URL.
 
+ - How is the model timeout determined? What is the priority between the channel-level timeout setting and the global model timeout setting?
+
+   The channel-level timeout setting takes priority over the global model timeout setting. The priority order is: channel-level model timeout > channel-level default timeout > global model timeout > global default timeout > the TIMEOUT environment variable.
+
  ## ⭐ Star History
 
  <a href="https://github.com/yym68686/uni-api/stargazers">
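The channel-level `model_timeout` blocks shown in both READMEs end up in a per-channel lookup table at startup. A minimal sketch of that collection step, with an assumed provider-list shape (the real config comes from uni-api's YAML file):

```python
from collections import defaultdict

DEFAULT_TIMEOUT = 100  # assumed global fallback, in seconds

# Assumed shape of the parsed providers section.
providers = [
    {"provider": "gemini",
     "preferences": {"model_timeout": {"gemini-1.5-pro": 10, "default": 10}}},
    {"provider": "vertex"},  # channel with no model_timeout settings
]

# One timeout dict per channel; channels without settings stay empty.
provider_timeouts = defaultdict(dict)
for p in providers:
    settings = p.get("preferences", {}).get("model_timeout", {})
    for model_name, timeout_value in settings.items():
        provider_timeouts[p["provider"]][model_name] = timeout_value

provider_timeouts["gemini"]["gemini-1.5-pro"]  # 10
provider_timeouts["gemini"]["default"]         # 10
```

At request time, a lookup that finds no entry in the channel's table can then fall through to the global timeouts, giving the priority order described in the FAQ.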
main.py CHANGED
@@ -39,7 +39,7 @@ import os
  import string
  import json
 
- DEFAULT_TIMEOUT = float(os.getenv("TIMEOUT", 100))
+ DEFAULT_TIMEOUT = int(os.getenv("TIMEOUT", 100))
  is_debug = bool(os.getenv("DEBUG", False))
  # is_debug = False
 
@@ -643,7 +643,22 @@ async def ensure_config(request: Request, call_next):
      if "default" not in app.state.config['preferences'].get('model_timeout', {}):
          app.state.timeouts["default"] = DEFAULT_TIMEOUT
 
-     # print("app.state.timeouts", app.state.timeouts)
+     # Build a per-channel timeout table; unknown channels/models fall back to DEFAULT_TIMEOUT
+     app.state.provider_timeouts = defaultdict(lambda: defaultdict(lambda: DEFAULT_TIMEOUT))
+     for provider in app.state.config["providers"]:
+         provider_timeout_settings = safe_get(provider, "preferences", "model_timeout", default={})
+         if provider_timeout_settings:
+             for model_name, timeout_value in provider_timeout_settings.items():
+                 app.state.provider_timeouts[provider['provider']][model_name] = timeout_value
+     # Register the global model timeouts as the fallback "channel"
+     app.state.provider_timeouts["global_time_out"] = app.state.timeouts
 
      if app and not hasattr(app.state, "channel_manager"):
          if app.state.config and 'preferences' in app.state.config:
@@ -655,6 +670,21 @@ async def ensure_config(request: Request, call_next):
 
      return await call_next(request)
 
+ def get_timeout_value(provider_timeouts, original_model):
+     timeout_value = None
+     if original_model in provider_timeouts:
+         timeout_value = provider_timeouts[original_model]
+     else:
+         # Try a fuzzy match on the model name
+         for timeout_model in provider_timeouts:
+             if timeout_model != "default" and timeout_model in original_model:
+                 timeout_value = provider_timeouts[timeout_model]
+                 break
+         else:
+             # If the fuzzy match fails, use the channel's default
+             timeout_value = provider_timeouts.get("default")
+     return timeout_value
+
  # Update the success and failure counters in process_request
  async def process_request(request: Union[RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest, EmbeddingRequest], provider: Dict, endpoint=None):
      url = provider['base_url']
@@ -729,21 +759,13 @@ async def process_request(request: Union[RequestModel, ImageGenerationRequest, A
 
      current_info = request_info.get()
 
-     timeout_value = None
-     # Try an exact match first
-     if original_model in app.state.timeouts:
-         timeout_value = app.state.timeouts[original_model]
-     else:
-         # If there is no exact match, try a fuzzy match
-         for timeout_model in app.state.timeouts:
-             if timeout_model in original_model:
-                 timeout_value = app.state.timeouts[timeout_model]
-                 break
-
-     # If nothing matched, use the default value
+     provider_timeouts = safe_get(app.state.provider_timeouts, channel_id, default=app.state.provider_timeouts["global_time_out"])
+     timeout_value = get_timeout_value(provider_timeouts, original_model)
+     if timeout_value is None:
+         timeout_value = get_timeout_value(app.state.provider_timeouts["global_time_out"], original_model)
      if timeout_value is None:
          timeout_value = app.state.timeouts.get("default", DEFAULT_TIMEOUT)
+     if is_debug:
+         print("timeout_value", timeout_value)
 
      try:
          async with app.state.client_manager.get_client(timeout_value) as client:
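The fuzzy matching in `get_timeout_value` can be exercised in isolation. This standalone copy mirrors the function added in this commit, including its `for`/`else` fallback to the channel's `default`:

```python
# Standalone copy of get_timeout_value for experimentation; behaviour
# mirrors the helper added in this commit.
def get_timeout_value(provider_timeouts, original_model):
    timeout_value = None
    if original_model in provider_timeouts:
        timeout_value = provider_timeouts[original_model]
    else:
        # Fuzzy match: a configured name that is a substring of the
        # requested model wins (e.g. "gemini" matches "gemini-1.5-pro").
        for timeout_model in provider_timeouts:
            if timeout_model != "default" and timeout_model in original_model:
                timeout_value = provider_timeouts[timeout_model]
                break
        else:
            # for/else: runs only when no fuzzy match broke out of the loop
            timeout_value = provider_timeouts.get("default")
    return timeout_value

timeouts = {"gemini": 10, "default": 100}
get_timeout_value(timeouts, "gemini-1.5-pro")  # 10 via fuzzy match
get_timeout_value(timeouts, "gpt-4o")          # 100 via the channel default
get_timeout_value({}, "gpt-4o")                # None -> caller falls back further
```

Returning `None` rather than a hard-coded default is what lets `process_request` fall through the priority chain: channel table first, then the `global_time_out` table, then `app.state.timeouts["default"]`.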