woodchen7 committed commit 87598a5 (verified) · 1 parent: 20b75b5

Update README.md

Files changed (1): README.md (+4 −0)
README.md CHANGED
@@ -96,6 +96,10 @@ Note: The following benchmarks are evaluated by TRT-LLM-backend
 
  You can refer to [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) to get started quickly. The training and inference code can use the version provided in this GitHub repository.
 
+ #### Inference Framework
+ - This open-source release offers two inference backends for the Hunyuan-7B model: the popular [vLLM-backend](https://github.com/quinnrong94/vllm/tree/dev_hunyuan) and the TensorRT-LLM backend. We are initially open-sourcing the vLLM solution in this release, with the TRT-LLM solution planned for the near future.
+
+
  ### Inference Performance
 
  This section presents the efficiency test results of deploying various models using vLLM, including inference speed (tokens/s) under different batch sizes.
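The tokens/s measurement described above can be sketched as a small benchmark harness. This is a minimal, illustrative sketch, not the benchmark code used for the reported numbers: the `generate` callable is an assumption (with vLLM you would wrap `LLM.generate` and count the tokens in each output), and `dummy_generate` is a purely hypothetical stand-in so the harness runs anywhere.

```python
import time


def benchmark_throughput(generate, prompts, batch_sizes):
    """Measure generation throughput (tokens/s) at several batch sizes.

    `generate` takes a list of prompts and returns one generated-token
    count per prompt. With a real backend (e.g. vLLM) you would wrap its
    generate call and count output tokens; here any callable works.
    """
    results = {}
    for bs in batch_sizes:
        # Repeat the prompt list until it covers the requested batch size.
        batch = (prompts * ((bs // len(prompts)) + 1))[:bs]
        start = time.perf_counter()
        token_counts = generate(batch)
        elapsed = time.perf_counter() - start
        # Throughput = total new tokens across the batch / wall-clock time.
        results[bs] = sum(token_counts) / elapsed
    return results


def dummy_generate(batch):
    """Hypothetical stand-in for a real inference backend."""
    time.sleep(0.01)              # pretend the whole batch takes ~10 ms
    return [32 for _ in batch]    # pretend each prompt yields 32 tokens


if __name__ == "__main__":
    stats = benchmark_throughput(dummy_generate, ["hello"], [1, 4, 8])
    for bs, tps in stats.items():
        print(f"batch={bs}: {tps:.0f} tokens/s")
```

Because the dummy backend processes a whole batch in roughly constant time, measured throughput grows with batch size, which is the effect the table below quantifies for real models.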