You can refer to the content in [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) to get started quickly. For training and inference, you can use the code provided in this GitHub repository.

### Inference Framework

- This open-source release offers two inference backend options tailored for the Hunyuan-7B model: the popular [vLLM-backend](https://github.com/quinnrong94/vllm/tree/dev_hunyuan) and the TensorRT-LLM backend. In this release, we are initially open-sourcing the vLLM solution, with plans to release the TRT-LLM solution in the near future.
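
To make the vLLM option concrete, below is a minimal offline-inference sketch using vLLM's Python API. This is an illustrative sketch only: the model identifier `tencent/Hunyuan-7B-Instruct` is a hypothetical placeholder (substitute the actual checkpoint path), and it assumes the vLLM fork linked above is installed.

```python
# Minimal offline-inference sketch with vLLM's Python API.
# Assumes the vLLM fork linked above is installed; the model id below is a
# hypothetical placeholder -- replace it with the actual Hunyuan-7B checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="tencent/Hunyuan-7B-Instruct", trust_remote_code=True)

# Typical sampling settings; tune temperature/top_p for your use case.
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

prompts = ["Give a one-paragraph overview of large language model inference."]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput holds the prompt and its generated completions.
    print(output.outputs[0].text)
```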
### Inference Performance

This section presents the efficiency test results of deploying various models with vLLM, reporting inference speed (tokens/s) under different batch sizes.
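
As a rough illustration of how such numbers can be measured, the harness below times generation at several batch sizes and reports generated tokens per second. It is a sketch, not the benchmark script behind the results here: the model id is again a hypothetical placeholder, and a careful benchmark would also control prompt lengths, warm up the engine, and pin the GPU configuration.

```python
# Rough throughput harness: generated tokens/s at several batch sizes.
# Illustrative sketch only -- not the script behind the numbers in this README.
import time

from vllm import LLM, SamplingParams

# Hypothetical model id; replace with the actual Hunyuan-7B checkpoint.
llm = LLM(model="tencent/Hunyuan-7B-Instruct", trust_remote_code=True)

# Greedy decoding with a fixed output length keeps runs comparable.
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

prompt = "Summarize the trade-offs between batch size and latency in LLM serving."
for batch_size in (1, 4, 16, 32):
    prompts = [prompt] * batch_size
    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling_params)
    elapsed = time.perf_counter() - start
    # Count generated tokens across the whole batch (prefill time is included,
    # so this is a coarse end-to-end figure rather than pure decode speed).
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"batch_size={batch_size}: {generated / elapsed:.1f} tokens/s")
```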