Merge branch 'main' of hf.co:Qwen/Qwen2-72B-Instruct
README.md CHANGED
@@ -16,7 +16,7 @@ Compared with the state-of-the-art opensource language models, including the pre
 
 Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs. Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2 for handling long texts.
 
-For more details, please refer to our [blog
+For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/) and [GitHub](https://github.com/QwenLM/Qwen2).
 <br>
 
 ## Model Details
@@ -117,7 +117,7 @@ For deployment, we recommend using vLLM. You can enable long-context capabilitie
 }'
 ```
 
-For further usage instructions of vLLM, please refer to
+For further usage instructions of vLLM, please refer to our [GitHub](https://github.com/QwenLM/Qwen2).
 
 **Note**: Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
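For orientation, the `}'` context line in the hunk above is the tail of the README's example request to a vLLM OpenAI-compatible server. A minimal sketch of such a call, assuming the server is already running on vLLM's default port with this model loaded (the message contents here are illustrative, not taken from the diff):

```bash
# Query a vLLM OpenAI-compatible server hosting Qwen2-72B-Instruct.
# Assumes the server is listening on localhost:8000 (vLLM's default).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me something about large language models."}
    ]
  }'
```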
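As for the `rope_scaling` configuration the note mentions: the Qwen2 README enables static YaRN by adding a block like the one below to the checkpoint's `config.json` before launching vLLM. The factor of 4.0 corresponds to scaling the 32,768-token native context up to the 131,072 tokens cited above; the ellipsis stands for the checkpoint's existing config keys, and the exact fields should be verified against the vLLM version you deploy.

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```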