willhe-xverse committed: Update README.md

README.md CHANGED
@@ -5,30 +5,23 @@ inference: false
 ---
 
-# XVERSE-65B-Chat-GPTQ-…
+# XVERSE-65B-Chat-GPTQ-Int8
 
 ## 更新信息 (Update Information)
 
-**[2024/03/25]** Released XVERSE-65B-Chat-GPTQ-…
-
-**[2023/…
-**[2023/11/…
-
-**[2023/11/24]** Updated information related to the pre-training data.
-
-**[2023/11/06]** Released the 65B-parameter XVERSE-65B base model.
+- **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int8 quantized model, with support for vLLM inference of the quantized xverse-65b model.
+- **[2023/12/08]** Released the **XVERSE-65B-2** base model, which applies **Continual Pre-Training** on top of the previous version for a total of **3.2** trillion training tokens; its capabilities improve across the board, especially mathematics and coding, with a **20%** gain on GSM8K and a **41%** gain on HumanEval.
+- **[2023/11/29]** Updated information on the model architecture and additional pre-training data.
+- **[2023/11/24]** Updated information related to the pre-training data.
+- **[2023/11/06]** Released the 65B-parameter XVERSE-65B base model.
 
 ## Update Information
 
-**[2024/03/25]**…
-
-**[2023/…
-
-**[2023/11/…
-
-**[2023/11/24]** Update the related information of the pre-training data.
-
-**[2023/11/06]** Released the XVERSE-65B base model.
+- **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int8 quantized model, supporting vLLM inference for the quantized xverse-65b model.
+- **[2023/12/08]** Released the **XVERSE-65B-2** base model. This model builds upon its predecessor through **Continual Pre-Training**, reaching a total training volume of **3.2** trillion tokens. It exhibits enhancements in all capabilities, particularly in mathematics and coding skills, with a **20%** improvement on the GSM8K benchmark and a **41%** increase on HumanEval.
+- **[2023/11/29]** Updated the model architecture and additional pre-training data information.
+- **[2023/11/24]** Updated the related information of the pre-training data.
+- **[2023/11/06]** Released the XVERSE-65B base model.
 
 ## 模型介绍 (Model Introduction)
@@ -70,19 +63,25 @@ We advise you to clone [`vllm`](https://github.com/vllm-project/vllm.git) and in
 ## 使用方法 (Usage)
 
-…
+Because the uploaded safetensors file exceeds the 50 GB single-file limit, we split it into three parts; concatenate them to recover the whole file:
+
+```bash
+cat gptq_model-8bit-128g.safetensors.* > gptq_model-8bit-128g.safetensors
+```
+
+We demonstrate how to run the XVERSE-65B-Chat-GPTQ-Int8 quantized model with `vllm`:
 
 ```python
 from vllm import LLM, SamplingParams
 
-model_dir = "xverse/XVERSE-65B-Chat-GPTQ-…
+model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int8/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.…
+sampling_params = SamplingParams(temperature=0.85, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
@@ -98,19 +97,26 @@ for output in outputs:
 ## Usage
 
-…
+Because the uploaded safetensors file exceeds the maximum file limit of 50 GB, we have split it into three parts; concatenate them to obtain the entire file:
+
+```bash
+cat gptq_model-8bit-128g.safetensors.* > gptq_model-8bit-128g.safetensors
+```
+
+We demonstrate how to use `vllm` to run the XVERSE-65B-Chat-GPTQ-Int8 quantized model:
 
 ```python
 from vllm import LLM, SamplingParams
 
-model_dir = "xverse/XVERSE-65B-Chat-GPTQ-…
+model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int8/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.…
+sampling_params = SamplingParams(temperature=0.85, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
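Both usage hunks cut off at the generation comment, and the last hunk header shows the example continues with `for output in outputs:`. A minimal sketch of that continuation, following vLLM's standard generate-and-print pattern; the prompt text and the `Human: ... Assistant:` wrapping are assumptions taken from other XVERSE chat examples, not from this diff, so check the model card for the exact template:

```python
from vllm import LLM, SamplingParams

# Setup identical to the README example above.
model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int8/"
llm = LLM(model_dir, trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.85, top_p=0.85,
                                 max_tokens=2048, repetition_penalty=1.1)

# Assumed chat template: other XVERSE chat examples wrap the user turn as
# "Human: ...\n\nAssistant: "; verify against the model card.
prompts = ["Human: How does a GPTQ Int8 model differ from the FP16 original?\n\nAssistant: "]

# Generate texts from the prompts; one RequestOutput is returned per prompt.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs (this is the `for output in outputs:` loop the last
# hunk header refers to).
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```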
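The `cat` command above assumes the three parts are already on disk. As a sketch of the same reassembly done from Python, assuming the repo id `xverse/XVERSE-65B-Chat-GPTQ-Int8` (inferred from `model_dir` in the example) and the part naming shown in the `cat` command:

```python
import shutil
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the repository snapshot (repo id inferred from model_dir above).
local_dir = Path(snapshot_download("xverse/XVERSE-65B-Chat-GPTQ-Int8"))

# Equivalent of: cat gptq_model-8bit-128g.safetensors.* > gptq_model-8bit-128g.safetensors
parts = sorted(local_dir.glob("gptq_model-8bit-128g.safetensors.*"))
merged = local_dir / "gptq_model-8bit-128g.safetensors"

with open(merged, "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            # Stream each part so the 50+ GB of weights never sits in RAM.
            shutil.copyfileobj(src, out)

print(f"Merged {len(parts)} parts into {merged} ({merged.stat().st_size} bytes)")
```

`sorted()` reproduces the lexicographic order the shell glob uses, so the parts are written in the same order as the `cat` expansion; the merged filename does not match the `.*` pattern, so re-running the script does not fold the output back into itself.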