willhe-xverse committed on
Commit 67711c2 · verified · 1 Parent(s): 099838f

Update README.md

Files changed (1):
  1. README.md +30 -24
README.md CHANGED
@@ -5,30 +5,23 @@ inference: false
 
 ---
 
- # XVERSE-65B-Chat-GPTQ-Int4
 
 ## Update Information
 
- **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int4 quantized model, with support for vLLM inference of the quantized xverse-65b model.
-
- **[2023/12/08]** Released the **XVERSE-65B-2** base model, which continues pre-training from the previous version (**Continual Pre-Training**) to a total of **3.2** trillion training tokens; capabilities improved across the board, especially mathematics and coding, with a **20%** gain on GSM8K and a **41%** gain on HumanEval.
- **[2023/11/29]** Updated information on the model architecture and additional base-model data.
-
- **[2023/11/24]** Updated information on the pre-training data.
-
- **[2023/11/06]** Released the 65B-parameter XVERSE-65B base model.
 
 ## Update Information
 
- **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int4 quantized model, supporting vLLM inference for the xverse-65b quantized model.
-
- **[2023/12/08]** Released the **XVERSE-65B-2** base model. This model builds upon its predecessor through **Continual Pre-Training**, reaching a total training volume of **3.2** trillion tokens. It exhibits enhancements in all capabilities, particularly in mathematics and coding skills, with a **20%** improvement on the GSM8K benchmark and a **41%** increase on HumanEval.
-
- **[2023/11/29]** Updated the model architecture and additional pre-training data information.
-
- **[2023/11/24]** Updated the related information of the pre-training data.
-
- **[2023/11/06]** Released the XVERSE-65B base model.
 
 ## Model Introduction
 
@@ -70,19 +63,25 @@ We advise you to clone [`vllm`](https://github.com/vllm-project/vllm.git) and in
 
 ## Usage
 
- We demonstrate how to use `vllm` to run the XVERSE-65B-Chat-GPTQ-Int4 quantized model:
 
 ```python
 from vllm import LLM, SamplingParams
 
- model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int4/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
- sampling_params = SamplingParams(temperature=0.5, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
@@ -98,19 +97,26 @@ for output in outputs:
 
 ## Usage
 
- We demonstrate how to use `vllm` to run the XVERSE-65B-Chat-GPTQ-Int4 quantized model:
 
 ```python
 from vllm import LLM, SamplingParams
 
- model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int4/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
- sampling_params = SamplingParams(temperature=0.5, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
 
 
 ---
 
+ # XVERSE-65B-Chat-GPTQ-Int8
 
 ## Update Information
 
+ - **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int8 quantized model, with support for vLLM inference of the quantized xverse-65b model.
+ - **[2023/12/08]** Released the **XVERSE-65B-2** base model, which continues pre-training from the previous version (**Continual Pre-Training**) to a total of **3.2** trillion training tokens; capabilities improved across the board, especially mathematics and coding, with a **20%** gain on GSM8K and a **41%** gain on HumanEval.
+ - **[2023/11/29]** Updated information on the model architecture and additional base-model data.
+ - **[2023/11/24]** Updated information on the pre-training data.
+ - **[2023/11/06]** Released the 65B-parameter XVERSE-65B base model.
 
 ## Update Information
 
+ - **[2024/03/25]** Released the XVERSE-65B-Chat-GPTQ-Int8 quantized model, supporting vLLM inference for the xverse-65b quantized model.
+ - **[2023/12/08]** Released the **XVERSE-65B-2** base model. This model builds upon its predecessor through **Continual Pre-Training**, reaching a total training volume of **3.2** trillion tokens. It exhibits enhancements in all capabilities, particularly in mathematics and coding skills, with a **20%** improvement on the GSM8K benchmark and a **41%** increase on HumanEval.
+ - **[2023/11/29]** Updated the model architecture and additional pre-training data information.
+ - **[2023/11/24]** Updated the related information of the pre-training data.
+ - **[2023/11/06]** Released the XVERSE-65B base model.
 
 ## Model Introduction
 
 
 ## Usage
 
+ Because the uploaded safetensors file exceeds the maximum file-size limit of 50 GB, we split it into three parts; concatenate them to recover the full file:
+
+ ```bash
+ cat gptq_model-8bit-128g.safetensors.* > gptq_model-8bit-128g.safetensors
+ ```
+
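The `cat` command above relies on the shell expanding the glob in lexical order. As a minimal sketch, assuming only that the shard files match the glob pattern above, the same join plus a size sanity check can be done in Python:

```python
import glob
import os
import shutil

def join_shards(pattern: str, out_path: str) -> int:
    """Concatenate shard files matching `pattern`, in lexical order,
    into `out_path`; returns the number of bytes written."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no shards match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream each shard into the output
    written = os.path.getsize(out_path)
    expected = sum(os.path.getsize(p) for p in parts)
    assert written == expected, "combined size does not match shard sizes"
    return written

# Usage (requires the downloaded shard files):
# join_shards("gptq_model-8bit-128g.safetensors.*",
#             "gptq_model-8bit-128g.safetensors")
```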
+ We demonstrate how to use `vllm` to run the XVERSE-65B-Chat-GPTQ-Int8 quantized model:
 
 ```python
 from vllm import LLM, SamplingParams
 
+ model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int8/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
+ sampling_params = SamplingParams(temperature=0.85, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
 
 
 ## Usage
 
+ Because the uploaded safetensors file exceeds the maximum file-size limit of 50 GB,
+ we split it into three parts; concatenate them to recover the full file:
+
+ ```bash
+ cat gptq_model-8bit-128g.safetensors.* > gptq_model-8bit-128g.safetensors
+ ```
+
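Once the parts are concatenated, a quick sanity check is to parse just the file's safetensors header: the format starts with an 8-byte little-endian length followed by a UTF-8 JSON header, so the tensor index can be read without loading the 65B weights. A minimal sketch (the file path is the one produced above):

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file.

    Per the safetensors format, the file begins with an 8-byte
    little-endian unsigned integer giving the JSON header's length,
    followed by the header itself; tensor data comes after.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))

# Usage (after reassembling the file):
# header = read_safetensors_header("gptq_model-8bit-128g.safetensors")
```

If this call raises a JSON decode error, the parts were likely joined in the wrong order or incompletely downloaded.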
+ We demonstrate how to use `vllm` to run the XVERSE-65B-Chat-GPTQ-Int8 quantized model:
 
 ```python
 from vllm import LLM, SamplingParams
 
+ model_dir = "xverse/XVERSE-65B-Chat-GPTQ-Int8/"
 
 # Create an LLM.
 llm = LLM(model_dir,
           trust_remote_code=True)
 
 # Create a sampling params object.
+ sampling_params = SamplingParams(temperature=0.85, top_p=0.85, max_tokens=2048, repetition_penalty=1.1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.