Update README.md
README.md CHANGED
@@ -19,12 +19,12 @@ AquilaChat模型主要为了验证基础模型能力,您可以根据自己需
The AquilaChat model was primarily developed to verify the capabilities of the foundational model. You can use, modify, and commercialize the model according to your needs, but you must comply with all applicable laws and regulations in your country. Additionally, you must provide the source of the Aquila series models and a copy of the Aquila series model license to any third-party users.

## 模型细节/Model details
-| Model | License | Commercial use? | GPU
-| :---------------- | :------- | :-- |:-- |
-|Aquila-7B
-| AquilaCode-7B-
-| AquilaCode-7B-
-| AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 |
+| Model | License | Commercial use? | GPU
+| :---------------- | :------- | :-- |:-- |
+| Aquila-7B | Apache 2.0 | ✅ | Nvidia-A100 |
+| AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 |
+| AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 |
+| AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 |

We used a series of more efficient low-level operators to assist model training, including methods that reference [flash-attention](https://github.com/HazyResearch/flash-attention) and replace some of its intermediate computations, as well as RMSNorm. On top of this, we applied [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which uses data parallelism, ZeRO (Zero Redundancy Optimizer), optimizer offloading, checkpointing and operator fusion, and communication-computation overlap to optimize the model training process.
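For reference, below is a minimal PyTorch sketch of the RMSNorm operator mentioned in the paragraph above. It is an illustrative, unfused implementation and is not taken from the Aquila codebase; the training stack described in the README may instead use a fused kernel alongside the flash-attention-style operators.

```python
# Minimal RMSNorm sketch (illustrative only; not the Aquila implementation).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescale by the RMS of the features,
    with a learned gain but no mean subtraction and no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by rms(x) = sqrt(mean(x^2) + eps), then apply the gain.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)

# Usage: normalize a batch of hidden states of width 4096.
x = torch.randn(2, 16, 4096)
y = RMSNorm(4096)(x)
```

Compared with standard LayerNorm, RMSNorm drops the mean-subtraction and bias terms, reducing per-token computation while remaining stable in training, which is one reason it appears in many recent LLaMA-style models.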