ChatGLM2-6B

Github Repo

Introduction

ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

  1. Stronger Performance: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM and has undergone pre-training on 1.4T bilingual (Chinese-English) tokens plus human preference alignment training. Evaluation results show that, compared to the first-generation model, ChatGLM2-6B achieves substantial improvements on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
  2. Longer Context: Based on the FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has a limited ability to understand single-turn ultra-long documents, which we will focus on optimizing in future iterations.
  3. More Efficient Inference: Based on the Multi-Query Attention technique, ChatGLM2-6B achieves faster inference and lower GPU memory usage: under the official implementation, inference speed is 42% higher than in the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K (see the loading sketch after this list).
  4. More Open License: ChatGLM2-6B weights are completely open for academic research, and free commercial use is also permitted after completing the registration questionnaire.
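
As referenced in item 3, below is a minimal sketch of loading the model under INT4 quantization. The quantize(4) helper comes from the ChatGLM family's custom modeling code (loaded via trust_remote_code=True); treating it as available here is an assumption carried over from the first-generation repo, so please verify the exact invocation against the Github Repo.

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> # quantize(4) converts the weights to INT4 so inference fits in roughly 6 GB of GPU memory
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
>>> model = model.eval()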

Software dependencies

pip install protobuf transformers==4.30.2 cpm_kernels "torch>=2.0" gradio mdtex2html sentencepiece accelerate

Code Usage

The ChatGLM2-6B model can be called with the following code to generate a conversation:

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()  # FP16 weights on GPU
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
Hello 👋! I am ChatGLM2-6B, the artificial intelligence assistant. Nice to meet you. Feel free to ask me any questions.
>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night", history=history)
>>> print(response)
Staying awake at night may make you feel anxious or uncomfortable, but here are some things you can do to help you fall asleep:

1. Establish a regular sleep schedule: Maintaining a regular sleep schedule can help you establish healthy sleep habits and make it easier for you to fall asleep. Try to go to bed at the same time every day and get up at the same time.
2. Create a comfortable sleeping environment: Make sure the sleeping environment is comfortable, quiet, dark and at a suitable temperature. Use comfortable bedding and keep the room ventilated.
3. Relax your body and mind: Doing some relaxing activities before going to bed, such as taking a hot bath, listening to some soft music, reading some interesting books, etc., can help relieve tension and anxiety and make it easier for you to fall asleep.
4. Avoid drinking beverages containing caffeine: Caffeine is a stimulating substance that can affect your sleep quality. Try to avoid drinking caffeinated beverages such as coffee, tea and cola before bed.
5. Avoid doing things in bed that are not related to sleep: Doing things in bed that are not related to sleep, such as watching movies, playing games or working, may interfere with your sleep.
6. Try breathing techniques: Deep breathing is a relaxation technique that can help you relieve tension and anxiety and make it easier for you to fall asleep. Try to inhale slowly, hold for a few seconds, and then exhale slowly.

If these methods don't help you fall asleep, you may consider talking to your doctor or sleep specialist for further advice.
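
In addition to the blocking chat call above, the ChatGLM family's custom modeling code also provides a streaming interface. A minimal sketch, assuming stream_chat yields cumulative (response, history) pairs as it does in the first-generation model:

>>> for response, history in model.stream_chat(tokenizer, "你好", history=[]):
...     print(response)  # each iteration yields the response generated so far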

For more instructions, including how to run the CLI and web demos and how to use model quantization to save GPU memory, please refer to our Github Repo.

Change Log

  • v1.0

License

The code of this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM2-6B model weights must comply with the Model License.

Citation

If you find our work helpful, please consider citing the following papers. The ChatGLM2-6B paper will be published in the near future, so stay tuned!

@article{zeng2022glm,
  title={GLM-130B: An Open Bilingual Pre-trained Model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}