Chinese-Alpaca-Plus-13B-GPTQ

This repository contains a 4-bit GPTQ-quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B.

It was produced by quantising the model to 4 bits with GPTQ-for-LLaMa.

Model Details

Model Description

  • Developed by: ymcui (Yiming Cui)
  • Shared by: Known Rabbit
  • Language(s) (NLP): Chinese, English
  • License: Apache 2.0
  • Finetuned from model: LLaMA

The original GitHub project: ymcui/Chinese-LLaMA-Alpaca (Chinese LLaMA & Alpaca LLMs, with local CPU/GPU deployment)

To promote open research on large language models in the Chinese NLP community, the project open-sourced the Chinese LLaMA model and the instruction-tuned Chinese Alpaca model. Starting from the original LLaMA, these models extend the vocabulary with Chinese tokens and undergo secondary pre-training on Chinese data, which improves basic semantic understanding of Chinese. The Chinese Alpaca model is additionally fine-tuned on Chinese instruction data, which significantly improves its ability to understand and follow instructions. For details, see the technical report (Cui, Yang, and Yao, 2023).

Uses

Direct Use

How to easily download and use this model in text-generation-webui

Open the text-generation-webui UI as normal.

  1. Click the Model tab.
  2. Under Download custom model or LoRA, enter rabitt/Chinese-Alpaca-Plus-13B-GPTQ.
  3. Click Download.
  4. Wait until it says it's finished downloading.
  5. Click the Refresh icon next to Model in the top left.
  6. In the Model drop-down, choose the model you just downloaded, Chinese-Alpaca-Plus-13B-GPTQ.
  7. If you see an error like Error no file named pytorch_model.bin ... in the bottom right, ignore it; it is temporary.
  8. Fill out the GPTQ parameters on the right: Bits = 4, Groupsize = 128, model_type = Llama.
  9. Click Save settings for this model in the top right.
  10. Click Reload the Model in the top right.
  11. Once it says it's loaded, click the Text Generation tab and enter a prompt!
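
The GPTQ parameters in step 8 describe how the weights are stored: each weight is a 4-bit integer, and every group of 128 consecutive weights shares one scale and one zero-point. The sketch below illustrates only that storage layout, using plain round-to-nearest on a toy list of numbers. It is not the GPTQ algorithm, which additionally compensates for rounding error while it quantises. The function names (`quantize_group`, `dequantize_group`, `quantize`, `bits_per_weight`) are hypothetical, and the overhead estimate assumes a 16-bit scale and a 4-bit zero-point per group, which is an assumption about the file layout rather than something this card states.

```python
import random


def quantize_group(weights, bits=4):
    """Quantise one group of floats to unsigned `bits`-bit integer codes.

    Returns (codes, scale, zero) such that weight ~= scale * (code - zero).
    Plain round-to-nearest; NOT the error-compensating GPTQ procedure.
    """
    levels = (1 << bits) - 1              # 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:                      # constant group: any non-zero scale works
        scale = 1.0
    zero = round(-w_min / scale)          # integer code that maps back to ~0.0
    codes = [min(levels, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [scale * (c - zero) for c in codes]


def quantize(weights, bits=4, groupsize=128):
    """Quantise consecutive groups of `groupsize` weights independently."""
    return [
        quantize_group(weights[start:start + groupsize], bits)
        for start in range(0, len(weights), groupsize)
    ]


def bits_per_weight(bits=4, groupsize=128, scale_bits=16, zero_bits=4):
    """Average storage per weight. scale_bits and zero_bits are ASSUMED sizes."""
    return bits + (scale_bits + zero_bits) / groupsize


# Toy data standing in for one row of a weight matrix (hypothetical values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]

groups = quantize(weights)                # 256 weights -> 2 groups of 128
restored = [
    w for codes, scale, zero in groups
    for w in dequantize_group(codes, scale, zero)
]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_err, bits_per_weight())
```

Under those assumptions, `bits_per_weight()` comes to 4 + 20/128 = 4.15625 bits per weight. A smaller group size would track the weights more closely at the cost of storing more scales and zero-points.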

Training Details

Training Procedure

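  1. Download the models used in the steps below: the original LLaMA 13B weights and the Chinese-LLaMA-Plus-LoRA-13B and Chinese-Alpaca-Plus-LoRA-13B adapters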

  2. Convert LLaMA to HuggingFace (HF) format with convert_llama_weights_to_hf.py

    wget https://github.com/huggingface/transformers/raw/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python \
    python convert_llama_weights_to_hf.py \
        --input_dir ./llama \
        --model_size 13B \
        --output_dir ./llama-13b-hf
    
  3. Merge the Chinese-LLaMA-Plus-LoRA-13B and Chinese-Alpaca-Plus-LoRA-13B adapters into LLaMA with merge_llama_with_chinese_lora.py

    wget https://github.com/ymcui/Chinese-LLaMA-Alpaca/raw/main/scripts/merge_llama_with_chinese_lora.py
    python merge_llama_with_chinese_lora.py \
        --base_model ./llama-13b-hf \
        --lora_model ./Chinese-LLaMA-Plus-LoRA-13B,./Chinese-Alpaca-Plus-LoRA-13B \
        --output_type huggingface \
        --output_dir ./Chinese-Alpaca-Plus-13B
    
  4. Quantise the model with GPTQ-for-LLaMa

    mkdir -p Chinese-Alpaca-Plus-13B-GPTQ
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
    cd GPTQ-for-LLaMa
    # export CUDA_VISIBLE_DEVICES=0
    python llama.py ../Chinese-Alpaca-Plus-13B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors ../Chinese-Alpaca-Plus-13B-GPTQ/Chinese-Alpaca-Plus-13B-GPTQ-4bit-128g.safetensors
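
Step 3 above folds two LoRA adapters into the base weights before quantisation. For a single weight matrix, merging a LoRA adapter means adding its low-rank update: W' = W + (lora_alpha / r) * B A. The sketch below shows that arithmetic on tiny hand-written matrices and applies two adapters in the order they are listed. It is a toy illustration, not the code of merge_llama_with_chinese_lora.py, which also has to handle details such as the expanded Chinese vocabulary. The helper names (`matmul`, `merge_lora`, `merge_adapters`) and all numeric values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return weight + (lora_alpha / r) * (lora_b @ lora_a).

    weight: (out, in), lora_b: (out, r), lora_a: (r, in).
    """
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)        # low-rank update with shape (out, in)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def merge_adapters(weight, adapters):
    """Apply several adapters one after another, in the order given."""
    for lora_a, lora_b, lora_alpha, r in adapters:
        weight = merge_lora(weight, lora_a, lora_b, lora_alpha, r)
    return weight


# Hypothetical 2x3 base weight and two rank-1 adapters.
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
first = ([[1.0, 2.0, 3.0]], [[0.5], [-1.0]], 2, 1)   # (lora_a, lora_b, lora_alpha, r)
second = ([[1.0, 0.0, 0.0]], [[1.0], [1.0]], 1, 1)

merged_one = merge_lora(base, *first)
merged_all = merge_adapters(base, [first, second])
print(merged_one)
print(merged_all)
```

After merging, the adapters are no longer needed at inference time: the result is an ordinary dense weight matrix, which is what the quantisation step then takes as input.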
    

Citation

BibTeX:

@article{chinese-llama-alpaca,
      title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca}, 
      author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
      journal={arXiv preprint arXiv:2304.08177},
      url={https://arxiv.org/abs/2304.08177},
      year={2023}
}