---
license: wtfpl
language:
- en
pipeline_tag: text-generation
tags:
- llama
library_name: adapter-transformers
---
Non-safetensor version: [pi3141/alpaca-7b-native-enhanced-GPTQ](https://huggingface.co/Pi3141/alpaca-7b-native-enhanced-GPTQ)
### About the GPTQ version
- Quantized to 4-bits 128g using GPTQ-for-LLaMA.
- Intended for use with Oobabooga Text Generation WebUI.
### Loading model in Oobabooga WebUI
- Use the same parameters as the original model; they are listed in the original repo linked below.
- Use the `ExLlamav2` loader.
### Information about original model
*Original repo: [8bit-coder/alpaca-7b-nativeEnhanced](https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced)*
*Alternate: [pi3141/alpaca-7b-native-enhanced](https://huggingface.co/pi3141/alpaca-7b-native-enhanced)*
Below is information about the original model.
---
<p align="center"><img src="https://cdn-uploads.huggingface.co/production/uploads/615a1b7a321f65c4da59c3d3/DFHgrYeqJNIchgLrgfZzl.png" height=256></p>
<h1 align="center">
Alpaca 7B Native Enhanced
</h1>
<p align="center">The Most Advanced Alpaca 7B Model</p>
## 📃 Model Facts
- Trained natively on 8x Nvidia A100 40GB GPUs; no LoRA used
- Trained on the largest & most accurate dataset yet
- Enhanced Programming Capabilities
- First Alpaca model to have conversational awareness
## 🚀 Quick Start Guide
Step 1. Make sure git-lfs is installed and ready to use ([Guide](https://git-lfs.com/))
Step 2. Download and install [text-generation-webui](https://github.com/oobabooga/text-generation-webui) according to the repository's instructions
Step 3. Navigate to one of its model folders and clone this repository:
`git clone https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced`
Step 4. Launch the webui, replace "Your name" with "User" and replace the default instruction prompt with:
> You are an AI language model designed to assist the User by answering their questions, offering advice, and engaging in casual conversation in a friendly, helpful, and informative manner. You respond clearly, coherently, and you consider the conversation history.
>
> User: Hey, how's it going?
>
> Assistant: Hey there! I'm doing great, thank you. What can I help you with today? Let's have a fun chat!
Step 5. Change the settings to match this screenshot:
![Settings](https://cdn-uploads.huggingface.co/production/uploads/615a1b7a321f65c4da59c3d3/m8s2o52xN2I6MDy0sZ5rZ.png)
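
The prompt layout from Step 4 can be sketched in Python for anyone scripting the model outside the webui. This is a minimal sketch, not the webui's own implementation: the preamble and the `User:`/`Assistant:` labels come from the guide above, while `build_prompt` and the blank-line separator between turns are assumptions made for illustration.

```python
# Preamble text copied from Step 4 of the Quick Start Guide.
PREAMBLE = (
    "You are an AI language model designed to assist the User by answering "
    "their questions, offering advice, and engaging in casual conversation in "
    "a friendly, helpful, and informative manner. You respond clearly, "
    "coherently, and you consider the conversation history."
)


def build_prompt(history, user_message):
    """Join the preamble, earlier (user, assistant) turns, and the new message.

    Hypothetical helper: the blank line between turns is an assumption.
    """
    parts = [PREAMBLE]
    for user_turn, assistant_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"Assistant: {assistant_turn}")
    parts.append(f"User: {user_message}")
    parts.append("Assistant:")  # left open for the model to complete
    return "\n\n".join(parts)


history = [
    (
        "Hey, how's it going?",
        "Hey there! I'm doing great, thank you. What can I help you with "
        "today? Let's have a fun chat!",
    )
]
prompt = build_prompt(history, "Can you explain what a Python list is?")
print(prompt)
```

Keeping the earlier turns in the prompt is what lets the model take the conversation history into account.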
## 📚 Training
#### We used 8x Nvidia A100 40GB GPUs to train this model. Training took ~3 hours, and the resulting loss was 0.4761 after 3 epochs. The command used for training is as follows:
> **torchrun --nproc_per_node=8 --master_port=3045 ./stanford_alpaca/train.py --model_name_or_path ./llama-7b-hf --data_path ./alpaca-7b-nativeEnhanced/training_files/alpaca-megaset-fixed.json --fp16 True --output_dir ./output_7b --num_train_epochs 3 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_steps 16 --evaluation_strategy "no" --save_strategy "steps" --save_steps 200 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' --tf32 True**
There's a folder in this repository called training_files. In it, **full-training-instructions.txt** lists every command from the start of training through converting the model to 4-bit quantized ggml. **It is not recommended to quantize this model down to 4 bits. The instructions are included purely for informational purposes.**
In addition, the training instructions file is built specifically for rented cloud computing. This means that by following the commands in the file, anyone should be able to train a similar model.
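
As a rough aid for adapting the command to other hardware, the sketch below works out the effective (global) batch size implied by its flags: number of processes times per-device batch size times gradient accumulation steps. The helper name `effective_batch_size` is hypothetical. The second call shows that halving the per-device batch while doubling accumulation leaves the effective batch unchanged.

```python
def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps):
    """Samples contributing to one optimizer step across all GPUs."""
    return num_gpus * per_device_batch * grad_accum_steps


# Values from the training command: --nproc_per_node=8,
# --per_device_train_batch_size 2, --gradient_accumulation_steps 16
base = effective_batch_size(8, 2, 16)

# Lower-memory variant: half the per-device batch, double the accumulation
low_mem = effective_batch_size(8, 1, 32)

print(base, low_mem)  # 256 256
```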
### Common errors while training
- CUDA Out of Memory error
  - This happens when your GPUs have less than 40GB of VRAM each. The weakest GPU that we've been able to train on successfully is the Nvidia A100 40GB. Even with 8 of these, VRAM usage was almost always right at the limit. If you have 40GB GPUs and still run into this error, try halving **per_device_train_batch_size** and **per_device_eval_batch_size** and doubling **gradient_accumulation_steps**. If you have more than 40GB of VRAM per GPU and wish to train faster, the opposite applies.
- LLaMATokenizer error
  - This happens when tokenizer_config.json in the llama-7b-hf directory has not been fixed. The fix is to rename **LLaMATokenizer** to **LlamaTokenizer** in that file.
- RuntimeError: CUDA error: invalid device ordinal
  - This error occurs when **nproc_per_node** is set to a number greater than the number of GPUs installed in your system. You can check how many GPUs you have installed by running **nvidia-smi**.
- torchrun is not recognized
  - This error occurs when you have a Python version older than 3.10. Follow the instructions in the training instructions file to install Miniconda and set up Python 3.10. Circumventing this error by running python -m torch.distributed.run will **not work**. Many of the dependencies require Python 3.10 and will fatally error out at the start of training.
- KeyError
  - This happens when your JSON training data is broken in some way. Try running dataset_validator.py in the training_files folder to find the broken key.
## 📝 Notes
- The main version of this model is in the Hugging Face Transformers format. The other (.pth) format is provided **purely for experimental use with llama.cpp** and is not guaranteed to have conversational awareness.
- This model exhibits weird behavior when quantized to 4 bits. This might be due to the complexity of the model. We recommend quantizing no lower than 8 bits, but this is untested.
- This model is slightly **underfitted**. We observed that training the model with a smaller gradient accumulation size benefitted the response quality.
- This model appears to have full conversational awareness. This means that, provided you're running the model in the configuration we detailed in the Quick Start Guide, you should be able to hold very detailed conversations with the AI without issues. Its memory is limited to 2048 tokens. Beyond that, it'll forget details and will need to be reminded.
## 🔧 Dataset
The dataset used for training this model is made from [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned) and [codealpaca](https://github.com/sahil280114/codealpaca). We combined these datasets for the following reasons:
1. Increased accuracy since the original stanford_alpaca dataset had many errors.
2. Better knowledge in programming
3. More training data
We had an issue with the latest AlpacaDataCleaned dataset where at around 90k lines in, one of the keys has a typo. The key is "instruction:" instead of "instruction". We have fixed this error in the provided megaset but if you plan on grabbing directly from AlpacaDataCleaned, make sure to fix this error. Otherwise, the training script will fail due to a KeyError.
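
If you pull AlpacaDataCleaned directly, the key typo can be repaired with a few lines of Python. This is a minimal sketch, assuming the dataset is a JSON array of objects in the usual Alpaca layout (`instruction`, `input`, `output`). The function name `fix_instruction_keys` and the two sample records are made up for illustration.

```python
import json


def fix_instruction_keys(records):
    """Rename a mistyped "instruction:" key to "instruction"; return the fix count."""
    fixed_count = 0
    for record in records:
        if "instruction:" in record:
            record["instruction"] = record.pop("instruction:")
            fixed_count += 1
    return fixed_count


# Illustrative records only; the second one carries the typo described above.
sample = json.loads(
    '[{"instruction": "Add 2 and 3.", "input": "", "output": "5"},'
    ' {"instruction:": "Name a color.", "input": "", "output": "Blue"}]'
)
print(fix_instruction_keys(sample))  # 1
```

For a real file, load it with `json.load`, run the function on the resulting list, and write the list back with `json.dump`.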
## 👨‍💻 Credits
Credits go to [Meta](https://github.com/facebookresearch/llama) for creating the foundational LLaMA models and [Stanford](https://github.com/tatsu-lab/stanford_alpaca) for the instructions on how to train. For the dataset, credits go to [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned) and [codealpaca](https://github.com/sahil280114/codealpaca). Credits also go to [chavinlo](https://huggingface.co/chavinlo/alpaca-native) for creating the original Alpaca 7B Native model, the inspiration behind this model.
Lastly, credits go to the homies that stayed up all night again and again: 8bit, π, chug, Taddy, yoyodapro, Symax, and most importantly: stablediffusion for the beautiful artwork.