|
--- |
|
library_name: peft |
|
license: afl-3.0 |
|
datasets: |
|
- nickrosh/Evol-Instruct-Code-80k-v1 |
|
--- |
|
## Training procedure |
|
|
|
|
|
The following `bitsandbytes` quantization config was used during training (a sketch of the equivalent `BitsAndBytesConfig` follows the list):
|
- quant_method: bitsandbytes |
|
- load_in_8bit: False |
|
- load_in_4bit: True |
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: nf4 |
|
- bnb_4bit_use_double_quant: False |
|
- bnb_4bit_compute_dtype: float16 |
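
For reference, here is a minimal sketch of the equivalent `BitsAndBytesConfig` reconstructed from the values above (the training code below only sets the 4-bit fields explicitly and leaves the rest at their defaults):

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization config listed above
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)
```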
|
### Framework versions |
|
|
|
- PEFT 0.6.0.dev0 |
|
|
|
""" |
|
|
|
Original file is located at |
|
https://colab.research.google.com/drive/1yH0ov1ZDpun6yGi19zE07jkF_EUMI1Bf |
|
|
|
**Code Credit: Hugging Face** |
|
|
|
**Dataset Credit: https://twitter.com/Dorialexander/status/1681671177696161794**
|
|
|
## Fine-tune CodeLlama-7B-Instruct on a Google Colab
|
|
|
Welcome to this Google Colab notebook that shows how to fine-tune the CodeLlama-7B-Instruct model on a single Google Colab GPU and turn it into a chatbot.
|
|
|
We will leverage the PEFT library from the Hugging Face ecosystem, as well as QLoRA, for more memory-efficient fine-tuning.
|
|
|
## Setup |
|
|
|
Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and TRL to leverage the recent [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model into 4bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops`, which some architectures (such as Falcon) require at load time.
|
""" |
|
|
|
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git |
|
!pip install -q datasets bitsandbytes einops wandb |
|
|
|
"""## Dataset |
|
|
|
login huggingface |
|
""" |
|
|
|
import wandb |
|
|
|
!wandb login |
|
|
|
# Initialize a WandB run to track training metrics
# (the interactive `wandb login` above already handled authentication)
wandb.init(
    project="<project_name>",
    name="<run_name>",
)
|
|
|
# Log in to the Hugging Face Hub (paste an access token when prompted)
|
from huggingface_hub import login |
|
login() |
|
|
|
from datasets import load_dataset |
|
|
|
#dataset_name = "timdettmers/openassistant-guanaco" ###Human ,.,,,,,, ###Assistant |
|
dataset_name = "nickrosh/Evol-Instruct-Code-80k-v1" |
|
#dataset_name = 'AlexanderDoria/novel17_test' #french novels |
|
dataset = load_dataset(dataset_name, split="train") |
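
# A quick peek at one record before training; the field names assume this
# dataset's standard "instruction"/"output" schema ("output" is the column
# we train on below)
print(dataset[0]["instruction"][:200])
print(dataset[0]["output"][:200])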
|
|
|
"""## Loading the model""" |
|
|
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
|
|
|
#model_name = "TinyPixel/Llama-2-7B-bf16-sharded" |
|
#model_name = "abhinand/Llama-2-7B-bf16-sharded-512MB" |
|
model_name= "TinyPixel/CodeLlama-7B-Instruct-bf16-sharded" |
|
bnb_config = BitsAndBytesConfig( |
|
load_in_4bit=True, |
|
bnb_4bit_quant_type="nf4", |
|
bnb_4bit_compute_dtype=torch.float16, |
|
) |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
quantization_config=bnb_config, |
|
trust_remote_code=True |
|
) |
|
model.config.use_cache = False  # the KV cache only helps at inference time; disable it for training
|
|
|
"""Let's also load the tokenizer below""" |
|
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) |
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
from peft import LoraConfig, get_peft_model |
|
|
|
lora_alpha = 16 |
|
lora_dropout = 0.1 |
|
lora_r = 64 |
|
|
|
peft_config = LoraConfig( |
|
lora_alpha=lora_alpha, |
|
lora_dropout=lora_dropout, |
|
r=lora_r, |
|
bias="none", |
|
task_type="CAUSAL_LM" |
|
) |
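
# LoRA adds (lora_alpha / r) * B @ A on top of each adapted weight matrix,
# so with lora_alpha=16 and r=64 the learned update is scaled by 16/64 = 0.25
print(f"LoRA scaling factor: {lora_alpha / lora_r}")  # 0.25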
|
|
|
"""## Loading the trainer |
|
|
|
Here we will use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), which provides a wrapper around the transformers `Trainer` to easily fine-tune models on instruction-based datasets using PEFT adapters. Let's first load the training arguments below.
|
""" |
|
|
|
from transformers import TrainingArguments |
|
|
|
output_dir = "./results" |
|
per_device_train_batch_size = 4 |
|
gradient_accumulation_steps = 4 |
|
optim = "paged_adamw_32bit" |
|
save_steps = 100 |
|
logging_steps = 10 |
|
learning_rate = 2e-4 |
|
max_grad_norm = 0.3 |
|
max_steps = 100 |
|
warmup_ratio = 0.03 |
|
lr_scheduler_type = "constant" |
|
|
|
training_arguments = TrainingArguments( |
|
output_dir=output_dir, |
|
per_device_train_batch_size=per_device_train_batch_size, |
|
gradient_accumulation_steps=gradient_accumulation_steps, |
|
optim=optim, |
|
save_steps=save_steps, |
|
logging_steps=logging_steps, |
|
learning_rate=learning_rate, |
|
fp16=True, |
|
max_grad_norm=max_grad_norm, |
|
max_steps=max_steps, |
|
warmup_ratio=warmup_ratio, |
|
group_by_length=True, |
|
lr_scheduler_type=lr_scheduler_type, |
|
) |
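
# Sanity check: each optimizer step sees per_device_train_batch_size *
# gradient_accumulation_steps = 4 * 4 = 16 examples, so this 100-step run
# covers roughly 1,600 training examples
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size, effective_batch_size * max_steps)  # 16 1600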
|
|
|
"""Then finally pass everthing to the trainer""" |
|
|
|
from trl import SFTTrainer |
|
|
|
max_seq_length = 512 |
|
|
|
trainer = SFTTrainer( |
|
model=model, |
|
train_dataset=dataset, |
|
peft_config=peft_config, |
|
dataset_text_field="output", |
|
max_seq_length=max_seq_length, |
|
tokenizer=tokenizer, |
|
args=training_arguments, |
|
) |
|
|
|
"""We will also pre-process the model by upcasting the layer norms in float 32 for more stable training""" |
|
|
|
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() casts the module's parameters in place
|
|
|
"""## Train the model |
|
|
Now let's train the model! Simply call `trainer.train()` |
|
""" |
|
|
|
trainer.train() |
|
|
|
"""During training, the model should converge nicely as follows: |
|
The `SFTTrainer` also takes care of properly saving only the adapters during training instead of saving the entire model. |
|
""" |
|
|
|
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model # Take care of distributed/parallel training |
|
model_to_save.save_pretrained("outputs") |
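
# Only the adapter files are written, not the 7B base weights; expect
# something like adapter_config.json and adapter_model.bin in "outputs"
import os
print(os.listdir("outputs"))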
|
|
|
# Reload the *trained* adapter weights onto the base model.
# (get_peft_model(model, lora_config) would attach freshly initialized
# adapters and discard the fine-tuning, so use PeftModel.from_pretrained.)
from peft import PeftModel

model = PeftModel.from_pretrained(model, 'outputs')
|
|
|
# Inspect the "output" column the trainer consumed as training text
dataset['output']
|
|
|
text = "make a advanced python script to finetune a llama2-7b-bf16-sharded model with accelerator and qlora" |
|
device = "cuda:0" |
|
# Tokenize the prompt without padding; padding a single prompt to
# max_length would insert pad tokens between the prompt and the generation
inputs = tokenizer(text, return_tensors="pt").to(device)
|
outputs = model.generate(**inputs, max_new_tokens=150) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=False)) |
|
|
|
model.push_to_hub("K00B404/CodeLlama-7B-Instruct-bf16-sharded-ft-v0_01", use_auth_token="<HUGGINGFACE_WRITE_TOKEN>")
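
# A sketch of how the pushed adapter could be reloaded elsewhere, assuming
# the same 4-bit base model and the bnb_config defined above
from peft import PeftConfig, PeftModel

repo_id = "K00B404/CodeLlama-7B-Instruct-bf16-sharded-ft-v0_01"
peft_cfg = PeftConfig.from_pretrained(repo_id)
base_model = AutoModelForCausalLM.from_pretrained(
    peft_cfg.base_model_name_or_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, repo_id)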
|
|