|
--- |
|
language: |
|
- en |
|
library_name: peft |
|
pipeline_tag: text-generation |
|
tags: |
|
- medical |
|
license: cc-by-nc-3.0 |
|
--- |
|
|
|
# MedFalcon v2.1a 40b LoRA - Step 4500 |
|
|
|
![img.png](img.png) |
|
|
|
## Model Description |
|
|
|
This is a model checkpoint release at 4500 steps, intended for evaluation use only. Limitations: |
|
* LoRA output will be more concise than that of the base model |

* Due to the adapter's size, base knowledge from falcon-40b may be overwritten |

* Due to the adapter's size, more hardware may be required to load falcon-40b with this LoRA applied |
|
|
|
### Architecture |
|
`nmitchko/medfalconv2-1a-40b-lora` is a LoRA for a large language model, fine-tuned specifically for medical domain tasks. |
|
It is based on [`Falcon-40b`](https://huggingface.co/tiiuae/falcon-40b) at 40 billion parameters. |
|
|
|
The primary goal of this model is to improve question-answering and medical dialogue tasks. |
|
It was trained using [LoRA](https://arxiv.org/abs/2106.09685), specifically [QLoRA](https://github.com/artidoro/qlora), to reduce the memory footprint. |
|
|
|
See Training Parameters below for more info. This LoRA supports both 4-bit and 8-bit modes. |
|
|
|
### Requirements |
|
|
|
``` |
|
bitsandbytes>=0.39.0 |
|
peft |
|
transformers |
|
``` |
|
|
|
Steps to load this model: |
|
1. Load base model using transformers |
|
2. Apply LoRA using peft |
|
|
|
```python |
|
# Load the base Falcon-40b model in 8-bit and apply the MedFalcon LoRA adapter |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
import transformers |
|
import torch |
|
from peft import PeftModel |
|
|
|
model = "tiiuae/falcon-40b" |
|
LoRA = "nmitchko/medfalconv2-1a-40b-lora" |
|
|
|
# If you want 8-bit or 4-bit loading, set the appropriate flags |
|
load_8bit = True |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model) |
|
|
|
model = AutoModelForCausalLM.from_pretrained(model, |

    load_in_8bit=load_8bit, |

    torch_dtype=torch.float16, |

    trust_remote_code=True, |

    device_map="auto",  # quantized loading needs an accelerate device map |

) |
|
|
|
model = PeftModel.from_pretrained(model, LoRA) |
|
|
|
# The model is already quantized and dispatched, so dtype/device settings are not repeated here |

pipeline = transformers.pipeline( |

    "text-generation", |

    model=model, |

    tokenizer=tokenizer, |

    trust_remote_code=True, |

) |
|
|
|
sequences = pipeline( |
|
"What does the drug ceftrioxone do?\nDoctor:", |
|
max_length=200, |
|
do_sample=True, |
|
top_k=40, |
|
num_return_sequences=1, |
|
eos_token_id=tokenizer.eos_token_id, |
|
) |
|
|
|
for seq in sequences: |
|
print(f"Result: {seq['generated_text']}") |
|
``` |
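
Since the adapter was trained with QLoRA in 4-bit, the base model can also be loaded in 4-bit to reduce VRAM. The snippet below is a minimal sketch, assuming your `transformers` version exposes `BitsAndBytesConfig`; the quantization settings mirror the training flags further down (`nf4` with double quantization), and the compute dtype is an illustrative choice. |

```python |
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig |
import torch |
from peft import PeftModel |

base = "tiiuae/falcon-40b" |
lora = "nmitchko/medfalconv2-1a-40b-lora" |

# 4-bit quantization config matching the training setup (nf4 + double quantization) |
bnb_config = BitsAndBytesConfig( |
    load_in_4bit=True, |
    bnb_4bit_quant_type="nf4", |
    bnb_4bit_use_double_quant=True, |
    bnb_4bit_compute_dtype=torch.bfloat16,  # illustrative compute dtype |
) |

tokenizer = AutoTokenizer.from_pretrained(base) |

model = AutoModelForCausalLM.from_pretrained( |
    base, |
    quantization_config=bnb_config, |
    trust_remote_code=True, |
    device_map="auto", |
) |

# Apply the medical LoRA on top of the 4-bit base model |
model = PeftModel.from_pretrained(model, lora) |
``` |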
|
|
|
## Training Parameters |
|
|
|
The model was trained for 4500 steps (1 epoch) on a custom, unreleased dataset named `medconcat`. |

`medconcat` contains only human-generated content and weighs in at over 100 MiB of raw text. |
|
|
|
Training ran in 4-bit mode with a relatively large LoRA, using the hyperparameters below; the bash script that launched it follows the table: |
|
|
|
| Item          | Value | |

|---------------|-------| |

| LoRA Rank     | 128   | |

| LoRA Alpha    | 256   | |

| Learning Rate | 1e-3  | |

| Dropout       | 5%    | |
|
|
|
|
|
```bash |
|
CURRENTDATEONLY=`date +"%b %d %Y"` |
|
|
|
sudo nvidia-smi -i 1 -pl 250 |
|
|
|
export CUDA_VISIBLE_DEVICES=0 |
|
|
|
nohup python qlora.py \ |
|
--model_name_or_path models/tiiuae_falcon-40b \ |
|
--output_dir ./loras/medfalcon2.1a-40b \ |
|
--logging_steps 100 \ |
|
--save_strategy steps \ |
|
--data_seed 42 \ |
|
--save_steps 200 \ |
|
--save_total_limit 40 \ |
|
--evaluation_strategy steps \ |
|
--eval_dataset_size 1024 \ |
|
--max_eval_samples 1000 \ |
|
--per_device_eval_batch_size 1 \ |
|
--max_new_tokens 32 \ |
|
--dataloader_num_workers 3 \ |
|
--group_by_length \ |
|
--logging_strategy steps \ |
|
--remove_unused_columns False \ |
|
--do_train \ |
|
--lora_r 128 \ |
|
--lora_alpha 256 \ |
|
--lora_modules all \ |
|
--double_quant \ |
|
--quant_type nf4 \ |
|
--bf16 \ |
|
--bits 4 \ |
|
--warmup_ratio 0.03 \ |
|
--lr_scheduler_type constant \ |
|
--gradient_checkpointing \ |
|
--dataset="training/datasets/medconcat/" \ |
|
--dataset_format alpaca \ |
|
--trust_remote_code=True \ |
|
--source_max_len 16 \ |
|
--target_max_len 512 \ |
|
--per_device_train_batch_size 1 \ |
|
--gradient_accumulation_steps 16 \ |
|
--max_steps 4500 \ |
|
--eval_steps 1000 \ |
|
--learning_rate 0.0001 \ |
|
--adam_beta2 0.999 \ |
|
--max_grad_norm 0.3 \ |
|
--lora_dropout 0.05 \ |
|
--weight_decay 0.0 \ |
|
--seed 0 > "${CURRENTDATEONLY}-finetune-medfalcon2.1a.log" & |
|
``` |
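
For reference, the hyperparameters in the table above roughly correspond to the following PEFT `LoraConfig`. This is an illustrative sketch, not the actual training code (training went through the qlora script above); in particular, `target_modules` is an assumption that approximates the script's `--lora_modules all` using Falcon's linear projection names. |

```python |
from peft import LoraConfig |

# Approximate PEFT equivalent of the table above (illustrative only). |
# target_modules is an assumption: `--lora_modules all` adapts every linear layer, |
# approximated here with Falcon's attention and MLP projections. |
lora_config = LoraConfig( |
    r=128,              # LoRA rank |
    lora_alpha=256,     # LoRA alpha |
    lora_dropout=0.05,  # 5% dropout |
    bias="none", |
    task_type="CAUSAL_LM", |
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"], |
) |
``` |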