When fine-tuning, the model begins with zero loss.
```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments

# LoRA on the attention projections and the MLP layers
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "dense",
        "fc1",
        "fc2",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

# Trainer settings used for the run that shows zero loss
training_args = TrainingArguments(
    output_dir="phib",
    max_steps=100,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    warmup_steps=10,
    logging_steps=1,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    group_by_length=True,
    # disable_tqdm=False,
    report_to="none",
    seed=3407,
)
```
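For reference, here is a minimal sketch of how this configuration is typically wired into a run. The base checkpoint, tokenizer handling, and dataset below are assumptions for illustration, not details from the report; only `lora_config` and `training_args` come from the snippet above.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
)

model_name = "microsoft/phi-2"  # assumed checkpoint; substitute the one being debugged

# 4-bit quantization, assumed here to match prepare_model_for_kbit_training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16 if HAS_BFLOAT16 else torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check before training

# Placeholder dataset purely for illustration
dataset = load_dataset("Abirate/english_quotes", split="train")
dataset = dataset.map(lambda batch: tokenizer(batch["quote"]), batched=True)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```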
Check the loss:
| Step | Training Loss |
|------|---------------|
| 1    | 0.000000      |
| 2    | 0.000000      |
| 3    | 0.000000      |
| 4    | 0.000000      |
| 5    | 0.000000      |
| 6    | 0.000000      |
| 7    | 0.000000      |
Got the same issue with similar settings.
Could you please try with microsoft/phi-1_5 and report if you are seeing the same issue?
Can't try that right now, but it looks like the revision "refs/pr/23" is working. The total number of trainable LoRA parameters is somehow twice as high as before while keeping the same settings. I am wondering whether this is expected (refs/pr/23 vs. latest (Jan 16)).
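One way to compare the two revisions is to count the trainable parameters after applying the same LoRA config to each checkout. This is only a sketch: the revision string comes from above, while the checkpoint name and the rest are assumptions.

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model

def count_trainable(model):
    # Count only the parameters that will receive gradients (the LoRA adapters).
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

for revision in ("refs/pr/23", "main"):
    base = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2",  # assumed checkpoint
        revision=revision,
        trust_remote_code=True,
    )
    peft_model = get_peft_model(base, lora_config)
    trainable, total = count_trainable(peft_model)
    print(f"{revision}: {trainable:,} trainable / {total:,} total")
```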
Could you please re-run with the latest update? We updated the modeling_phi.py file and disabled auto-casting on the attention layer. This is the same fix that the previous code had.
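For readers hitting the same symptom on a custom model, the general shape of that kind of fix is to force the attention math out of the autocast dtype. The snippet below is only an illustration of the pattern, not the actual modeling_phi.py code.

```python
import torch

def attention_without_autocast(q, k, v):
    # Illustrative pattern only: disable AMP autocast for the attention math so the
    # matmul/softmax run in full precision rather than the half-precision AMP dtype.
    with torch.autocast(device_type="cuda", enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        probs = torch.softmax(scores, dim=-1)
        return probs @ v
```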
No problem! Please let me know if you see anything else.