I need help.
#29 by thebryanalvarado - opened
Hello community, I want to improve this training process:
%%time
import time
from transformers import Trainer, TrainingArguments

if to_train:
    output_dir = f'./sql-training-{str(int(time.time()))}'
    training_args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=5e-3,
        num_train_epochs=2,
        per_device_train_batch_size=16,  # batch size per device during training
        per_device_eval_batch_size=16,   # batch size for evaluation
        weight_decay=0.01,
        logging_steps=50,
        evaluation_strategy='steps',     # evaluation strategy to adopt during training
        eval_steps=500,                  # number of steps between evaluations
    )
    trainer = Trainer(
        model=finetuned_model,
        args=training_args,
        train_dataset=tokenized_datasets['train'],
        eval_dataset=tokenized_datasets['validation'],
    )
    trainer.train()
    finetuned_model.save_pretrained("finetuned_model_2_epoch")
It can take 40 hours on my laptop with an RTX 4050.
Hi Bryan, your problem is probably already solved by now, but from what I can see in your code you could benefit a lot from lowering the floating-point precision to fp16; that should give you the speed-up you are looking for. You might also find this helpful: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
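Concretely, that just means adding the mixed-precision flag to the arguments you already have. Here is a sketch of the same `TrainingArguments` config with fp16 enabled (I'm assuming your GPU setup here; on recent NVIDIA cards like the RTX 4050 you can also use `bf16=True`, which avoids the overflow issues T5 historically had under fp16 per the thread linked above):

```python
from transformers import TrainingArguments

# Same config as in the original post, with mixed-precision training enabled.
training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=5e-3,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    logging_steps=50,
    evaluation_strategy='steps',
    eval_steps=500,
    fp16=True,  # requires a CUDA GPU; use bf16=True instead on Ampere or newer
)
```

Mixed precision roughly halves activation memory too, so you may also be able to raise the batch size and cut the wall-clock time further.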
Best of luck!