Which padding side to choose while finetuning
Should I choose padding side left or right when finetuning?
References:
https://wandb.ai/byyoung3/ml-news/reports/Fine-Tuning-Mistral-7B-on-Python-Code-With-A-Single-GPU---Vmlldzo1NTg0NzY5 (suggests right)
https://github.com/brevdev/notebooks/blob/main/mistral-finetune.ipynb (suggests left)
Looks like out of the box Mistral doesn't have a pad_token_id
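A common workaround (not something the Mistral repo prescribes, just what many fine-tuning notebooks do) is to reuse the EOS token as the pad token, or to add a dedicated pad token and resize the embeddings. Minimal sketch, assuming the base Mistral 7B checkpoint:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint name (assumption: base Mistral 7B).
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(tokenizer.pad_token_id)  # None out of the box

# Option 1: reuse the EOS token as the pad token (no embedding resize needed).
tokenizer.pad_token = tokenizer.eos_token

# Option 2: add a dedicated [PAD] token; the model's embedding matrix then
# needs resizing, e.g. model.resize_token_embeddings(len(tokenizer)).
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})
```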
🤔
I don't understand why; surely they must have used padding during training?
Any updates on this, @Hannibal046 @parikshit1619? Mistral was originally trained with left-side padding, and after doing a bit of research, most forums recommend left-side padding as well, so that the LLM doesn't mix up data and pad tokens. Can anybody confirm this?
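To make the "mixup of data and pad tokens" concern concrete, here is a small sketch (the checkpoint name is just illustrative) comparing what the two padding sides do to a batch. With right padding the pad ids sit after the prompt, i.e. between the prompt and anything the model would generate next; with left padding the prompt ends at the last position:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint name.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

prompts = ["def add(a, b):", "Hello"]

tokenizer.padding_side = "right"
right = tokenizer(prompts, padding=True)
print(right["input_ids"][1])  # real tokens first, pad ids trailing at the end

tokenizer.padding_side = "left"
left = tokenizer(prompts, padding=True)
print(left["input_ids"][1])   # pad ids up front, real tokens end at the last position
```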
Maybe we need two tokenizers?
My understanding is that "padding_side=left" is needed when we generate output, because Mistral is decoder-only; see https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side.
But when using LoRA for training, "padding_side=right" is needed to avoid overflow issues; see https://discuss.huggingface.co/t/qlora-with-gptq/58009.
Please correct me if I'm wrong!
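If that summary is right, the practical pattern would look roughly like the sketch below: right padding while training the adapter, then flipping to left padding for batched generation. This is only a sketch under those assumptions, with an illustrative checkpoint name and a placeholder where the actual LoRA/SFT loop would go:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative checkpoint name.
model_name = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Training (e.g. LoRA/QLoRA SFT): pad on the right so the loss masking lines up
# with the real tokens and to avoid the overflow issue linked above.
tokenizer.padding_side = "right"
# ... build the Trainer / SFT loop here ...

# Batched generation: switch to left padding so the last position of every row
# is a real token rather than a pad.
tokenizer.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer(["def hello():", "print("], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=20,
                         pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```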