About QLoRA Implementation
Hi, thanks for sharing the great work! I was wondering which QLoRA implementation you use for fine-tuning. Do you use the official QLoRA code (https://github.com/artidoro/qlora/tree/main), or did you implement it yourself? Also, could you describe the hardware setup and the corresponding training speed?
Nope, I use https://github.com/hiyouga/LLaMA-Efficient-Tuning.
I use DeepSpeed ZeRO-2 + FlashAttention-2 + 4-bit QLoRA for training on 8× A100 (80GB) GPUs, which allows a batch size of around 16.
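For reference, here's a minimal sketch of what a 4-bit QLoRA model setup looks like with Hugging Face transformers + peft + bitsandbytes (the stack LLaMA-Efficient-Tuning is built on). The base model name and LoRA hyperparameters below are placeholders, not our exact configuration; DeepSpeed ZeRO-2 and FlashAttention-2 would be layered on top via the framework's launcher:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with double quantization and bf16 compute,
# the standard QLoRA recipe from the original paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections; rank/alpha/dropout here
# are illustrative values, not our actual hyperparameters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```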
As for training speed, it takes roughly 5-6 hours, though I'm not entirely sure of the exact figure.
By the way, our team plans to upload a better model, and details about the dataset mix-up strategy, hardware setup, training settings, and steps will also be provided, but it may take several days.
Thanks for your prompt reply. I'm looking forward to your future work!