Why is the dtype bfloat16?
#61 by Macropodus - opened
Are there any considerations behind this?
Hi @Macropodus
That is the dtype the model was trained with. It also makes distribution much more convenient, since in fp32 the model weight files would take ~32GB instead of ~16GB.
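The size difference is just parameter count times bytes per parameter; a minimal sketch of the arithmetic (the ~8B parameter count is an assumption, not stated in this thread):

```python
# Checkpoint size is roughly num_params * bytes_per_param.
num_params = 8_000_000_000  # hypothetical ~8B-parameter model

bytes_bf16 = num_params * 2  # bfloat16: 2 bytes per parameter
bytes_fp32 = num_params * 4  # float32: 4 bytes per parameter

print(f"bf16 checkpoint: ~{bytes_bf16 / 1e9:.0f} GB")  # ~16 GB
print(f"fp32 checkpoint: ~{bytes_fp32 / 1e9:.0f} GB")  # ~32 GB
```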
@ybelkada
Got it, but these weights may not be convenient for fine-tuning; most LLM weights are fp16. The loss always becomes NaN after a few fine-tuning steps when I load the weights in fp16; fp32 is fine.
Hi @Macropodus
Thanks for getting back. To use the float16 model, you can load it with revision="float16" in from_pretrained.
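A minimal sketch of what that looks like (the model id is a placeholder for the repo this discussion belongs to):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",           # placeholder: use the actual model repo id
    revision="float16",         # branch holding the fp16 weights
    torch_dtype=torch.float16,  # keep the weights in fp16 when loading
)
print(next(model.parameters()).dtype)  # torch.float16
```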
got it
Macropodus changed discussion status to closed