Loading model without fast-attn
#10 opened by TZ20
Hi, if I set trust_remote_code=False when loading the model, will it just be the normal LlamaForCausalLM? If so, running with a 32K context length would require too much computational power.
Hi @TZ20, thanks for your question! Yes, setting trust_remote_code=False will result in using the standard LlamaForCausalLM implementation from the Hugging Face transformers library. Since that implementation does not make use of flash attention, the speed will be lower and the memory footprint higher.
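A minimal sketch of the two loading paths discussed above, assuming a standard transformers AutoModelForCausalLM workflow; the repo id is a placeholder, not the actual model name:

```python
from transformers import AutoModelForCausalLM

model_id = "your-org/your-32k-llama"  # placeholder repo id

# With trust_remote_code=True, the custom modeling code shipped with the repo
# (including its flash-attention path) is used.
model_custom = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# With trust_remote_code=False (the default), transformers falls back to the
# stock LlamaForCausalLM, which here means slower attention and a larger
# memory footprint at 32K context.
model_stock = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=False,
    torch_dtype="auto",
    device_map="auto",
)
```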