Loading model without fast-attn
#10 opened by TZ20
Hi, if I set trust_remote_code=False when loading the model, will it just be the normal LlamaForCausalLM? If so, running with a 32K context length would require too much computational power.
Hi @TZ20, thanks for your question! Yes, setting trust_remote_code=False will result in using the standard LlamaForCausalLM implementation from the Hugging Face transformers library. Since that implementation does not make use of flash attention, the speed will be lower and the memory footprint higher.
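A minimal sketch of the two loading paths discussed above, assuming a standard transformers AutoModelForCausalLM workflow; the repo id is a placeholder, not the actual model name:

```python
from transformers import AutoModelForCausalLM

model_id = "your-org/your-32k-llama"  # placeholder repo id

# With trust_remote_code=True, the custom modeling code shipped with the repo
# (including its flash-attention path) is used.
model_custom = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# With trust_remote_code=False (the default), transformers falls back to the
# stock LlamaForCausalLM, which here means slower attention and a larger
# memory footprint at 32K context.
model_stock = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=False,
    torch_dtype="auto",
    device_map="auto",
)
```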