Caching doesn't work on multi-GPU

#23
by srinivasbilla - opened

I get gibberish if caching is enabled when running inference across multiple GPUs.
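For reference, a minimal sketch of the workaround being discussed, with an illustrative prompt and generation settings that are my assumptions, not taken from this thread: shard the model across GPUs with `device_map="auto"` and pass `use_cache=False` to `generate` to avoid the garbled output.

```python
# Sketch: disable the KV cache as a workaround for garbled multi-GPU output.
# Prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard the model across the available GPUs
    trust_remote_code=True,  # Falcon shipped custom modelling code at the time
)

inputs = tokenizer("Write a poem about GPUs.", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=100,
    use_cache=False,         # workaround: slower, but avoids the gibberish
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```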

@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?

@eastwind I now found your contribution here to answer the last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20

Yeah, not using the cache hurts performance a lot.

Technology Innovation Institute org

We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.
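As a hedged sketch of what that looks like from Python: this assumes you have already launched a Text Generation Inference server for `tiiuae/falcon-40b-instruct` (for example via the official Docker image) and that it is listening on localhost port 8080; the request follows TGI's `/generate` endpoint.

```python
# Sketch: query a running Text Generation Inference server.
# Assumes the server was started separately and listens on localhost:8080.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Write a poem about GPUs.",
        "parameters": {"max_new_tokens": 100},
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["generated_text"])
```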
