Caching doesn't work on multi-GPU

#23
by srinivasbilla - opened

I get gibberish if caching is enabled when running inference across multiple GPUs.
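For reference, a minimal sketch of the workaround being discussed, with an illustrative prompt and generation settings that are my assumptions, not taken from this thread: shard the model across GPUs with `device_map="auto"` and pass `use_cache=False` to `generate` to avoid the garbled output.

```python
# Sketch: disable the KV cache as a workaround for garbled multi-GPU output.
# Prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard the model across the available GPUs
    trust_remote_code=True,  # Falcon shipped custom modelling code at the time
)

inputs = tokenizer("Write a poem about GPUs.", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=100,
    use_cache=False,         # workaround: slower, but avoids the gibberish
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```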

@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?

@eastwind I now found your contribution here to answer the last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20

Yeah, not using the cache hurts performance a lot.

Technology Innovation Institute org

We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.
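As a hedged sketch of what that looks like from Python: this assumes you have already launched a Text Generation Inference server for `tiiuae/falcon-40b-instruct` (for example via the official Docker image) and that it is listening on localhost port 8080; the request follows TGI's `/generate` endpoint.

```python
# Sketch: query a running Text Generation Inference server.
# Assumes the server was started separately and listens on localhost:8080.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Write a poem about GPUs.",
        "parameters": {"max_new_tokens": 100},
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["generated_text"])
```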
