[Cache request] zephyr-7b-beta-neuron with sequence_length more than 4096

#11
by Anurag2132 - opened

Even 2048 would work, but I need 4096 for my particular use case.

AWS Inferentia and Trainium org

The model is already present in the cache with a sequence_length of 4096.
