This model is compiled for optimized inference on AWS Inf2 instances. Overall throughput reaches about 4 it/s for a 1024x1024 image.
NOTE: To load and run inference on inf2.xlarge instances, at least 8 GB of swap memory is required. Larger instance sizes do not need this.
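For example, one standard way to add 8 GB of swap on a Linux instance (the file path is illustrative; any location on a local volume works):

sudo fallocate -l 8G /swapfile   # reserve an 8 GB file for swap
sudo chmod 600 /swapfile         # restrict permissions, required by swapon
sudo mkswap /swapfile            # format the file as swap space
sudo swapon /swapfile            # enable it for the current boot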
Assuming you're using the official DLAMI or have installed the drivers and libraries required for Neuron devices, install the latest optimum with the neuronx extra:
pip install -U "optimum[neuronx]" diffusers==0.20.0
Then, enjoy the super-fast experience:
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Load the pre-compiled pipeline onto the two NeuronCores of the Inferentia2 chip
pipe = NeuronStableDiffusionXLPipeline.from_pretrained("paulkm/sdxl_neuron_pipe", device_ids=[0, 1])
img = pipe("a cute black cat").images[0]
img.save("cat.png")  # the output is a standard PIL image
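The device_ids argument pins the pipeline to specific NeuronCores; an inf2.xlarge exposes two (IDs 0 and 1), which is why the example uses both.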
The Inference API (serverless) does not yet support optimum-neuron models for this pipeline type.