Text-to-image finetuning - jffacevedo/pxla_trained_model

This pipeline was finetuned from stabilityai/stable-diffusion-2-base on the lambdalabs/naruto-blip-captions dataset.

Pipeline usage

You can use the pipeline like so:

import torch
import torch_xla.core.xla_model as xm
from diffusers import StableDiffusionPipeline

def main():
    # Run on the XLA device (for example, a TPU core).
    device = xm.xla_device()
    # Replace <output_dir> with the directory the finetuned pipeline was saved to.
    model_path = "<output_dir>"
    pipe = StableDiffusionPipeline.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
    )
    pipe.to(device)
    prompt = ["A naruto with green eyes and red legs."]
    # Generate one image and save it to the working directory.
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save("naruto.png")

if __name__ == '__main__':
    main()
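
The example above assumes an XLA device is available. On a machine without torch_xla, a fallback along the following lines may work. This is a minimal sketch: `pick_device_name` is a hypothetical helper, not part of diffusers or torch_xla, and finding the torch_xla package installed does not guarantee that an XLA device is actually attached.

```python
import importlib.util


def pick_device_name() -> str:
    """Return "xla", "cuda", or "cpu" based on what is installed.

    Hypothetical helper for illustration only.
    """
    # Prefer XLA when the torch_xla package can be found.
    if importlib.util.find_spec("torch_xla") is not None:
        return "xla"
    # Otherwise use CUDA if PyTorch is installed and sees a GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    # Fall back to the CPU in every other case.
    return "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

When the result is "xla", obtain the device with xm.xla_device() as in the example above; otherwise the returned string can be passed to pipe.to(...) directly.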

Training info

These are the key hyperparameters used during training:

Steps: 50
Learning rate: 1e-06
Batch size: 32
Image resolution: 512
Mixed-precision: bf16
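
To put these numbers in context, the sketch below collects them in a small config object and computes how many training samples the run processed. The `TrainingConfig` class and its field names are illustrative, not taken from the training script, and the calculation assumes that the batch size of 32 is the effective batch per optimizer step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters listed above (field names are illustrative)."""

    steps: int = 50
    learning_rate: float = 1e-6
    batch_size: int = 32  # assumed to be the effective batch per step
    resolution: int = 512
    mixed_precision: str = "bf16"

    def samples_seen(self) -> int:
        # Total samples processed = optimizer steps * samples per step.
        return self.steps * self.batch_size


config = TrainingConfig()
print(config.samples_seen())  # 1600
```

Under that assumption the run processed 1,600 samples in total, which is a very short finetuning run.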

Intended uses & limitations

How to use

See the example under Pipeline usage above.

Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

Training details

The model was finetuned on the lambdalabs/naruto-blip-captions dataset at an image resolution of 512.
