Text-to-image finetuning - jffacevedo/pxla_trained_model

This pipeline was finetuned from stabilityai/stable-diffusion-2-base on the lambdalabs/naruto-blip-captions dataset.

Pipeline usage

You can use the pipeline like so:

import torch
import torch_xla.core.xla_model as xm
from diffusers import StableDiffusionPipeline


def main():
    # Acquire the XLA device (for example, a TPU core).
    device = xm.xla_device()
    # Placeholder: replace with the output directory used during training.
    model_path = "<output_dir>"
    # Load the finetuned pipeline in bfloat16, matching the training precision.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
    )
    pipe.to(device)
    prompt = ["A naruto with green eyes and red legs."]
    # Passing a list of prompts returns one image per prompt; take the first.
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save("naruto.png")


if __name__ == "__main__":
    main()

Training info

These are the key hyperparameters used during training:

  • Steps: 50
  • Learning rate: 1e-06
  • Batch size: 32
  • Image resolution: 512
  • Mixed-precision: bf16
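
To put these numbers in perspective, the short sketch below estimates how many training images the run processed. It assumes the listed batch size is the total (global) batch size per optimizer step, with no gradient accumulation; the card does not state this, so treat the result as a rough figure. The helper name samples_seen is hypothetical and used only for illustration.

```python
# Hyperparameters as listed on this card.
steps = 50
batch_size = 32  # assumption: global batch size per optimizer step


def samples_seen(steps: int, batch_size: int) -> int:
    """Number of training images processed, counting repeats."""
    return steps * batch_size


total = samples_seen(steps, batch_size)
print(total)  # 1600
```

Under that assumption the run processed about 1,600 images, which, together with the low learning rate, suggests only a light adaptation of the base model.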

Intended uses & limitations

How to use

# TODO: add an example code snippet for running this diffusion pipeline

Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

Training details

[TODO: describe the data used to train the model]
