reference image + text prompt

#12
by kasiasta91 - opened

Hi, the official documentation presents an example that combines a reference image with a text prompt:

image.png

Is it possible to do this with the dev version in this repo as well? Could somebody point me to where in the code this is implemented?

Thanks!

Hey!

Here is an example of how to use it with a reference image + prompt:

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
from diffusers.utils import load_image
device = "cuda"
dtype = torch.bfloat16

text_encoder = CLIPTextModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder",
    torch_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=dtype,
)
tokenizer = CLIPTokenizer.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="tokenizer",
)
tokenizer_2 = T5TokenizerFast.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="tokenizer_2",
)

repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev"

pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    repo_redux,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    torch_dtype=dtype
).to(device)

pipe = FluxPipeline.from_pretrained(
    repo_base,
    torch_dtype=dtype
).to(device)

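my_image = load_image("image.png")

# Encode the reference image; put your text prompt here (it is encoded by the text encoders passed to the prior above)
pipe_prior_output = pipe_prior_redux(
    my_image,
    prompt="",
)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images
images[0].save("output.png")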
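The `**pipe_prior_output` call works because the prior returns a mapping whose keys (`prompt_embeds` and `pooled_prompt_embeds`) match keyword arguments of `FluxPipeline`. The base pipeline therefore never sees the prompt string itself, only the embeddings the prior built, in which (as far as I understand the Redux prior) the text tokens and the image tokens share one sequence. A toy sketch of that hand-off, using plain lists and hypothetical `toy_*` functions rather than the real diffusers classes:

```python
from typing import Dict, List

# Toy stand-ins (hypothetical, NOT diffusers code): plain lists replace tensors so
# this runs without torch, diffusers, or a GPU. Only the key names mirror the
# fields of the real Redux prior output.

def toy_prior_redux(text_tokens: List[float], image_tokens: List[float]) -> Dict[str, List[float]]:
    """Mimic the prior: one sequence holding text tokens followed by image tokens."""
    return {
        "prompt_embeds": text_tokens + image_tokens,  # joined along the sequence axis
        "pooled_prompt_embeds": [sum(text_tokens)],   # placeholder pooled vector
    }

def toy_flux_pipe(*, prompt_embeds, pooled_prompt_embeds, guidance_scale=3.5, num_inference_steps=28):
    """Mimic a base pipeline that takes precomputed embeddings instead of a prompt string."""
    return {
        "seq_len": len(prompt_embeds),
        "pooled_len": len(pooled_prompt_embeds),
        "guidance_scale": guidance_scale,
        "steps": num_inference_steps,
    }

prior_output = toy_prior_redux(text_tokens=[0.0] * 3, image_tokens=[1.0] * 5)
# ** turns each key of the mapping into a keyword argument, as in the call above
result = toy_flux_pipe(guidance_scale=2.5, num_inference_steps=50, **prior_output)
print(result)  # {'seq_len': 8, 'pooled_len': 1, 'guidance_scale': 2.5, 'steps': 50}
```

With the real pipelines the values are tensors, but the hand-off has the same shape: a prompt given to the prior only matters to the extent it survives in `prompt_embeds`.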

This does not help. When I use it to generate a text logo based on an existing text logo, the output still contains the text from the reference image.

Hi @ouhenio! Thanks for your remark! I also think this does not work as expected.

For a reference image like this:
agatka_bez_tla.png

And with the prompt "an illustration of a cute little girl with a blond hair and blue dress laying on the rainbow", I get something like this:
output.jpg
This is essentially image-based Redux with slightly adapted embeddings, but it is still far from this result (from the BFL reference page I linked in the very first post):

image.png

Were you able to make any difference with the prompt? I see the same results; it seems the prompt is ignored.

I cannot get style transfer to work with LoRA weights; the content of the image gets changed! Maybe an IP-Adapter is needed.

As per the README, "... the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra..."

So perhaps it is not possible to use a text prompt together with an image using the publicly available dev/schnell + Redux repos.

@dhairyashil Yeah, default Redux doesn't support any extra text prompts. However, there is a workaround using attention masks; check this out: https://github.com/huggingface/diffusers/pull/10056#issuecomment-2525774617
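The linked comment is the reference for the actual workaround; the code from it is not reproduced here. As a generic illustration of why an attention mask can rebalance text against image tokens: adding a bias of log(w) to the attention logits of the image-token keys multiplies their unnormalised softmax weight by w, so w < 1 shifts attention toward the text tokens. A pure-Python toy with made-up logits and a hypothetical helper name (`image_token_bias`), not the method from that comment:

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_token_bias(n_text: int, n_image: int, image_weight: float) -> List[float]:
    """Hypothetical helper: additive bias that leaves text keys alone and adds
    log(image_weight) to every image key. image_weight=1.0 means no change."""
    if image_weight <= 0:
        raise ValueError("image_weight must be positive (a hard mask would drop tokens entirely)")
    return [0.0] * n_text + [math.log(image_weight)] * n_image

# Toy logits for one query over 2 text keys followed by 2 image keys.
logits = [0.0, 0.0, 0.0, 0.0]
bias = image_token_bias(n_text=2, n_image=2, image_weight=0.5)
weights = softmax([s + b for s, b in zip(logits, bias)])
text_share = sum(weights[:2])
print(round(text_share, 4))  # 0.6667, up from 0.5 without the bias
```

A float bias like this keeps the image tokens in play but at reduced weight, whereas a boolean mask would remove them outright.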
