reference image + text prompt

#12

by kasiasta91 - opened Dec 11, 2024

Dec 11, 2024

Hi, the official documentation shows an example that combines a reference image with a text prompt:

Is this also possible with the dev version in this repo? Could somebody point me to the place in the code where it is implemented?

THX!

ouhenio

Jan 9

edited Jan 9

Hey!

Here is an example of how to use it with a reference image + prompt:

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast

device = "cuda"
dtype = torch.bfloat16

repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev"

# Load the text encoders and tokenizers from the base repo so that the
# Redux prior pipeline can encode a text prompt.
text_encoder = CLIPTextModel.from_pretrained(
    repo_base,
    subfolder="text_encoder",
    torch_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    repo_base,
    subfolder="text_encoder_2",
    torch_dtype=dtype,
)
tokenizer = CLIPTokenizer.from_pretrained(
    repo_base,
    subfolder="tokenizer",
)
tokenizer_2 = T5TokenizerFast.from_pretrained(
    repo_base,
    subfolder="tokenizer_2",
)

# Prior pipeline: turns the reference image (and prompt) into embeddings.
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    repo_redux,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    torch_dtype=dtype,
).to(device)

# Base pipeline: generates the image from those embeddings.
pipe = FluxPipeline.from_pretrained(
    repo_base,
    torch_dtype=dtype,
).to(device)

my_image = load_image("image.png")

pipe_prior_output = pipe_prior_redux(
    my_image,
    prompt="",  # replace the empty string with your text prompt
)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images

LHJ0

Jan 10

This does not help. When I use the code above to generate a text logo based on an existing text logo, the output still shows the text from the reference image.

kasiasta91

Jan 10

Hi @ouhenio! Thanks for your reply! I also think this does not work as expected.

For a reference image like this:

And prompt "an illustration of a cute little girl with a blond hair and blue dress laying on the rainbow", I am getting something like:

This is essentially image-based Redux output with some of the embeddings adapted, but to me it is far from this result (from the BFL reference page I gave in the very first post):

georgetachev

Mar 31

Were you able to get the prompt to make any difference? I see the same results. It seems the prompt is ignored.

heerwan1

Mar 31

I cannot do style transfer with LoRA weights; the content of the image gets changed! Maybe this needs an IP-Adapter!

dhairyashil

about 1 month ago

As per the README, "... the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra..."

So perhaps it is not possible to use a text prompt along with an image with the publicly available dev/schnell + Redux repos.

YaTharThShaRma999

30 days ago

@dhairyashil Yeah, default Redux doesn't support any extra text prompts. However, there is a way to do it using attention masks; check this out: https://github.com/huggingface/diffusers/pull/10056#issuecomment-2525774617
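
For anyone who wants something quick to experiment with, here is a minimal sketch of a simpler idea. It is not the attention-mask method from the linked comment. It just scales down the image tokens in the Redux prior output so the text tokens carry relatively more weight. The helper names `token_weights` and `downweight_image_tokens` are hypothetical, and the sketch assumes that `prompt_embeds` has shape (batch, seq, dim) with the text tokens first and the image tokens after them. Please verify that layout and the token counts by inspecting `prompt_embeds.shape` before relying on it.

```python
from typing import List


def token_weights(num_text_tokens: int, num_image_tokens: int, image_scale: float) -> List[float]:
    """Per-token weights: 1.0 for each text token, image_scale for each image token."""
    if num_text_tokens < 0 or num_image_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return [1.0] * num_text_tokens + [image_scale] * num_image_tokens


def downweight_image_tokens(prompt_embeds, num_text_tokens: int, image_scale: float):
    """Scale the trailing image tokens of a (batch, seq, dim) tensor.

    ASSUMPTION: the first `num_text_tokens` positions along dim 1 are text
    tokens and everything after them is image tokens.
    """
    import torch  # imported lazily so token_weights() works without torch

    num_image_tokens = prompt_embeds.shape[1] - num_text_tokens
    weights = torch.tensor(
        token_weights(num_text_tokens, num_image_tokens, image_scale),
        dtype=prompt_embeds.dtype,
        device=prompt_embeds.device,
    )
    # Broadcast the per-token weights over the batch and embedding dimensions.
    return prompt_embeds * weights[None, :, None]


# Hypothetical usage with the pipelines from the example earlier in the thread.
# The values 512 and 0.3 are guesses to tune, not documented settings:
#
# out = pipe_prior_redux(my_image, prompt="your text prompt")
# scaled = downweight_image_tokens(out.prompt_embeds, num_text_tokens=512, image_scale=0.3)
# images = pipe(
#     prompt_embeds=scaled,
#     pooled_prompt_embeds=out.pooled_prompt_embeds,
#     guidance_scale=2.5,
#     num_inference_steps=50,
# ).images
```

Whether this gives results close to the BFL reference example is untested here; treat `image_scale` as a knob to try, with smaller values leaning more on the text prompt.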
