How to train or fine-tune flux-fill-dev
I tried to train flux-fill-dev with an inpainting-style setup (raw_image + mask pairs), but the results were poor. I don't know why.
Hi, have you solved it?
No. Are you doing similar work?
I have tried to write a training script based on sd-scripts. The result is OK but not great: the inpainted area is a little blurry compared to the original image.
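For context, flux-fill-dev's transformer takes 384 input channels per token: the packed noisy latents (64) concatenated with the packed masked-image latents (64) and the 8x8 pixel mask rearranged into latent space and packed (256). Below is a minimal numpy sketch of how that conditioning input is assembled, shapes only; the function names are mine, not from any library:

```python
import numpy as np

def pack_latents(lat):
    # Flux-style 2x2 patchify: (C, H, W) -> (H//2 * W//2, C*4) tokens
    C, H, W = lat.shape
    lat = lat.reshape(C, H // 2, 2, W // 2, 2)
    return lat.transpose(1, 3, 0, 2, 4).reshape((H // 2) * (W // 2), C * 4)

def build_fill_input(noisy, masked_img, mask_px):
    # noisy, masked_img: VAE latents (16, h, w); mask_px: (8h, 8w) pixel-space mask
    c, h, w = noisy.shape
    # fold each 8x8 pixel block of the mask into 64 channels at latent resolution
    m = mask_px.reshape(h, 8, w, 8).transpose(1, 3, 0, 2).reshape(64, h, w)
    # conditioning: packed masked-image latents (64) + packed mask (256)
    cond = np.concatenate([pack_latents(masked_img), pack_latents(m)], axis=1)
    # full model input: packed noisy latents (64) + conditioning (320) = 384
    return np.concatenate([pack_latents(noisy), cond], axis=1)
```

If the per-token channel count coming out of your data pipeline is not 384, the conditioning is being assembled differently from what the pretrained Fill weights expect, which alone can cause blurry inpainted regions.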
I got a similar result. I think the training loss may need to differ from the t2i task, but I have no idea how to modify it.
Very interesting. Maybe upscaling and denoising the training images is important.
Yes, I tried training with the method from the sd-xl-inpaint script and the results were also very poor, so I don't know how to design this loss.
I see that this user was able to successfully use Flux Fill as the base model for training. Pay attention to the layers the author excluded during training. I assume they can affect image quality during generation and were therefore excluded, but this is just a guess; I haven't yet clarified this point with the author.
https://huggingface.co/xiaozaa/catvton-flux-lora-alpha
Analyzing it in more detail, at least the following layers were excluded:
- Single blocks: all layers except the attn layers are excluded
- Double blocks: the mlp and attn_proj layers are excluded
This is what I noticed when visually examining the model file. I assume some other layers are excluded as well.
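If you want to reproduce that layer selection, one way is to filter the model's named parameters by name. A minimal sketch, assuming diffusers-style naming such as `single_blocks.0.attn.to_q` and `double_blocks.0.img_mlp.0` (the exact names and the exclusion rules are my reading of the checkpoint above, not confirmed by the author):

```python
def trainable_params(named_params):
    """Yield only (name, param) pairs to train, mirroring the exclusions above."""
    for name, p in named_params:
        if name.startswith("single_blocks") and "attn" not in name:
            continue  # single blocks: everything except attention is frozen
        if name.startswith("double_blocks") and ("mlp" in name or "attn.proj" in name):
            continue  # double blocks: mlp and attention projection layers are frozen
        yield name, p
```

You would then pass only these parameters to the optimizer, e.g. `torch.optim.AdamW(p for _, p in trainable_params(model.named_parameters()))`, leaving everything else frozen.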
And here is another example of a model trained on Flux Fill. Again, pay attention to the layers: not all layers are trained, as in the example above.
I trained the single blocks with a flow-matching loss, and the generated images are indeed much better than the original version. However, text understanding is still somewhat inaccurate: for example, if I ask for three red areas and two green areas in the masked region, it might draw five green areas and one red area. How can I address this poor text-to-image alignment?
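For reference on the loss question raised above, here is a minimal sketch of the rectified-flow (flow-matching) objective Flux uses, where the target is the velocity `noise - latents`; the extra `mask_weight` term that upweights the inpainted region is my own assumption, not something from the original training code:

```python
import numpy as np

def flow_matching_loss(model_pred, latents, noise, mask=None, mask_weight=2.0):
    # rectified-flow target: the velocity pointing from clean data toward noise
    target = noise - latents
    err = (model_pred - target) ** 2
    if mask is not None:
        # hypothetical reweighting: errors inside the masked (inpainted) region
        # count mask_weight times as much as errors outside it
        w = 1.0 + (mask_weight - 1.0) * mask
        err = err * w
    return err.mean()
```

A perfect prediction gives zero loss regardless of the mask; the weighting only changes how strongly errors in the inpainted region are penalized relative to the rest of the image.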