ActionPark (Action Park)

KBlueLeaf

authored a paper 3 months ago

TIPO: Text to Image with Text Presampling for Prompt Optimization

Paper • 2411.08127 • Published Nov 12, 2024

bghira

posted an update 8 months ago

I wanted to share a brief comparison from early training of the two-stage PixArt e-diffi pipeline.

On the left, we have the full stage 1 model generating all 50 steps on its own. This model is not trained at all on the final 400 timesteps of the schedule. On the right, we have the combined pipeline where stage 1 output is fed into stage 2.

For now, the difference is small, but fine details are reliably improved.

In the watercolour example, the full generation (right side) has the texture of watercolour paper, while the partial generation (left side) has a flatter, digital-art look.

For the blacksmith robot, the sparks thrown off by the work blend in more naturally. The robot's clothing appears to be undergoing an interesting transformation, due to the undertrained state of the weights.

The medieval battle image has improved blades of grass, settling dust particles, and fabric in the flag.

The stage 2 model in training does not seem to resolve any global coherence issues despite having 400 timesteps in its schedule, but it does noticeably change local coherence: for example, the consistency of fabrics and metals can be improved through stage 2 fine-tuning.

The stage 1 model is the workhorse of the output, as expected with 600 timesteps in its schedule. Additional fine-tuning of this model will improve the overall global coherence of the outputs. I wish I could say it won't affect fine details, but a lot of the fine detail does seem to be carried forward from stage 1.
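To make the 600/400 split concrete, here is a minimal sketch of how a 50-step sampler could route timesteps between the two stages. It assumes a 1000-step training schedule (600 + 400), evenly spaced inference timesteps counted down from 999, and that stage 2 owns the low-noise end of the schedule (t < 400). The names `inference_timesteps`, `pick_stage`, and `plan` are hypothetical and are not part of PixArt or any library API.

```python
from collections import Counter

NUM_TRAIN_TIMESTEPS = 1000  # assumed: 600 timesteps (stage 1) + 400 (stage 2)
STAGE2_BOUNDARY = 400       # assumed: stage 2 owns t in [0, 400), the low-noise end


def inference_timesteps(num_steps: int = 50,
                        num_train_timesteps: int = NUM_TRAIN_TIMESTEPS) -> list[int]:
    """Evenly spaced timesteps, noisiest first (one possible spacing)."""
    stride = num_train_timesteps // num_steps
    return list(range(num_train_timesteps - 1, -1, -stride))


def pick_stage(t: int, boundary: int = STAGE2_BOUNDARY) -> str:
    """Route a timestep to the stage trained on that part of the schedule."""
    return "stage1" if t >= boundary else "stage2"


def plan(num_steps: int = 50, two_stage: bool = True) -> list[tuple[int, str]]:
    """Pair each sampling step with the model that would denoise it."""
    steps = inference_timesteps(num_steps)
    if not two_stage:
        # Stage-1-only baseline: every step uses stage 1, including the
        # low-noise steps it was never trained on.
        return [(t, "stage1") for t in steps]
    return [(t, pick_stage(t)) for t in steps]


counts = Counter(stage for _, stage in plan())
baseline = plan(two_stage=False)
untrained = sum(1 for t, _ in baseline if t < STAGE2_BOUNDARY)
print(dict(counts))  # {'stage1': 30, 'stage2': 20}
print(untrained)     # 20
```

Under these assumptions, 20 of the 50 steps fall in stage 2's range. That is the share the stage-1-only baseline has to handle with no training on that part of the schedule. A real scheduler may space timesteps differently, which would shift the split.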

As noted, these models are undertrained due to a lack of compute, but they offer a promising look at what an e-diffi PixArt might be capable of.

Does anyone want to build this out fully with me?

KBlueLeaf

authored a paper 8 months ago

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

Paper • 2407.06723 • Published Jul 9, 2024

KBlueLeaf

authored a paper over 1 year ago

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Paper • 2309.14859 • Published Sep 26, 2023

Action Park

AI & ML interests

ActionPark's activity

Team members 6