---
license: apache-2.0
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-metadata-database
base_model:
- black-forest-labs/FLUX.1-schnell
- FA770/Sumeshi_Flux.1_S_v002E
pipeline_tag: text-to-image
tags:
- anime
- girls
---

![sample_image](./sample_images/1.webp)

# Model Information

**Note:** This model is Schnell-based, but it requires a CFG scale of 3 or higher (not the guidance scale) and 20 or more steps. It must be used with `clip_l_nigemono_flux1C`.
**At this time, this model cannot generate images in WebUI Forge. Please use it with ComfyUI.**

My English is terrible, so I use translation tools.

## Description
Nigemono_Flux.1_Compact7.5B is an experimental anime model created to verify how far the parameters of the Flux model can be reduced. The parameter count has been reduced from `12B` (19 double blocks / 38 single blocks) to `7.5B` (12 double blocks / 24 single blocks). You can use a negative prompt, which works to some extent. The output is blurry and the style varies depending on the prompt, probably because the model has not been fully trained.
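
As a back-of-the-envelope check of those numbers, the block counts can be turned into rough parameter totals using the publicly documented Flux dimensions (hidden size 3072, MLP ratio 4). This sketch counts only the main projection matrices; biases, norms, and embedders are omitted, so the totals are approximate:

```python
# Rough per-block parameter estimate for Flux-style transformer blocks
# (hidden size 3072, MLP ratio 4, as in the public FLUX.1 architecture;
# biases, norms, and embedders omitted, so totals are approximate).
HIDDEN = 3072
MLP = 4 * HIDDEN  # 12288

def double_block_params() -> int:
    # Two parallel streams (image/text), each with attention, MLP, modulation.
    per_stream = (
        HIDDEN * 3 * HIDDEN      # qkv projection
        + HIDDEN * HIDDEN        # attention output projection
        + 2 * HIDDEN * MLP       # MLP in + out
        + HIDDEN * 6 * HIDDEN    # adaLN modulation (6 shift/scale/gate sets)
    )
    return 2 * per_stream

def single_block_params() -> int:
    return (
        HIDDEN * (3 * HIDDEN + MLP)   # fused qkv + MLP-in projection
        + (HIDDEN + MLP) * HIDDEN     # fused output projection
        + HIDDEN * 3 * HIDDEN         # adaLN modulation
    )

def total(doubles: int, singles: int) -> float:
    """Billions of parameters in the transformer blocks alone."""
    return (doubles * double_block_params()
            + singles * single_block_params()) / 1e9

print(f"original: {total(19, 38):.1f}B in blocks (≈ 12B with embedders)")
print(f"reduced:  {total(12, 24):.1f}B in blocks")
```

With 19/38 blocks this gives roughly 11.8B in the blocks alone (about 12B including embedders), and 12/24 gives roughly 7.5B, consistent with the stated sizes.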

## Usage
- **Resolution:** Same as other Flux models
- **(Distilled) Guidance Scale:** 0 (has no effect, since this is a Schnell-based model)
- **CFG Scale:** 6 ~ 9 (7 recommended; scale 1 does not generate decent outputs)
- **Steps:** 20 ~ 100 or more (40 recommended)
- **Sampler:** Euler
- **Scheduler:** Simple, Beta
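
The reason CFG scale matters here is that true classifier-free guidance combines a conditional and an unconditional prediction at every step, unlike Flux's distilled guidance. A toy sketch with scalars standing in for the latent tensors:

```python
# True classifier-free guidance: the unconditional prediction is pushed
# toward the conditional one by cfg_scale. Scalars stand in for latents.
def cfg_combine(uncond, cond, cfg_scale):
    return uncond + cfg_scale * (cond - uncond)

# At scale 1 the result is just the conditional prediction (no guidance),
# which is why CFG 1 gives poor results with this model; 7 amplifies
# the conditioning well past it (and also makes negative prompts work).
print(cfg_combine(1.0, 2.0, 1.0))  # 2.0
print(cfg_combine(1.0, 2.0, 7.0))  # 8.0
```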

## Prompt Format (from [Kohaku-XL-Epsilon](https://huggingface.co/KBlueLeaf/Kohaku-XL-Epsilon))
```<1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>```

Due to the small amount of training, the `<character>`, `<series>`, and `<artists>` tags are almost non-functional. Since training focused on girl characters, the model may not generate boys or other non-person subjects well. Because the dataset was created with hakubooru, the prompt format is the same as the KohakuXL format. However, experiments suggest it is not strictly necessary to follow this format, as the model interprets natural language to some extent.

### Special Tags
- **Quality Tags:** masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality
- **Rating Tags:** safe, sensitive, nsfw, explicit
- **Date Tags:** newest, recent, mid, early, old
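
The tag order above can be assembled mechanically. This is a small illustrative helper (not part of any official tooling); the tag values in the example are made up:

```python
# Assemble a prompt in the KohakuXL tag order described above.
# Empty slots are simply skipped. All tag values are illustrative.
def build_prompt(subject, character="", series="", artists="",
                 general="", quality="", year="", meta="", rating=""):
    parts = [subject, character, series, artists, general,
             quality, year, meta, rating]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="1girl",
    general="long hair, smile, outdoors",
    quality="masterpiece, best quality",
    year="newest",
    rating="safe",
)
print(prompt)
# 1girl, long hair, smile, outdoors, masterpiece, best quality, newest, safe
```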

## Training

### Dataset Preparation
I used [hakubooru](https://github.com/KohakuBlueleaf/HakuBooru)-based custom scripts.

- **Exclude Tags:** `traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall image, duplicate, pixel-perfect_duplicate`
- **Minimum Post ID:** 1,000,000
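
The two filters above amount to dropping posts below the minimum ID and posts carrying any excluded tag. A toy reimplementation with made-up post records (hakubooru itself reads the danbooru2023 metadata database):

```python
# Toy sketch of the dataset filters: drop posts older than a minimum ID
# and posts carrying any excluded tag. Post dicts are illustrative.
EXCLUDE_TAGS = {
    "traditional_media", "photo_(medium)", "scan", "animated",
    "animated_gif", "lowres", "non-web_source", "variant_set",
    "tall image", "duplicate", "pixel-perfect_duplicate",
}
MIN_POST_ID = 1_000_000

def keep_post(post: dict) -> bool:
    return (post["id"] >= MIN_POST_ID
            and not (set(post["tags"]) & EXCLUDE_TAGS))

posts = [
    {"id": 999_999,   "tags": ["1girl"]},            # too old
    {"id": 2_000_000, "tags": ["1girl", "lowres"]},  # excluded tag
    {"id": 3_000_000, "tags": ["1girl", "smile"]},   # kept
]
kept = [p for p in posts if keep_post(p)]
print([p["id"] for p in kept])  # [3000000]
```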

### Blocks Reduction
I reduced the parameters by gradually removing blocks and retraining, so as not to completely disrupt the model's generation capability. Removing too many blocks at once results in completely noisy output, requiring retraining from scratch.
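
Mechanically, removing a block means deleting its entries from the state dict and renumbering the survivors so the smaller model still loads. This sketch assumes the `double_blocks.N.` / `single_blocks.N.` key convention of the reference Flux checkpoints; the helper itself is illustrative, not the author's actual script:

```python
import re

# Prune listed block indices from a Flux-style state dict and renumber
# the surviving blocks contiguously so the smaller model still loads.
def prune_blocks(state_dict, prefix, remove):
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)\.(.+)$")
    kept = []   # original indices of surviving blocks, in order seen
    out = {}
    for key, value in state_dict.items():
        m = pattern.match(key)
        if not m:
            out[key] = value          # key outside this block stack
            continue
        idx, rest = int(m.group(1)), m.group(2)
        if idx in remove:
            continue                  # whole block dropped
        if idx not in kept:
            kept.append(idx)
        out[f"{prefix}.{kept.index(idx)}.{rest}"] = value
    return out

# 19 double blocks -> remove the last three -> 16 remain, renumbered 0..15.
sd = {f"double_blocks.{i}.weight": i for i in range(19)}
pruned = prune_blocks(sd, "double_blocks", remove={16, 17, 18})
print(len(pruned))  # 16
```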

- **Training Hardware:** A single RTX 4090
- **Method:** LoRA training and merging the results
- **Training Script:** [sd-scripts](https://github.com/kohya-ss/sd-scripts)
- **Basic Settings:**
```powershell
accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py --network_module networks.lora_flux --sdpa --gradient_checkpointing --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --max_data_loader_n_workers 1 --save_model_as "safetensors" --mixed_precision "bf16" --fp8_base --save_precision "bf16" --full_bf16 --min_bucket_reso 384 --max_bucket_reso 1536 --seed 1 --max_train_epochs 1 --keep_tokens_separator "|||" --network_dim 32 --network_alpha 32 --unet_lr 1e-4 --train_batch_size 2 --gradient_accumulation_steps 3 --optimizer_type adamw8bit --lr_scheduler="constant_with_warmup" --lr_warmup_steps 500 --vae_batch_size 4 --cache_info --guidance_scale 1 --timestep_sampling shift --model_prediction_type raw --discrete_flow_shift 1.8 --loss_type l2 --highvram --network_args "in_dims=[16,16,16,0,16]" --network_train_unet_only --bucket_no_upscale
```

Continued training from Sumeshi Flux.1 S v002E:

1. Remove double block 18 / single blocks 34,35,36,37 / guidance_in (Sumeshi Flux.1 S has a zero tensor there)
2. 3,893 images (res1024 bs2 / res512 bs8 acc2 warmup50 --lr_scheduler="cosine_with_restarts" --lr_scheduler_num_cycles 1 --discrete_flow_shift 1.8) 4 epochs
3. Merge into the model
4. Remove double blocks 16,17 / single blocks 32,33
5. 3,893 images (res1024 bs2 / res512 bs8 acc1 warmup50 --lr_scheduler="cosine_with_restarts" --lr_scheduler_num_cycles 2 --discrete_flow_shift 1.8) 2 epochs
6. Merge into the model
7. Remove double blocks 14,15 / single blocks 28,29,30,31
8. 3,893 images (res1024 bs2 / res512 bs8 acc2 warmup50 --lr_scheduler="cosine_with_restarts" --lr_scheduler_num_cycles 2 --discrete_flow_shift 1.8) 2 epochs
9. Merge into the model
10. Remove double blocks 12,13 / single blocks 24,25,26,27
11. 3,893 images (res1024 bs2 / res512 bs8 acc2 warmup50 --lr_scheduler="cosine_with_restarts" --lr_scheduler_num_cycles 2 --discrete_flow_shift 1.8) 2 epochs
12. Merge into the model
13. 3,893 images (full finetune, res1024 bs1 acc1 adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" lr5e-6 warmup100 max_grad_norm 0.0 discrete_flow_shift 2) 2 epochs
14. 12,000 images (res1024 bs2 acc5 warmup100 --discrete_flow_shift 2) 1 epoch
15. 12,000 images (res1024 bs2 acc3 warmup100 --discrete_flow_shift 2) 2 epochs
16. Merge into the model
17. 12,000 images (res1024 bs2 acc3 warmup100 --discrete_flow_shift 2.5) 3 epochs
18. 12,000 images (res1024 bs2 acc3 warmup100 --discrete_flow_shift 2.5) 3 epochs
19. 12,000 images (res512 bs8 acc1 warmup100 timestep_sampling flux_shift (fixed ver.)) 12 epochs
20. Merge into the model
21. 12,000 images (res512 bs4 acc1 warmup100 timestep_sampling sigmoid sigmoid_scale 0.75) 7 epochs
22. 3,893 images (res512 bs4 acc1 warmup100 timestep_sampling sigmoid sigmoid_scale 0.6 --caption_dropout_rate 0.1) 16 epochs
23. Merge into the model
24. 3,893 images (res1024 bs2 / res512 bs4 acc2 warmup100 unet_lr1e-4 te_lr5e-5 timestep_sampling sigmoid sigmoid_scale 0.6 --caption_dropout_rate 0.1) 6 epochs
25. Merge into the model and CLIP_L
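
Each "merge into the model" step folds the trained LoRA back into the base weights: the low-rank update `B @ A` is scaled by `alpha / rank` and added to the original weight matrix. A dependency-free toy sketch (the training above used network_dim 32 / network_alpha 32, so its scale was 1.0; rank 1 here just keeps the example tiny):

```python
# Merge a LoRA update into a base weight: W' = W + (alpha/rank) * B @ A.
# Pure-Python matrices keep the example dependency-free.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(w, lora_up, lora_down, alpha, rank):
    delta = matmul(lora_up, lora_down)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
B = [[1.0], [2.0]]             # lora_up:   out_dim x rank
A = [[0.5, 0.5]]               # lora_down: rank x in_dim
merged = merge_lora(W, B, A, alpha=1.0, rank=1)
print(merged)  # [[1.5, 0.5], [1.0, 2.0]]
```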

## Resources (License)
- **FLUX.1-schnell (Apache 2.0)**
- **danbooru2023-webp-4Mpixel (MIT)**
- **danbooru2023-metadata-database (MIT)**

## Acknowledgements
- **black-forest-labs:** Thanks for publishing a great open-source model.
- **kohya-ss:** Thanks for publishing the essential training scripts and for the quick updates.
- **Kohaku-Blueleaf:** Thanks for the extensive publication of the dataset scripts and the various training conditions.