jimmycarter committed on
Commit 59ed1db
1 Parent(s): 79ba5f4

Upload README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-to-image
 
 # LibreFLUX: A free, de-distilled FLUX model
 
-LibreFLUX is an Apache 2.0 version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that provides a full T5 context length, uses attention masking, has classifier free guidance restored, and has had most of the FLUX aesthetic finetuning/DPO fully removed. That means it's a lot uglier than base flux, but it has the potential to be more easily finetuned to any new distribution. It keeps in mind the core tenets of open source software, that it should be difficult to use, slower and clunkier than a proprietary solution, and have an aesthetic trapped somewhere inside the early 2000s.
+LibreFLUX is an Apache 2.0 version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that provides a full T5 context length, uses attention masking, has classifier free guidance restored, and has had most of the FLUX aesthetic fine-tuning/DPO fully removed. That means it's a lot uglier than base flux, but it has the potential to be more easily finetuned to any new distribution. It keeps in mind the core tenets of open source software, that it should be difficult to use, slower and clunkier than a proprietary solution, and have an aesthetic trapped somewhere inside the early 2000s.
 
 <img src="https://huggingface.co/jimmycarter/LibreFLUX/resolve/main/assets/splash.jpg" style="max-width: 100%;">
 
@@ -16,6 +16,8 @@ LibreFLUX is an Apache 2.0 version of [FLUX.1-schnell](https://huggingface.co/bl
 
 - [LibreFLUX: A free, de-distilled FLUX model](#libreflux-a-free-de-distilled-flux-model)
 - [Usage](#usage)
+  - [Inference](#inference)
+  - [Fine-tuning](#fine-tuning)
 - [Non-technical Report on Schnell De-distillation](#non-technical-report-on-schnell-de-distillation)
   - [Why](#why)
   - [Restoring the Original Training Objective](#restoring-the-original-training-objective)
@@ -33,6 +35,8 @@ LibreFLUX is an Apache 2.0 version of [FLUX.1-schnell](https://huggingface.co/bl
 
 # Usage
 
+## Inference
+
 To use the model, just call the custom pipeline using [diffusers](https://github.com/huggingface/diffusers).
 
 ```py
@@ -82,6 +86,10 @@ images[0][0].save('chalkboard.png')
 
 For usage in ComfyUI, [a single transformer file is provided](https://huggingface.co/jimmycarter/LibreFLUX/blob/main/transformer_legacy.safetensors) but note that ComfyUI does not presently support attention masks so your images may be degraded.
 
+## Fine-tuning
+
+The model can be easily finetuned using [SimpleTuner](https://github.com/bghira/SimpleTuner) and the `--flux_attention_masked_training` training option. SimpleTuner has extensive support for parameter-efficient fine-tuning via [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS), in addition to full-rank fine-tuning.
+
 # Non-technical Report on Schnell De-distillation
 
 Welcome to my non-technical report on de-distilling FLUX.1-schnell in the most un-scientific way possible with extremely limited resources. I'm not going to claim I made a good model, but I did make a model. It was trained on about 1,500 H100 hour equivalents.
@@ -118,7 +126,7 @@ Note that FLUX.1-schnell was only trained on 256 tokens, so my finetune allows u
 
 ## Make de-distillation go fast and fit in small GPUs
 
-I avoided doing any full-rank (normal, all parameters) finetuning at all, since FLUX is big. I trained initially with the model in int8 precision using [quanto](https://github.com/huggingface/optimum-quanto). I started with a 600 million parameter [LoKr](https://arxiv.org/abs/2309.14859), since LoKr tends to approximate full-rank finetuning better than LoRA. The loss was really slow to go down when I began, so after poking around the code to initialize the matrix to apply to the LoKr I settled on this function, which injects noise at a fraction of the magnitudes of the layers they apply to.
+I avoided doing any full-rank (normal, all parameters) fine-tuning at all, since FLUX is big. I trained initially with the model in int8 precision using [quanto](https://github.com/huggingface/optimum-quanto). I started with a 600 million parameter [LoKr](https://arxiv.org/abs/2309.14859), since LoKr tends to approximate full-rank fine-tuning better than LoRA. The loss was really slow to go down when I began, so after poking around the code to initialize the matrix to apply to the LoKr I settled on this function, which injects noise at a fraction of the magnitudes of the layers they apply to.
 
 ```py
 def approximate_normal_tensor(inp, target, scale=1.0):
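The README's actual inference code is cut off by the hunk boundaries above: only the opening code fence and the `images[0][0].save('chalkboard.png')` context in the fourth hunk header survive in this diff. Purely for orientation, here is a minimal sketch of how a custom diffusers pipeline like this is typically loaded and called; the `custom_pipeline` target, dtype, prompt, step count, and `return_dict` usage are assumptions, not the repository's documented snippet.

```py
# Hedged sketch only -- the authoritative example is the README's own code
# block, which this diff truncates. Everything not visible in the diff
# (custom_pipeline target, dtype, prompt, steps, return_dict) is an assumption.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "jimmycarter/LibreFLUX",                  # repository shown in this commit
    custom_pipeline="jimmycarter/LibreFLUX",  # assumed: pipeline code ships with the repo
    torch_dtype=torch.bfloat16,               # assumed dtype
)
pipe.to("cuda")

images = pipe(
    prompt="a chalkboard with a hand-drawn diagram of a transformer",  # illustrative prompt
    num_inference_steps=28,  # assumed; a de-distilled model needs a normal step count
    return_dict=False,       # assumed, so that images[0] is the list of PIL images
)
# The hunk header context `images[0][0].save('chalkboard.png')` suggests the
# README indexes the output this way.
images[0][0].save("chalkboard.png")
```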
 
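The final hunk likewise stops at `def approximate_normal_tensor(inp, target, scale=1.0):`, so the author's actual initializer is not visible here. As a hypothetical illustration only of the idea described in that paragraph (fill the LoKr factors with noise at a fraction of the magnitude of the layers they adapt), using my own function name and choice of statistic rather than the README's code:

```py
# Hypothetical illustration -- NOT the README's approximate_normal_tensor().
# It only demonstrates "inject noise at a fraction of the magnitude of the
# layer it applies to" via a simple standard-deviation match.
import torch

@torch.no_grad()
def init_with_scaled_noise_(lokr_factor: torch.Tensor,
                            base_weight: torch.Tensor,
                            scale: float = 1.0) -> None:
    # Measure the typical magnitude of the frozen base layer's weights...
    ref_std = base_weight.detach().float().std()
    # ...and fill the trainable factor in place with zero-mean Gaussian noise
    # whose standard deviation is that magnitude scaled by `scale`.
    lokr_factor.normal_(mean=0.0, std=(ref_std * scale).item())

# Usage sketch on dummy tensors standing in for a FLUX linear layer and a
# LoKr factor (shapes are arbitrary).
base = torch.randn(3072, 3072) * 0.02
factor = torch.empty(64, 3072)
init_with_scaled_noise_(factor, base, scale=0.1)
```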