image.show()
The image does not appear and nothing is saved. It only runs when executed as a Python file:
python example.py
on a Colab T4.
import torch
from model import T5EncoderModel, FluxTransformer2DModel
from diffusers import FluxPipeline
from IPython.display import display
text_encoder_2: T5EncoderModel = T5EncoderModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
    # hqq_4bit_compute_dtype=torch.float32,
)
transformer: FluxTransformer2DModel = FluxTransformer2DModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # with CPU offload, it costs about 8.5 GB of VRAM
pipe.remove_all_hooks()
pipe = pipe.to('cuda')  # without CPU offload, it costs about 11 GB of VRAM
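For reference, the comments above imply a lower-VRAM variant: keep the model CPU offload and skip the move to CUDA (about 8.5 GB instead of 11 GB), which may matter on a roughly 15 GB T4. A minimal sketch, assuming the offload path behaves the same with this 4-bit checkpoint:

pipe.enable_model_cpu_offload()  # ~8.5 GB VRAM per the comment above
# in this variant, do not call pipe.remove_all_hooks() or pipe.to('cuda')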
A trick to free GPU VRAM:
def clean_hook(module, args, *rest_args):
    # flush cached GPU memory before and after the transformer forward pass
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
hook1 = transformer.register_forward_pre_hook(clean_hook)
hook2 = transformer.register_forward_hook(clean_hook)
def cpu_offload_hook(module, args):
    # offload the transformer to the CPU before the VAE decodes the latents, freeing VRAM for the decoder
    transformer.cpu()
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
hook3 = pipe.vae.decoder.register_forward_pre_hook(cpu_offload_hook)
prompt = "realistic, best quality, extremely detailed, ray tracing, photorealistic, A blue cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=16,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
display(image)
hook1.remove()
hook2.remove()
hook3.remove()
transformer.cuda()
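Since display() only renders inside a live notebook session, writing the image to disk keeps the result when the code runs as a plain Python file. A minimal sketch; the filename is an arbitrary choice:

# output_type="pil" means images[0] is a PIL.Image, which can be saved directly
image.save("flux_output.png")  # arbitrary output filename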
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Loading pipeline components...: 100% 7/7 [00:01<00:00, 4.48it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
0% 0/16 [00:00<?, ?it/s]
OutOfMemoryError Traceback (most recent call last)
in <cell line: 46>()
44
45 prompt = "realistic, best quality, extremely detailed, ray tracing, photorealistic, A blue cat holding a sign that says hello world"
---> 46 image = pipe(
47 prompt,
48 height=1024,
12 frames
/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py in __call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, image_rotary_emb)
1779 key = apply_rotary_emb(key, image_rotary_emb)
1780
-> 1781 hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
1782 hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
1783 hidden_states = hidden_states.to(query.dtype)
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.90 GiB. GPU 0 has a total capacity of 14.75 GiB of which 917.06 MiB is free. Process 46063 has 13.85 GiB memory in use. Of the allocated memory 13.52 GiB is allocated by PyTorch, and 213.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
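The error text itself suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True against fragmentation. That variable has to be set before PyTorch initializes the CUDA allocator, so it belongs at the very top of the notebook or script; a minimal sketch (it only reduces fragmentation, it does not add VRAM):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # set before the first CUDA allocation
import torch  # import torch only after the variable is set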
/content/flux-4bit
2024-11-17 19:33:31.910717: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-17 19:33:31.937523: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-17 19:33:31.945632: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-17 19:33:31.975725: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-17 19:33:33.438630: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...: 57% 4/7 [00:00<00:00, 9.81it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100% 7/7 [00:01<00:00, 6.58it/s]
0% 0/16 [00:18<?, ?it/s]
Traceback (most recent call last):
File "/content/flux-4bit/example.py", line 31, in
image = pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 730, in call
noise_pred = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_flux.py", line 544, in forward
hidden_states = block(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_flux.py", line 92, in forward
attn_output = self.attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 1781, in call
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.90 GiB. GPU 0 has a total capacity of 14.75 GiB of which 1.69 GiB is free. Process 58509 has 13.05 GiB memory in use. Of the allocated memory 10.80 GiB is allocated by PyTorch, and 2.13 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
%cd /content/flux-4bit
!python a.py
/content/flux-4bit
2024-11-17 19:40:17.986793: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-17 19:40:18.014036: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-17 19:40:18.023423: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-17 19:40:18.052484: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-17 19:40:19.815512: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...: 0% 0/7 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100% 7/7 [00:01<00:00, 6.03it/s]
100% 3/3 [00:30<00:00, 10.16s/it]
^C
No picture is saved or displayed (Colab T4).
"Your session crashed after using all available RAM." (Colab T4)
It won't save or show a picture no matter what I try. ??????
My code ran normally on a Colab T4 in my last experiment, but it takes a long time. My suggestion is that 16 GB of VRAM would support this model better.
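For anyone stuck on a roughly 15 GB T4, two generic knobs might help; both are assumptions, not verified with this HQQ 4-bit checkpoint: diffusers' sequential CPU offload (slowest, lowest VRAM) and a smaller resolution, which shrinks the attention tensors that triggered the OOM above. A sketch:

# Untested assumption: sequential offload streams submodules to the GPU one at a time.
pipe.enable_sequential_cpu_offload()
image = pipe(
    prompt,
    height=768,   # below 1024x1024 the attention buffers are noticeably smaller
    width=768,
    guidance_scale=3.5,
    num_inference_steps=16,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux_768.png")  # arbitrary output filename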