Apply for community grant: Personal project (gpu)

#1 by BestWishYsh - opened

Apply for community grant: Academic project (gpu). We develop an identity-preserving text-to-video generation model, ConsisID, which can keep the human identity consistent in the generated video.

arxiv: https://arxiv.org/abs/2411.17440
paper: https://huggingface.co/papers/2411.17440
page: https://pku-yuangroup.github.io/ConsisID/
code: https://github.com/PKU-YuanGroup/ConsisID

Hi @BestWishYsh , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
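
For reference, the typical ZeroGPU pattern is to load models at startup and put all CUDA work in a top-level function decorated with @spaces.GPU. A minimal sketch, with a placeholder checkpoint and function rather than this Space's actual code:

import spaces
import torch
from diffusers import DiffusionPipeline

# Placeholder checkpoint; the real Space loads its own ConsisID pipeline.
pipe = DiffusionPipeline.from_pretrained("some/model", torch_dtype=torch.bfloat16)
pipe.to("cuda")  # on ZeroGPU the GPU is only attached while a @spaces.GPU call is running

@spaces.GPU  # request a GPU for the duration of this call
def generate(prompt: str):
    # placeholder inference call; a video pipeline would return frames instead
    return pipe(prompt).images[0]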

thanks, let me check

BTW, it would be nice if you could provide more info, like the arXiv link, GitHub URL, etc., when opening a grant request. Your request was tagged as spam because it didn't have meaningful info.

I am sorry, thanks for your suggestion.

@hysts hi, I have set @spaces.GPU(), but it still shows RuntimeError: No CUDA GPUs are available. Do you know how to fix it? Thanks.

Can you provide more info about the error, like a stack trace?

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 354, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
    raise res.value
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 184, in gradio_handler
    schedule_response = client.schedule(task_id=task_id, request=request, duration=duration)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 119, in schedule
    raise gr.Error(
gradio.exceptions.Error: 'The requested GPU duration (240s) is larger than the maximum allowed'

Is duration=240s too long?
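
For reference, the requested duration is just the argument passed to the decorator, so staying under the per-call cap means lowering that number. A sketch with a placeholder handler signature and the body elided:

import spaces

# Request at most 120 seconds of GPU time per call; asking for more than the
# allowed maximum is what raises the error above.
@spaces.GPU(duration=120)
def generate(prompt, image_input, seed_value, scale_status, rife_status):
    ...  # same handler body as before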

Thanks! I've been looking into the issue, but so far haven't been able to resolve it. The fact that this Space takes like 10 minutes or so just to restart is making debugging difficult. Anyway, I'll let you know once I find something.

Well, I'm not 100% sure what the cause of the error is, but it looks like this diff fixes the CUDA error:

diff --git a/app.py b/app.py
index e119d2b..60bb4de 100644
--- a/app.py
+++ b/app.py
@@ -122,7 +122,6 @@ def infer(
     num_inference_steps: int,
     guidance_scale: float,
     seed: int = 42,
-    progress=gr.Progress(track_tqdm=True),
 ):
     if seed == -1:
         seed = random.randint(0, 2**8 - 1)
@@ -170,6 +169,38 @@ def infer(
     return (video_pt, seed)
 
 
+@spaces.GPU(duration=180)
+def generate(
+    prompt,
+    image_input,
+    seed_value,
+    scale_status,
+    rife_status,
+):
+    latents, seed = infer(
+        prompt,
+        image_input,
+        num_inference_steps=50,
+        guidance_scale=7.0,
+        seed=seed_value,
+    )
+    if scale_status:
+        latents = upscale_batch_and_concatenate(upscale_model, latents, device)
+    if rife_status:
+        latents = rife_inference_with_latents(frame_interpolation_model, latents)
+
+    batch_size = latents.shape[0]
+    batch_video_frames = []
+    for batch_idx in range(batch_size):
+        pt_image = latents[batch_idx]
+        pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
+
+        image_np = VaeImageProcessor.pt_to_numpy(pt_image)
+        image_pil = VaeImageProcessor.numpy_to_pil(image_np)
+        batch_video_frames.append(image_pil)
+    return batch_video_frames
+
+
 def convert_to_gif(video_path):
     clip = VideoFileClip(video_path)
     gif_path = video_path.replace(".mp4", ".gif")
@@ -320,8 +351,8 @@ with gr.Blocks() as demo:
     </table>
         """)
 
-    @spaces.GPU(duration=180)
-    def generate(
+
+    def run(
         prompt,
         image_input,
         seed_value,
@@ -329,29 +360,11 @@ with gr.Blocks() as demo:
         rife_status,
         progress=gr.Progress(track_tqdm=True)
     ):
-        latents, seed = infer(
-            prompt,
-            image_input,
-            num_inference_steps=50,
-            guidance_scale=7.0,
-            seed=seed_value,
-            progress=progress,
-        )
-        if scale_status:
-            latents = upscale_batch_and_concatenate(upscale_model, latents, device)
-        if rife_status:
-            latents = rife_inference_with_latents(frame_interpolation_model, latents)
-
-        batch_size = latents.shape[0]
-        batch_video_frames = []
-        for batch_idx in range(batch_size):
-            pt_image = latents[batch_idx]
-            pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
-
-            image_np = VaeImageProcessor.pt_to_numpy(pt_image)
-            image_pil = VaeImageProcessor.numpy_to_pil(image_np)
-            batch_video_frames.append(image_pil)
-
+        batch_video_frames = generate(prompt,
+                                      image_input,
+                                      seed_value,
+                                      scale_status,
+                                      rife_status)
         video_path = save_video(batch_video_frames[0], fps=math.ceil((len(batch_video_frames[0]) - 1) / 6))
         video_update = gr.update(visible=True, value=video_path)
         gif_path = convert_to_gif(video_path)
@@ -361,7 +374,7 @@ with gr.Blocks() as demo:
         return video_path, video_update, gif_update, seed_update
 
     generate_button.click(
-        generate,
+        fn=run,
         inputs=[prompt, image_input, seed_param, enable_scale, enable_rife],
         outputs=[video_output, download_video_button, download_gif_button, seed_text],
     )

This should resolve the current error, but when I ran the Space, the inference took longer than the specified duration of 180 seconds, and a "GPU task aborted" error occurred. Maybe you can decrease the number of inference steps to avoid the error, though.
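
For reference, the shape the diff moves toward is roughly this: the @spaces.GPU-decorated function at module level and a thin wrapper inside gr.Blocks. A simplified sketch, not the actual app.py:

import gradio as gr
import spaces

# The GPU-decorated function lives at module level, outside gr.Blocks.
@spaces.GPU(duration=180)
def generate(prompt):
    # the heavy CUDA work would happen here; this stub just echoes the prompt
    return prompt

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    output = gr.Textbox(label="Output")
    button = gr.Button("Generate")

    # Thin wrapper defined inside the Blocks context: it calls the module-level
    # GPU function and keeps CPU-side post-processing (and gr.Progress) here.
    def run(prompt, progress=gr.Progress(track_tqdm=True)):
        return generate(prompt)

    button.click(fn=run, inputs=prompt, outputs=output)

demo.launch()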

Thanks a lot, but is the maximum ZeroGPU duration only 180s? Reducing the number of steps will seriously reduce the generation quality...

Well, actually, I don't know about it. cc @cbensimon

It seems that the longest duration is only 120s

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/118

But your Space doesn't seem to raise the maximum duration error with duration=180.

The discussion is a bit old, and the bug that showed "-1 day" or something has already been fixed, so I'm not sure, but the 120-second limit mentioned in the discussion might have been fixed as well.

The current error is:
[screenshot: image.png]

Does this error mean that users need to subscribe to the Enterprise Hub to run the HF Space? But it seems that other Spaces using ZeroGPU don't require it...

Ah, OK, this involves a couple of known issues with the spaces package.

  1. There's a bug where the quota error is raised when the remaining quota is exactly the same as the specified duration.
  2. There's a bug where users are considered not logged in when the @spaces.GPU decorator is added to inner functions.

You can avoid the second issue by adding an attribute to the outer function, like wrapper_fn.zerogpu = True.
In your case, adding the following line after the definition of the run function would fix the second issue.

run.zerogpu = True

If I remember correctly, logged-in users have 300 seconds of quota, so they don't have to subscribe to PRO.
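
In code, the workaround for the second issue looks like this (a sketch reusing the run/generate names from the diff above):

import spaces

@spaces.GPU(duration=180)
def generate(prompt):
    ...  # GPU work happens here

# Outer wrapper that the Gradio event handler actually calls.
def run(prompt):
    return generate(prompt)

# Workaround: mark the outer wrapper so the spaces package still associates
# the call with the logged-in user.
run.zerogpu = True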

Thanks, let me try it

I have added "run.zerogpu = True", but I still get this error.

The current error is:
[screenshot: image.png]

Hmm, I see. Thanks for checking. This is the way @cbensimon told me before, and it worked back then, but something might have changed since then. Let's wait for @cbensimon's response.

Thank you for your great support, looking forward to his reply.

PS: After I restarted the space, "No CUDA GPUs are available" appeared again ...
[screenshot: image.png]

Oh, weird.
I think it would be great if this Space could run on ZeroGPU, but maybe we can assign an L40S with a 1-hour sleep time in the meantime, if your Space can run on it.

OK, I just switched the hardware to L40S to see if it works.

thanks a lot, let's wait for it to restart. And should we add "run.zerogpu = True" when using L40S?

And should we add "run.zerogpu = True" when using L40S?

No, it should work without it. It doesn't affect normal GPU execution.
Also, the spaces package does nothing on Spaces with a normal GPU, so you don't have to remove the @spaces.GPU decorator either.

The space runs normally (with L40S). It seems that ZeroGPU has some hidden bugs.

Looks like CUDA OOM occurred.

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 350, in run
    batch_video_frames, seed = generate(
  File "/home/user/app/app.py", line 156, in generate
    video_pt = pipe(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/models/pipeline_consisid.py", line 883, in __call__
    video = self.decode_latents(latents)
  File "/home/user/app/models/pipeline_consisid.py", line 463, in decode_latents
    frames = self.vae.decode(latents).sample
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1278, in decode
    decoded = self._decode(z).sample
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1249, in _decode
    z_intermediate, conv_cache = self.decoder(z_intermediate, conv_cache=conv_cache)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 970, in forward
    hidden_states, new_conv_cache[conv_cache_key] = up_block(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 647, in forward
    hidden_states, new_conv_cache[conv_cache_key] = resnet(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 291, in forward
    hidden_states, new_conv_cache["norm1"] = self.norm1(hidden_states, zq, conv_cache=conv_cache.get("norm1"))
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 187, in forward
    new_f = norm_f * conv_y + conv_b
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.32 GiB. GPU 0 has a total capacity of 44.32 GiB of which 181.25 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 35.02 GiB is allocated by PyTorch, and 7.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Oh, let's try again by adding these two lines:
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
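
For context, those are standard diffusers memory-saving switches. A minimal sketch of where they go, with a placeholder model id; the two overlap, so in practice one of them is usually enough, sequential offload being the more aggressive and slower option:

import torch
from diffusers import DiffusionPipeline

# Placeholder checkpoint; the Space loads its own ConsisID pipeline.
pipe = DiffusionPipeline.from_pretrained("some/model", torch_dtype=torch.bfloat16)

pipe.enable_model_cpu_offload()         # keeps whole sub-models on CPU between uses
# pipe.enable_sequential_cpu_offload()  # more aggressive alternative: offloads layer by layer

Since the OOM above happened inside vae.decode, VAE slicing/tiling (pipe.vae.enable_slicing() and pipe.vae.enable_tiling(), supported by the CogVideoX-style VAE) is another common way to lower peak memory.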

The Space can generate videos normally now, and it seems that only about 19 GB of GPU memory is needed.

Awesome!
Also, thanks for your efforts in debugging the ZeroGPU issues.

Hi @hysts ,
Is the 2nd bug mentioned above (the @spaces.GPU decorator on inner functions) solved now? Because I'm facing exactly this situation with my Space https://huggingface.co/spaces/filapro/cad-recode

Hi @filapro
No, unfortunately, it hasn’t been fixed yet. I saw some discussions about it internally a few days ago, so as far as I know, it’s still being worked on.
As I mentioned above, you should be able to work around the issue by adding something like wrapper_fn.zerogpu = True to the wrapper function.
I just duplicated your Space and tested it, and adding run_test_safe.zerogpu = True fixes the issue for your Space. Can you try that?

Hi @hysts ,
Following your answer, I refactored my code to have fewer nested functions. Now everything is fine, thanks!
