Demo example in the paper

#6
by willsky - opened

Great work!
May I ask how to reproduce the results in Figures 4 and 5 of the paper, i.e., retrieving the specific frames corresponding to the prompts?
Many thanks!

OpenGVLab org

To achieve more detailed video understanding than plain conversation, you need to load the third-party modules from TPO; see https://huggingface.co/OpenGVLab/VideoChat-TPO/tree/main
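A minimal loading sketch, assuming VideoChat-TPO follows the usual transformers `trust_remote_code` pattern (the dtype/device choices are only examples, and how the extra decoder weights are fetched is an assumption; check the repo README for the exact setup):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "OpenGVLab/VideoChat-TPO"

# trust_remote_code=True loads the custom model code shipped with the repo
# (assumption: the third-party task decoders are set up through this code).
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()
```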

It seems the third-party modules in TPO are cgdetr and sam2. How should I proceed after loading these two modules?

OpenGVLab org

After the corresponding task decoder is loaded, the model identifies whether the decoder needs to be called and uses it to help produce the response.

Could you give some example code for this? I just want to get the specific frame numbers or timestamps corresponding to a prompt, for example, "In this video, in which frames does a man appear?" or "In this video, from which second to which second does a man appear?" Currently, the demo does not output the right frames/seconds.

OpenGVLab org

You can try this:

Based on the video content, determine the start and end times of **various activity events** in the video, accompanied by descriptions.
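If it helps, here is a rough inference sketch reusing the model and tokenizer loaded above. It assumes the repo's custom code exposes a chat-style interface similar to other VideoChat releases; the `chat()` signature and the `load_video()` helper are assumptions, so please adapt to the demo code shipped with the repo:

```python
# Hypothetical usage sketch -- the chat() signature and load_video() helper
# are assumptions; check the repo's demo/README for the real interface.
question = (
    "Based on the video content, determine the start and end times of "
    "various activity events in the video, accompanied by descriptions."
)

frames = load_video("example.mp4", num_frames=16)    # hypothetical video loader
response = model.chat(tokenizer, frames, question)   # assumed chat interface
print(response)
```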

I have tried this, but it cannot output the right time. For a 6-second video, it outputs "25 to 30 seconds".

I have been trying to replicate this for a while with no luck. I tried running TPO but couldn't figure out how to load this model with it, and I couldn't run the third-party modules alone with the new model either. Are there any updates on this? I want to understand how to get the third-party modules working with this model. If that's how the outputs in your paper were achieved, I would greatly appreciate any documentation or code showing how to replicate those experiments.
