HawkEye: Training Video-Text LLMs for Grounding Text in Videos

This repo provides the checkpoint of HawkEye, and our implementation of VideoChat2.
`videochat2-stage3-our_impl.pth` is the checkpoint of our reproduction of VideoChat2. You can use it as a substitute for `hawkeye.pth`.
- Unlike HawkEye, it is not trained with data from InternVid-G.
- Unlike the original implementation of VideoChat2, its visual encoder is frozen, and it is not trained with image data from VideoChat2-IT.
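
Because the two checkpoints are meant to be interchangeable, switching between them can be reduced to choosing a filename. The snippet below is a minimal sketch of that idea, not part of the HawkEye codebase: the helper `resolve_checkpoint`, its `use_hawkeye` flag, and the `ckpt_dir` argument are hypothetical names introduced for illustration. Only the two filenames come from this repo.

```python
from pathlib import Path

# Checkpoint filenames as provided in this repo.
HAWKEYE_CKPT = "hawkeye.pth"
VIDEOCHAT2_CKPT = "videochat2-stage3-our_impl.pth"


def resolve_checkpoint(ckpt_dir, use_hawkeye=True):
    """Hypothetical helper: return the path of the checkpoint to load.

    use_hawkeye=True  -> HawkEye (trained with InternVid-G data)
    use_hawkeye=False -> our VideoChat2 reproduction (no InternVid-G data)
    """
    name = HAWKEYE_CKPT if use_hawkeye else VIDEOCHAT2_CKPT
    return Path(ckpt_dir) / name
```

The returned path would then be passed to whatever checkpoint option your inference script or config expects. This assumes both files were downloaded into the same directory.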