Have a video chat with Gemini - it can see you ⚡️
Dense Grounded Understanding of Images and Videos
FitDiT is a high-fidelity virtual try-on model.
https://huggingface.co/papers/2501.03006
Video Super-Resolution with Text-to-Video Model
Chat with LLMs
Gaze Target Estimation
Gaze detection using Moondream
Search for organizations
A leaderboard for multimodal models
Audio-Conditioned LipSync with Latent Diffusion Models
FoundHand
EfficientVLM
Animation Sketch Sequence Colorization