
Flouga Droi

flofloga

AI & ML interests

None yet

Recent Activity

updated a Space 23 days ago
flofloga/rvc
liked a Space 6 months ago
mrfakename/E2-F5-TTS
updated a Space 11 months ago
flofloga/tts_rvc_polpol

Organizations

None yet

flofloga's activity

updated a Space 23 days ago
replied to qgao007's post about 1 year ago
reacted to vladbogo's post with 👍 about 1 year ago
VideoPrism is a new foundational video encoder that improves video understanding through its pretraining strategy. It learns from a large, heterogeneous dataset: 36 million high-quality video-caption pairs and 582 million video clips with noisier parallel text (such as ASR transcripts).

Key points:
* It uses a two-stage training approach. First, the video and text encoders are aligned contrastively on video-text pairs. Second, the video encoder is trained further on video alone with an improved masked autoencoding objective, so that it learns both appearance and motion.
* It achieves state-of-the-art performance on 30 of 33 benchmarks, spanning general video understanding, zero-shot video-text retrieval, video captioning, video question answering, and computer vision for science.
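To make the two training stages a little more concrete, here is a minimal, generic sketch. It is not VideoPrism's code. Stage 1 is shown as a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of video and text embeddings. Stage 2 is shown only as the random token-masking step that masked autoencoding starts from. The function names, the temperature of 0.07, the toy 2-D embeddings, and the 75% mask ratio are all illustrative assumptions.

```python
import math
import random

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one list of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Generic InfoNCE-style loss (assumed, CLIP-like): video i should match caption i."""
    v = [l2_normalize(e) for e in video_emb]
    t = [l2_normalize(e) for e in text_emb]
    n = len(v)
    # logits[i][j] = cosine similarity of video i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Video-to-text direction: each row is a classification over all captions.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-video direction: each column is a classification over all videos.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (v2t + t2v) / 2

def random_token_mask(num_tokens, mask_ratio, seed=0):
    """Hypothetical helper: randomly split video token indices into visible and masked."""
    rng = random.Random(seed)
    idx = list(range(num_tokens))
    rng.shuffle(idx)
    num_masked = int(num_tokens * mask_ratio)
    return sorted(idx[num_masked:]), sorted(idx[:num_masked])
```

As a quick check, correctly paired embeddings should give a lower loss than swapped ones:

```python
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(symmetric_contrastive_loss(videos, aligned) < symmetric_contrastive_loss(videos, swapped))  # True
```

In the masked stage, the encoder would see only the `visible` tokens and be trained to reconstruct information about the `masked` ones. According to the post, VideoPrism uses an improved variant of this objective rather than this plain form.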

Congrats to the authors for their work!

Paper: VideoPrism: A Foundational Visual Encoder for Video Understanding (2402.13217)