Flouga Droi

flofloga

AI & ML interests

None yet

Recent Activity

updated a Space 23 days ago

flofloga/rvc

liked a Space 6 months ago

mrfakename/E2-F5-TTS

updated a Space 11 months ago

flofloga/tts_rvc_polpol

Organizations

None yet

flofloga's activity

updated a Space 23 days ago

Blank Space

🦀

liked a Space 6 months ago

2.15k

F5-TTS

🗣

F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)

updated 2 Spaces 11 months ago

TTS plus RVC

🎙

Convert text to speech with voice customization

Style Bert VITS2 Editor Demo

😊

replied to qgao007's post about 1 year ago

helo

reacted to vladbogo's post with 👍 about 1 year ago

Post

VideoPrism is a new video encoder that improves video understanding through its training strategy and a large pretraining dataset (36 million high-quality video-caption pairs and 582 million video clips).

Key points:
* It uses a two-stage training approach: it first aligns the video and text encoders, then runs an enhanced video-only masked autoencoding stage to learn appearance and motion.
* It achieves top performance on 30 of 33 benchmarks across a wide range of tasks, including general video understanding, zero-shot video-text retrieval, video captioning, QA, and computer vision for science.

Congrats to the authors on their work!

Paper: VideoPrism: A Foundational Visual Encoder for Video Understanding (2402.13217)