Collections
Discover the best community collections!
Collections trending this week
-
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 65
-
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
-
On the Hidden Mystery of OCR in Large Multimodal Models
Paper • 2305.07895 • Published
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
-
277
Video Dubbing
Dub and voice over videos in multiple languages
-
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications
Paper • 2310.17750 • Published • 9
-
4.75k
MusicGen
🎵 Generate music from text and melody descriptions
-
245
Faster Whisper Webui
Transcribe audio to text with speaker diarization