MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 101
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 92
Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22, 2024 • 84
Enhance Your Images Collection Some trending Gradio apps on Spaces that you can use to enhance/upscale your images for free. This collection will be kept uptodate with new releases. • 7 items • Updated Aug 22, 2024 • 17
Kalman-Inspired Feature Propagation for Video Face Super-Resolution Paper • 2408.05205 • Published Aug 9, 2024 • 8
Qwen2-Audio Collection Audio-language model series based on Qwen2 • 4 items • Updated Nov 28, 2024 • 49
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25, 2024 • 55
Text-to-Image Base Models Collection All text-to-image open source base models, with their respective license • 28 items • Updated May 10, 2024 • 22
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10, 2024 • 47
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns Paper • 2312.04534 • Published Dec 7, 2023 • 6
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis Paper • 2311.08667 • Published Nov 15, 2023 • 18
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning Paper • 2311.07574 • Published Nov 13, 2023 • 14
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 26
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 41
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 31, 2023 • 8
ProPainter: Improving Propagation and Transformer for Video Inpainting Paper • 2309.03897 • Published Sep 7, 2023 • 26
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 60 items • Updated 30 minutes ago • 504
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Paper • 2310.09199 • Published Oct 13, 2023 • 26