Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.11703

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 3
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 178
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5 • 16
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5 • 9
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3 • 15
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 44

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1 • 11
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4 • 27
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 20
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs