- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 98
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 124
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
  Paper • 2408.12528 • Published • 51
Danil
Potatochka
Recent Activity
updated
a collection
about 18 hours ago
VLM papers
Collections
2
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 89
- Transformer Explainer: Interactive Learning of Text-Generative Models
  Paper • 2408.04619 • Published • 156
- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 145
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
  Paper • 2305.10429 • Published • 3