- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 53
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 101
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 130
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
  Paper • 2408.12528 • Published • 52
Danil (Potatochka)
AI & ML interests: None yet
Recent Activity
- liked a Space 23 days ago: osunlp/Online_Mind2Web_Leaderboard
- updated a collection about 1 month ago: VLM papers
Organizations: None yet
Collections: 2
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 90
- Transformer Explainer: Interactive Learning of Text-Generative Models
  Paper • 2408.04619 • Published • 163
- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 150
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
  Paper • 2305.10429 • Published • 3
Models: None public yet
Datasets: None public yet