- Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
  Paper • 2309.15915 • Published • 2
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 7
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
  Paper • 2309.09958 • Published • 18
Zhao
Hanyu66
AI & ML interests
CV, NLP
Recent Activity
liked a dataset about 1 month ago: dylanebert/CityGaussian
liked a dataset about 1 month ago: ShapeNet/ShapeSplatsV1
liked a model about 1 month ago: naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt
Organizations
None yet
Collections
1
models
1
datasets
None public yet