- Distilling Vision-Language Models on Millions of Videos
  Paper • 2401.06129 • Published • 17
- Koala: Key frame-conditioned long video-LLM
  Paper • 2404.04346 • Published • 6
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
  Paper • 2404.05726 • Published • 21
- OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
  Paper • 2406.07471 • Published • 1
liu
che111
AI & ML interests
None yet
Recent Activity
updated a dataset about 1 hour ago: che111/mednlpr1
published a dataset about 1 hour ago: che111/mednlpr1
updated a dataset 7 days ago: Sp-data/Report2CT
Organizations
Collections
8
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 30
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
  Paper • 2405.15738 • Published • 44
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 99
Models
None public yet