zhang

aczhang

AI & ML interests

None yet

Organizations

None yet

aczhang's activity

upvoted 2 papers 19 days ago

Towards Universal Soccer Video Understanding

Paper • 2412.01820 • Published 23 days ago • 9

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

Paper • 2412.04106 • Published 21 days ago • 5

upvoted a paper 23 days ago

Efficient Track Anything

Paper • 2411.18933 • Published 28 days ago • 16

upvoted a paper 28 days ago

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Paper • 2411.15296 • Published Nov 22 • 19

upvoted a paper about 1 month ago

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Paper • 2411.13281 • Published Nov 20 • 17

commented on a paper about 2 months ago

RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

Paper • 2411.04752 • Published Nov 7 • 16

upvoted a paper 4 months ago

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

Paper • 2408.11813 • Published Aug 21 • 11

upvoted a paper 5 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

upvoted a paper 6 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 68

upvoted a collection 6 months ago

InternVL2.0

Collection

Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated 4 days ago • 87

upvoted 2 papers 6 months ago

What Matters in Detecting AI-Generated Videos like Sora?

Paper • 2406.19568 • Published Jun 27 • 13

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25 • 18

upvoted 2 papers 7 months ago

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

upvoted 2 papers 8 months ago

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 35

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 55

upvoted 2 papers 9 months ago

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

Paper • 2404.03118 • Published Apr 3 • 23

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33