OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9, 2025 • 37
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 79
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 34
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper • 2410.02712 • Published Oct 3, 2024 • 35
LLaVA-Video Collection Models focused on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5, 2024 • 57
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 34
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models Paper • 2402.07865 • Published Feb 12, 2024 • 13
Aligning Large Multimodal Models with Factually Augmented RLHF Paper • 2309.14525 • Published Sep 25, 2023 • 30
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 33
DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations Paper • 2307.07047 • Published Jul 13, 2023 • 15