Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

upvoted a paper 19 days ago

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

authored a paper 19 days ago

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

upvoted a paper 24 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

ClownRat's activity

upvoted a paper 19 days ago

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

Paper • 2503.14428 • Published 26 days ago • 8

upvoted 2 papers 24 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 124

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Paper • 2410.18558 • Published Oct 24, 2024 • 20

upvoted a paper 29 days ago

Transformers without Normalization

Paper • 2503.10622 • Published about 1 month ago • 154

upvoted 2 papers about 1 month ago

LongRoPE2: Near-Lossless LLM Context Window Scaling

Paper • 2502.20082 • Published Feb 27 • 37

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 83

upvoted 2 articles about 2 months ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 544

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 149

upvoted 3 papers about 2 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 180

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17 • 34

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published Feb 19 • 25

upvoted 7 papers 2 months ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 380

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published Jan 14 • 15

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 285

upvoted 2 papers 3 months ago

Valley2: Exploring Multimodal Models with Scalable Vision-Language Design

Paper • 2501.05901 • Published Jan 10 • 1

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28, 2024 • 88