Zhe Chen

czczup

https://scholar.google.com/citations?hl=en&user=j1rq_lYAAAAJ

AI & ML interests

multimodal large language model, vision foundation model

Recent Activity

liked a dataset about 11 hours ago

hiyouga/geometry3k

updated a collection 9 days ago

czczup's activity

upvoted a collection 19 days ago

VisualPRM

5 items • Updated 19 days ago • 2

upvoted a paper 21 days ago

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published 23 days ago • 32

upvoted a paper 26 days ago

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 57

upvoted a paper 29 days ago

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Paper • 2503.01328 • Published Mar 3 • 14

upvoted a collection about 2 months ago

SYNTHETIC-1

A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 50

upvoted a paper 3 months ago

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

Paper • 2111.02394 • Published Nov 3, 2021 • 2

upvoted 7 collections 4 months ago

VideoChat

Chat-Centric Video Understanding • 8 items • Updated Jan 10 • 3

V2PE

Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding • 3 items • Updated Jan 10 • 3

InternVL Adaptation

Adaptation Models for Specific Domains • 12 items • Updated Jan 10 • 1

InternVideo2

InternVideo2 • 20 items • Updated Feb 27 • 19

InternVL1.5

A Pioneering Open-Source Alternative to GPT-4V • 8 items • Updated Jan 10 • 12

Mono-InternVL

A Pioneering Monolithic MLLM • 6 items • Updated 23 days ago • 6

InternVL2.5-MPO

Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization • 16 items • Updated Jan 29 • 26

upvoted 2 papers 4 months ago

POINTS1.5: Building a Vision-Language Model towards Real World Applications

Paper • 2412.08443 • Published Dec 11, 2024 • 38

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 37

upvoted a collection 4 months ago

InternLM2.5

14 items • Updated Feb 11 • 71

upvoted 2 papers 4 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 151

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 80

upvoted a paper 5 months ago

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10, 2024 • 4

upvoted a collection 5 months ago

InternVL2.5

Better than InternVL 2.0 • 19 items • Updated Mar 3 • 90