Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

upvoted a paper 1 day ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

updated a model 1 day ago

OpenGVLab/PIIP-LLaVA-Plus_ConvNeXt-L_CLIP-L_1024-336_7B

updated a model 1 day ago

OpenGVLab/clip-vit-large-patch14to16-224

wzk1015's activity

upvoted a paper 1 day ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 4 days ago • 71

upvoted a collection 1 day ago

PIIP

[NeurIPS 2024 Spotlight] Parameter-Inverted Image Pyramid Networks • 11 items • Updated 1 day ago • 1

upvoted a paper 18 days ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published 19 days ago • 67

upvoted a paper about 1 month ago

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Paper • 2503.11646 • Published Mar 14 • 35

upvoted a paper about 2 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 73

upvoted a paper 3 months ago

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14 • 7

upvoted 3 papers 4 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published Dec 12, 2024 • 7

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 155

upvoted a paper 5 months ago

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10, 2024 • 4

upvoted a collection 5 months ago

InternVL2.5

Better than InternVL 2.0 • 19 items • Updated 2 days ago • 92

upvoted a collection 6 months ago

Mono-InternVL

A Pioneering Monolithic MLLM • 6 items • Updated 2 days ago • 6

upvoted a paper 9 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11, 2024 • 21

upvoted a collection 11 months ago

InternVL1.0

Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks • 16 items • Updated 2 days ago • 18