Shengqiong Wu

ChocoWu

https://chocowu.github.io/

AI & ML interests

Large Language Models, Multimodal Learning, Scene Graph Generation

Recent Activity

upvoted a paper 3 days ago

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published 3 days ago • 85

upvoted a paper 7 days ago

Kling-Omni Technical Report

Paper • 2512.16776 • Published 8 days ago • 156

upvoted a paper 12 days ago

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Paper • 2512.11749 • Published 14 days ago • 36

upvoted a paper about 1 month ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11 • 37

upvoted 3 papers 2 months ago

Latent Diffusion Model without Variational Autoencoder

Paper • 2510.15301 • Published Oct 17 • 49

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Paper • 2510.13940 • Published Oct 15 • 6

AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes

Paper • 2510.10670 • Published Oct 12 • 18

upvoted 2 papers 3 months ago

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Paper • 2510.08555 • Published Oct 9 • 63

UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution

Paper • 2510.08143 • Published Oct 9 • 20

upvoted 3 papers 8 months ago

3D Scene Generation: A Survey

Paper • 2505.05474 • Published May 8 • 21

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7 • 82

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17 • 20

upvoted 5 papers 9 months ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 301

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Paper • 2503.23377 • Published Mar 30 • 57

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31 • 76

Position: Interactive Generative Video as Next-Generation Game Engine

Paper • 2503.17359 • Published Mar 21 • 61

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 35

upvoted a paper over 1 year ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54
