Joya Chen

chenjoya

https://chenjoya.github.io/

AI & ML interests

Video LLM

Recent Activity

upvoted a paper 3 days ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper • 2606.11176 • Published 6 days ago • 41

upvoted a paper 9 days ago

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Paper • 2606.04811 • Published 11 days ago • 16

upvoted a paper about 1 month ago

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Paper • 2605.13724 • Published May 13 • 101

liked a model about 1 month ago

nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers

Text-to-Video • Updated 22 days ago • 143 • 11

updated a dataset about 1 month ago

DataTransfer111/marker

Updated May 14 • 134

upvoted 4 papers 3 months ago

Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 81

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Paper • 2603.15557 • Published Mar 16 • 29

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 106

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 120

upvoted 3 papers 4 months ago

Olaf-World: Orienting Latent Actions for Video World Modeling

Paper • 2602.10104 • Published Feb 10 • 27

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Paper • 2602.02493 • Published Feb 2 • 46

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published Feb 3 • 65

upvoted 2 papers 5 months ago

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Paper • 2601.03928 • Published Jan 7 • 16

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Paper • 2512.24965 • Published Dec 31, 2025 • 43

upvoted 4 papers 6 months ago

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Paper • 2512.14666 • Published Dec 16, 2025 • 10

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published Dec 15, 2025 • 65

Glance: Accelerating Diffusion Models with 1 Sample

Paper • 2512.02899 • Published Dec 2, 2025 • 30

upvoted 2 papers 7 months ago

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

Paper • 2511.20256 • Published Nov 25, 2025 • 28

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 116