Dongzhi Jiang

CaraJ

https://github.com/CaraJ7

Recent Activity

updated a dataset 19 days ago

CaraJ/MME-CoT

liked a model 21 days ago

ZiyuG/Image-Generation-CoT

CaraJ's activity

upvoted a paper 24 days ago

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published 25 days ago • 48

upvoted 2 papers about 2 months ago

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Paper • 2502.09621 • Published Feb 13 • 27

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 135

upvoted 2 papers 2 months ago

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Paper • 2501.13920 • Published Jan 23 • 17

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Paper • 2501.13926 • Published Jan 23 • 42

upvoted 4 papers 4 months ago

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Paper • 2412.11279 • Published Dec 15, 2024 • 12

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published Dec 12, 2024 • 21

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Paper • 2412.06781 • Published Dec 9, 2024 • 21

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 111

upvoted a paper 5 months ago

Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Paper • 2410.24175 • Published Oct 31, 2024 • 18

upvoted a paper 6 months ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17, 2024 • 56

upvoted 2 papers 7 months ago

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 37

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29, 2024 • 28

upvoted 2 papers 9 months ago

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 33

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

Paper • 2407.00782 • Published Jun 30, 2024 • 25

upvoted 2 papers 10 months ago

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Paper • 2406.11831 • Published Jun 17, 2024 • 22

Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 48

upvoted 2 papers about 1 year ago

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4, 2024 • 36

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Paper • 2403.14624 • Published Mar 21, 2024 • 52

upvoted a paper over 1 year ago

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Paper • 2308.07921 • Published Aug 15, 2023 • 23