Daily Papers

by AK and the research community

Mar 14

Submitted by

zhoutianyi

CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

·
4 authors

7

Submitted by

sinwang

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

·
7 authors

5

Submitted by

agwmon

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

·
5 authors

1

Submitted by

Eliahu

Charting and Navigating Hugging Face's Model Atlas

·
5 authors

1

Submitted by

akhaliq

Transformers without Normalization

·
5 authors

Submitted by

Owen777

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

·
7 authors

Submitted by

LucasFang

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

·
12 authors

1

Submitted by

wondervictor

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

·
10 authors

1

Submitted by

ChenyangLyu

New Trends for Modern Machine Translation with Large Reasoning Models

·
6 authors

1

Submitted by

mozhu

Shifting Long-Context LLMs Research from Input to Output

·
7 authors

1

Submitted by

wenhu

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

·
7 authors

1

Submitted by

yyf86

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

·
9 authors

Submitted by

VityaVitalich

Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

·
6 authors

1

Submitted by

akhaliq

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

·
32 authors

Submitted by

akhaliq

Long Context Tuning for Video Generation

·
8 authors

Submitted by

EthanTaylor

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

·
8 authors

1

Submitted by

sayakpaul

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

·
9 authors

1

Submitted by

xuxw98

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

·
6 authors

1

Submitted by

akhaliq

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

·
14 authors

Submitted by

BestWishYsh

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

·
10 authors

1

Submitted by

allisonandreyev

Quantization for OpenAI's Whisper Models: A Comparative Analysis

·
1 author

1

Submitted by

RohitGandikota

Distilling Diversity and Control in Diffusion Models

·
2 authors

1

Submitted by

akhaliq

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

·
12 authors

Submitted by

hp-l33

Autoregressive Image Generation with Randomized Parallel Decoding

·
4 authors

1

Submitted by

hkchengrex

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

·
2 authors

1

Submitted by

Weiyun1025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

·
15 authors

1

Submitted by

imranraad

"Silent Is Not Actually Silent": An Investigation of Toxicity on Bug Report Discussion

·
2 authors

1

Submitted by

chenblin26

ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

·
6 authors

1

Submitted by

Nikolai10

PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling

·
6 authors

1

Submitted by

AhmadMustafa

On the Limitations of Vision-Language Models in Understanding Image Transforms

·
3 authors