arxiv:2412.12877

MIVE: New Design and Benchmark for Multi-Instance Video Editing

Published on Dec 17, 2024
Submitted by ozbro on Dec 18, 2024

Abstract

Recent AI-based video editing has enabled users to edit videos through simple text prompts, significantly simplifying the editing process. However, recent zero-shot video editing techniques primarily focus on global or single-object edits, which can lead to unintended changes in other parts of the video. When multiple objects require localized edits, existing methods face challenges, such as unfaithful editing, editing leakage, and lack of suitable evaluation datasets and metrics. To overcome these limitations, we propose a zero-shot Multi-Instance Video Editing framework, called MIVE. MIVE is a general-purpose mask-based framework, not dedicated to specific objects (e.g., people). MIVE introduces two key modules: (i) Disentangled Multi-instance Sampling (DMS) to prevent editing leakage and (ii) Instance-centric Probability Redistribution (IPR) to ensure precise localization and faithful editing. Additionally, we present our new MIVE Dataset featuring diverse video scenarios and introduce the Cross-Instance Accuracy (CIA) Score to evaluate editing leakage in multi-instance video editing tasks. Our extensive qualitative, quantitative, and user study evaluations demonstrate that MIVE significantly outperforms recent state-of-the-art methods in terms of editing faithfulness, accuracy, and leakage prevention, setting a new benchmark for multi-instance video editing. The project page is available at https://kaist-viclab.github.io/mive-site/

Community

Paper author, paper submitter (edited Dec 18, 2024)

🔥 ⇶ MIVE: New Design and Benchmark for Multi-Instance Video Editing 🔥

Highlights:
✅ Novel zero-shot multi-instance video editing
✅ Novel Disentangled Multi-instance Sampling (DMS) to prevent editing leakage
✅ Novel Instance-centric Probability Redistribution (IPR) to ensure precise localization and faithful editing
✅ Constructing a new MIVE Dataset for multi-instance video editing tasks
✅ Achieving SOTA performance in terms of editing faithfulness, accuracy, and leakage prevention

🪄 Project Page: https://kaist-viclab.github.io/mive-site/
📄 Paper: https://arxiv.org/abs/2412.12877
