---
title: Video Redaction
emoji: 🐨
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.14.0
app_file: app.py
pinned: false
---

# Promptable Video Redaction with Moondream

This tool uses Moondream 2B, a powerful yet lightweight vision-language model, to detect and redact objects from videos. Moondream can recognize a wide variety of objects, people, text, and more with high accuracy while being much smaller than traditional models.

Try it now.

## About Moondream

Moondream is a tiny yet powerful vision-language model that can analyze images and answer questions about them. It's designed to be lightweight and efficient while maintaining high accuracy. Some key features:

- Only 2B parameters
- Fast inference with minimal resource requirements
- Supports CPU and GPU execution
- Open source and free to use
- Can detect almost anything you can describe in natural language
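Detectors in this family typically report boxes as normalized `(x_min, y_min, x_max, y_max)` fractions of the frame; converting them to pixel coordinates looks like the following sketch (the helper name is illustrative, and the exact output format of Moondream's API may differ):

```python
def to_pixels(box, frame_w, frame_h):
    # box is (x_min, y_min, x_max, y_max) as fractions in [0, 1];
    # scale each coordinate by the frame dimensions to get pixels.
    x_min, y_min, x_max, y_max = box
    return (
        int(x_min * frame_w),
        int(y_min * frame_h),
        int(x_max * frame_w),
        int(y_max * frame_h),
    )
```

For example, `to_pixels((0.25, 0.5, 0.75, 1.0), 1920, 1080)` yields `(480, 540, 1440, 1080)`.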


## Features

- Real-time object detection in videos using Moondream
- Multiple visualization styles:
  - Censor: black boxes over detected objects
  - Bounding Box: traditional bounding boxes with labels
  - Hitmarker: Call of Duty style crosshair markers
- Optional grid-based detection for improved accuracy
- Flexible object type detection using natural language
- Frame-by-frame processing with IoU-based merging
- Batch processing of multiple videos
- Web-compatible output format
- User-friendly web interface
- Command-line interface for automation
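Overlapping detections (e.g. from adjacent grid cells) can be combined by intersection-over-union; a minimal sketch of the idea follows (function names are illustrative, not the script's actual API):

```python
def iou(a, b):
    # Boxes are (x_min, y_min, x_max, y_max) in pixels.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def merge_boxes(boxes, threshold=0.5):
    # Greedily fold each box into the first kept box it overlaps
    # enough with, replacing the pair by their bounding union.
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) >= threshold:
                merged[i] = (min(box[0], kept[0]), min(box[1], kept[1]),
                             max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            merged.append(box)
    return merged
```

Two near-duplicate boxes collapse into one, while distant boxes are kept separate.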

## Requirements

- Python 3.8+
- OpenCV (`cv2`)
- PyTorch
- Transformers
- Pillow (PIL)
- tqdm
- ffmpeg
- numpy
- gradio (for the web interface)

## Installation

1. Clone the repository, change into the recipe directory, and create a virtual environment:

   ```bash
   git clone https://github.com/vikhyat/moondream.git
   cd moondream/recipes/promptable-video-redaction
   python -m venv .venv
   source .venv/bin/activate
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install ffmpeg:

   - On Ubuntu/Debian: `sudo apt-get install ffmpeg libvips`
   - On macOS: `brew install ffmpeg`
   - On Windows: download from ffmpeg.org

   Downloading libvips for Windows requires some additional steps, see here.
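Because processing shells out to ffmpeg, it's worth confirming the binary is on your `PATH` before running; a quick check using the standard library (`have_binary` is an illustrative helper, not part of the project):

```python
import shutil

def have_binary(name):
    # shutil.which returns the binary's full path if it is on PATH,
    # or None if it cannot be found.
    return shutil.which(name) is not None

if not have_binary("ffmpeg"):
    print("ffmpeg not found - install it before running main.py")
```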

## Usage

### Web Interface

1. Start the web interface:

   ```bash
   python app.py
   ```

2. Open the provided URL in your browser.

3. Use the interface to:

   - Upload your video
   - Specify what to censor (e.g., face, logo, text)
   - Adjust processing speed and quality
   - Configure the grid size for detection
   - Process and download the censored video

### Command Line Interface

1. Create an `inputs` directory in the same folder as the script:

   ```bash
   mkdir inputs
   ```

2. Place your video files in the `inputs` directory. Supported formats:

   - `.mp4`
   - `.avi`
   - `.mov`
   - `.mkv`
   - `.webm`

3. Run the script:

   ```bash
   python main.py
   ```
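Gathering the input videos amounts to filtering the `inputs` directory by extension; a minimal sketch (the real script's logic may differ):

```python
from pathlib import Path

# Container formats the tool accepts, matched case-insensitively.
SUPPORTED = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def find_videos(inputs_dir):
    # Return the supported video files in inputs_dir, sorted by name.
    return sorted(
        p for p in Path(inputs_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```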

#### Optional Arguments

- `--test`: process only the first 3 seconds of each video (useful for testing detection settings):

  ```bash
  python main.py --test
  ```

- `--preset`: choose the FFmpeg encoding preset (trades output quality against speed):

  ```bash
  python main.py --preset ultrafast  # fastest, lower quality
  python main.py --preset veryslow   # slowest, highest quality
  ```

- `--detect`: specify what to detect, in natural language:

  ```bash
  python main.py --detect person                  # detect people
  python main.py --detect "red car"               # detect red cars
  python main.py --detect "person wearing a hat"  # detect people with hats
  ```

- `--box-style`: choose the visualization style:

  ```bash
  python main.py --box-style censor        # black boxes (default)
  python main.py --box-style bounding-box  # bounding boxes with labels
  python main.py --box-style hitmarker    # COD-style hitmarkers
  ```

- `--rows` and `--cols`: enable grid-based detection by splitting each frame into tiles:

  ```bash
  python main.py --rows 2 --cols 2  # split each frame into a 2x2 grid
  python main.py --rows 3 --cols 3  # split each frame into a 3x3 grid
  ```

You can combine arguments:

```bash
python main.py --detect "person wearing sunglasses" --box-style bounding-box --test --preset fast --rows 2 --cols 2
```
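With `--rows`/`--cols`, the model runs on each tile of the frame, and tile-local boxes must be remapped into full-frame coordinates. A sketch of that remapping, assuming normalized `(x_min, y_min, x_max, y_max)` boxes (the helper name is illustrative):

```python
def tile_to_frame(box, row, col, rows, cols):
    # box is (x_min, y_min, x_max, y_max), normalized within one tile.
    # Shift and scale it so it is normalized relative to the full frame.
    tile_w, tile_h = 1.0 / cols, 1.0 / rows
    x0, y0 = col * tile_w, row * tile_h
    return (
        x0 + box[0] * tile_w,
        y0 + box[1] * tile_h,
        x0 + box[2] * tile_w,
        y0 + box[3] * tile_h,
    )
```

A box filling the bottom-right tile of a 2x2 grid, for instance, maps to the bottom-right quarter of the frame.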

## Visualization Styles

The tool supports three visualization styles for detected objects:

1. **Censor** (default)
   - Places solid black rectangles over detected objects
   - Best for privacy and content moderation
   - Completely obscures the detected region

2. **Bounding Box**
   - Traditional object detection style
   - Red bounding box around each detected object
   - Label showing the object type above the box
   - Good for analysis and debugging

3. **Hitmarker**
   - Call of Duty inspired visualization
   - White crosshair marker at the center of each detected object
   - Small label above the marker
   - Stylistic choice for gaming-inspired output

Choose the style that best fits your use case using the `--box-style` argument.
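The censor style amounts to zeroing the pixels inside each detected box. With a frame held as a NumPy array (the representation OpenCV uses), that is a slice assignment; `censor` here is an illustrative name, not the script's actual function:

```python
import numpy as np

def censor(frame, box):
    # Black out the region (x_min, y_min, x_max, y_max), in place.
    # NumPy indexing is rows (y) first, then columns (x).
    x_min, y_min, x_max, y_max = box
    frame[y_min:y_max, x_min:x_max] = 0
    return frame
```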

## Output

Processed videos are saved in the `outputs` directory, named `[style]_[object_type]_[original_filename].mp4`. For example:

- `censor_face_video.mp4`
- `bounding-box_person_video.mp4`
- `hitmarker_car_video.mp4`
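The naming scheme above can be expressed as a small helper (the function name is illustrative, not part of the script):

```python
from pathlib import Path

def output_name(style, object_type, input_path):
    # [style]_[object_type]_[original_filename].mp4
    stem = Path(input_path).stem  # filename without directory or extension
    return f"{style}_{object_type}_{stem}.mp4"
```

For example, `output_name("censor", "face", "inputs/video.mp4")` gives `censor_face_video.mp4`.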

The output videos include:

- The original video content
- The selected visualization style for detected objects
- Web-compatible H.264 encoding

## Notes

- Processing time depends on video length, grid size, and GPU availability
- A GPU is strongly recommended for faster processing
- Requires sufficient disk space for temporary files
- Detection quality varies with video quality and with Moondream's ability to recognize the specified object
- Grid-based detection significantly increases processing time; use it only when needed
- The web interface shows progress updates and errors
- Choose the visualization style based on your use case
- Moondream can detect almost anything you can describe in natural language