---
title: Video Redaction
emoji: 🐨
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.14.0
app_file: app.py
pinned: false
---
# Promptable Video Redaction with Moondream
This tool uses Moondream 2B, a powerful yet lightweight vision-language model, to detect and redact objects from videos. Moondream can recognize a wide variety of objects, people, text, and more with high accuracy while being much smaller than traditional models.
## About Moondream
Moondream is a tiny yet powerful vision-language model that can analyze images and answer questions about them. It's designed to be lightweight and efficient while maintaining high accuracy. Some key features:
- Only 2B parameters
- Fast inference with minimal resource requirements
- Supports CPU and GPU execution
- Open source and free to use
- Can detect almost anything you can describe in natural language
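For reference, here is a minimal sketch of how a single frame could be sent to Moondream for detection. It assumes the `vikhyatk/moondream2` checkpoint on Hugging Face and a `detect()` method that returns normalized bounding boxes; check the Moondream documentation for the exact API of the revision you install.

```python
# Minimal sketch: detect objects in one frame with Moondream.
# Assumption: the vikhyatk/moondream2 checkpoint exposes detect() and
# returns normalized box coordinates; verify against the Moondream docs.
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    trust_remote_code=True,  # Moondream ships its own modeling code
    device_map="auto",       # use a GPU if one is available
)

frame = Image.open("frame_0001.jpg")
result = model.detect(frame, "face")  # natural-language object description

for obj in result["objects"]:
    # Convert normalized corners to pixel coordinates (assumed format).
    x1, y1 = int(obj["x_min"] * frame.width), int(obj["y_min"] * frame.height)
    x2, y2 = int(obj["x_max"] * frame.width), int(obj["y_max"] * frame.height)
    print("detected box:", (x1, y1, x2, y2))
```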
## Features
- Real-time object detection in videos using Moondream
- Multiple visualization styles:
  - Censor: black boxes over detected objects
  - Bounding Box: traditional bounding boxes with labels
  - Hitmarker: Call of Duty style crosshair markers
- Optional grid-based detection for improved accuracy
- Flexible object type detection using natural language
- Frame-by-frame processing with IoU-based merging (see the sketch after this list)
- Batch processing of multiple videos
- Web-compatible output format
- User-friendly web interface
- Command-line interface for automation
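To illustrate the IoU-based merging mentioned above (a sketch of the idea, not the exact logic in `main.py`): overlapping detections of the same object are collapsed once their intersection-over-union passes a threshold.

```python
# Illustrative IoU-based box merging; the real logic in main.py may differ.
# Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def merge_boxes(boxes, iou_threshold=0.5):
    """Greedily merge boxes whose IoU exceeds the threshold."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) >= iou_threshold:
                # Replace the kept box with the union of the two boxes.
                merged[i] = (min(box[0], kept[0]), min(box[1], kept[1]),
                             max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            merged.append(box)
    return merged
```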
## Requirements
- Python 3.8+
- OpenCV (cv2)
- PyTorch
- Transformers
- Pillow (PIL)
- tqdm
- ffmpeg
- numpy
- gradio (for web interface)
## Installation
- Clone the Moondream repository, change into the recipe directory, and create a new virtual environment:

  ```bash
  git clone https://github.com/vikhyat/moondream.git
  cd moondream/recipes/promptable-video-redaction
  python -m venv .venv
  source .venv/bin/activate
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Install ffmpeg and libvips:
  - On Ubuntu/Debian:

    ```bash
    sudo apt-get install ffmpeg libvips
    ```

  - On macOS:

    ```bash
    brew install ffmpeg
    ```

  - On Windows: download ffmpeg from ffmpeg.org. Installing libvips on Windows requires some additional steps; see the libvips documentation.
## Usage
### Web Interface
- Start the web interface:

  ```bash
  python app.py
  ```

- Open the provided URL in your browser.
- Use the interface to:
  - Upload your video
  - Specify what to censor (e.g., face, logo, text)
  - Adjust processing speed and quality
  - Configure the grid size for detection
  - Process and download the censored video
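The actual `app.py` exposes more options and progress reporting, but a minimal sketch of how such a Gradio interface could be wired looks like this (`process_video` here is a placeholder, not the function used by the app):

```python
# Minimal sketch of a Gradio front end for the redaction pipeline.
# process_video is a placeholder; the real app.py adds speed/quality and
# grid-size controls and streams progress updates.
import gradio as gr

def process_video(video_path, detect_prompt):
    # Run detection + redaction here and return the output video path.
    return video_path  # placeholder: echoes the input back

demo = gr.Interface(
    fn=process_video,
    inputs=[
        gr.Video(label="Input video"),
        gr.Textbox(label="What to censor", value="face"),
    ],
    outputs=gr.Video(label="Censored video"),
    title="Promptable Video Redaction",
)

if __name__ == "__main__":
    demo.launch()
```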
### Command Line Interface
- Create an `inputs` directory in the same folder as the script:

  ```bash
  mkdir inputs
  ```

- Place your video files in the `inputs` directory. Supported formats:
  - .mp4
  - .avi
  - .mov
  - .mkv
  - .webm
- Run the script:

  ```bash
  python main.py
  ```
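For context, a batch pass over the `inputs` directory might look roughly like this (the helper name is illustrative, not the actual function in `main.py`):

```python
# Illustrative only: enumerate supported videos in inputs/ and process each.
# process_video stands in for the actual per-video pipeline in main.py.
from pathlib import Path

SUPPORTED = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def process_video(path: Path) -> Path:
    # Placeholder: run detection + redaction and return the output path.
    return Path("outputs") / f"censor_face_{path.name}"

for video in sorted(Path("inputs").iterdir()):
    if video.suffix.lower() in SUPPORTED:
        print(f"{video.name} -> {process_video(video)}")
```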
Optional arguments:

- `--test`: Process only the first 3 seconds of each video (useful for testing detection settings).

  ```bash
  python main.py --test
  ```

- `--preset`: Choose the FFmpeg encoding preset (trades output quality against speed).

  ```bash
  python main.py --preset ultrafast  # Fastest, lower quality
  python main.py --preset veryslow   # Slowest, highest quality
  ```

- `--detect`: Specify what object type to detect, using natural language.

  ```bash
  python main.py --detect person                  # Detect people
  python main.py --detect "red car"               # Detect red cars
  python main.py --detect "person wearing a hat"  # Detect people with hats
  ```

- `--box-style`: Choose the visualization style.

  ```bash
  python main.py --box-style censor        # Black boxes (default)
  python main.py --box-style bounding-box  # Bounding boxes with labels
  python main.py --box-style hitmarker     # COD-style hitmarkers
  ```

- `--rows` and `--cols`: Enable grid-based detection by splitting each frame into tiles (a sketch of the tile-to-frame coordinate mapping follows the combined example below).

  ```bash
  python main.py --rows 2 --cols 2  # Split each frame into a 2x2 grid
  python main.py --rows 3 --cols 3  # Split each frame into a 3x3 grid
  ```

You can combine arguments:

```bash
python main.py --detect "person wearing sunglasses" --box-style bounding-box --test --preset "fast" --rows 2 --cols 2
```
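Grid-based detection runs Moondream on each tile of a frame and then maps the tile-local boxes back to full-frame coordinates before merging. A hedged sketch of that mapping (`detect_boxes` is a placeholder, and the exact code in `main.py` may differ):

```python
# Illustrative grid-based detection: split a frame into rows x cols tiles,
# detect on each tile, and shift boxes back into full-frame coordinates.
from PIL import Image

def detect_boxes(image, prompt):
    # Placeholder for a Moondream call returning [(x1, y1, x2, y2), ...]
    # in pixel coordinates relative to `image`.
    return []

def detect_with_grid(frame: Image.Image, prompt: str, rows: int, cols: int):
    tile_w, tile_h = frame.width // cols, frame.height // rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile_w, r * tile_h
            tile = frame.crop((left, top, left + tile_w, top + tile_h))
            for x1, y1, x2, y2 in detect_boxes(tile, prompt):
                # Offset tile-local coordinates back into the full frame.
                boxes.append((x1 + left, y1 + top, x2 + left, y2 + top))
    return boxes
```

Boxes that overlap across neighboring tiles can then be collapsed with IoU-based merging as sketched in the Features section.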
## Visualization Styles
The tool supports three different visualization styles for detected objects:
### Censor (default)
- Places solid black rectangles over detected objects
- Best for privacy and content moderation
- Completely obscures the detected region
### Bounding Box
- Traditional object detection style
- Red bounding box around detected objects
- Label showing object type above the box
- Good for analysis and debugging
### Hitmarker
- Call of Duty inspired visualization
- White crosshair marker at center of detected objects
- Small label above the marker
- Stylistic choice for gaming-inspired visualization
Choose the style that best fits your use case with the `--box-style` argument.
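To make the styles concrete, here is a hedged OpenCV sketch of how each might be drawn on a frame; colors, thicknesses, and label placement are assumptions rather than the exact values used by `main.py`.

```python
# Illustrative drawing routines for the three visualization styles.
import cv2

def draw_censor(frame, box):
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), thickness=-1)  # filled black box

def draw_bounding_box(frame, box, label):
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), thickness=2)  # red outline
    cv2.putText(frame, label, (x1, max(0, y1 - 8)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

def draw_hitmarker(frame, box, size=12):
    # Four short diagonal ticks around the box center, crosshair style.
    cx, cy = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
    for dx, dy in ((-1, -1), (1, -1), (-1, 1), (1, 1)):
        cv2.line(frame,
                 (cx + dx * size, cy + dy * size),
                 (cx + dx * size // 2, cy + dy * size // 2),
                 (255, 255, 255), 2)
```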
## Output
Processed videos will be saved in the `outputs` directory with the format:

```
[style]_[object_type]_[original_filename].mp4
```
For example:

- `censor_face_video.mp4`
- `bounding-box_person_video.mp4`
- `hitmarker_car_video.mp4`
The output videos will include:
- Original video content
- Selected visualization style for detected objects
- Web-compatible H.264 encoding
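The final web-friendly H.264 pass can be done by shelling out to ffmpeg, roughly as sketched below; the flags shown are common choices and are not necessarily the exact ones used by `main.py` (the preset corresponds to the `--preset` argument).

```python
# Illustrative final re-encode to web-compatible H.264 with ffmpeg.
import subprocess

def encode_for_web(src: str, dst: str, preset: str = "medium") -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,
            "-c:v", "libx264",          # H.264 video
            "-preset", preset,          # speed vs. quality trade-off
            "-pix_fmt", "yuv420p",      # broad player compatibility
            "-movflags", "+faststart",  # allow playback before full download
            dst,
        ],
        check=True,
    )
```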
## Notes
- Processing time depends on video length, grid size, and GPU availability
- GPU is strongly recommended for faster processing
- Requires sufficient disk space for temporary files
- Detection quality varies based on video quality and Moondream's ability to recognize the specified object
- Grid-based detection significantly increases processing time; use it only when needed
- Web interface shows progress updates and errors
- Choose visualization style based on your use case
- Moondream can detect almost anything you can describe in natural language