Seg-Zero-7B

This model is based on the paper Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. Seg-Zero uses a decoupled architecture consisting of a reasoning model and a segmentation model, and it is trained via reinforcement learning with GRPO, without explicit reasoning data, which leads to robust zero-shot generalization and emergent test-time reasoning.
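
For context, GRPO samples a group of candidate answers for each query, scores them with task rewards, and normalizes each reward against the group's own statistics rather than a learned value function. Below is a minimal, illustrative sketch of that group-relative advantage computation (variable names and reward values are invented for illustration); it is not the project's training code.

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: one scalar reward per sampled completion in the group
    mean, std = rewards.mean(), rewards.std()
    # each completion is scored relative to its own group, so no value network is needed
    return (rewards - mean) / (std + eps)

# e.g. 8 sampled answers for one query, rewarded on format and segmentation accuracy
rewards = torch.tensor([0.9, 0.1, 0.7, 0.0, 0.4, 0.8, 0.2, 0.6])
print(grpo_advantages(rewards))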

Code: https://github.com/dvlab-research/Seg-Zero

Description

This is the Seg-Zero-7B model. Seg-Zero introduces a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets the user's intention, generates an explicit reasoning chain, and produces positional prompts, which the segmentation model then uses to generate pixel-level masks.
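
As an illustration of how the two halves fit together, the sketch below assumes the reasoning model's output has already been parsed into a bounding box and point prompts; the prompt values, the image path, and the SAM2 checkpoint name are placeholders, and the repository's inference scripts implement the actual pipeline.

import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# positional prompts produced by the reasoning model (illustrative values)
bbox = np.array([120, 80, 340, 300])          # [x1, y1, x2, y2]
points = np.array([[230, 190], [260, 150]])   # points inside the target object
labels = np.array([1, 1])                     # 1 = foreground

# hand the prompts to the segmentation model to obtain a pixel-level mask
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(np.array(Image.open("your_image.jpg").convert("RGB")))
masks, scores, _ = predictor.predict(point_coords=points, point_labels=labels, box=bbox)
mask = masks[np.argmax(scores)]               # keep the highest-scoring mask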

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# load the reasoning model and its tokenizer (weights are stored in bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "Ricky06662/Seg-Zero-7B",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Ricky06662/Seg-Zero-7B")

Installation

git clone https://github.com/dvlab-research/Seg-Zero.git
cd Seg-Zero
conda create -n seg_zero python=3.11
conda activate seg_zero
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
pip install -e .
pip install sam2
pip install matplotlib
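
After installing, a quick sanity check that the key packages import correctly (a convenience snippet, not part of the repository):

import torch, torchvision, sam2, matplotlib

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("matplotlib:", matplotlib.__version__)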

Inference

python inference_scripts/infer.py

The default question is:

"the unusual object in the image."

The thinking process is printed to the command line, and the mask is saved in the inference_scripts folder. You can also provide your own image_path and text:

python inference_scripts/infer.py --image_path "your_image_path" --text "your question text"
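
To run the script over several images, a small wrapper around the same command works. This is a convenience sketch that uses only the flags documented above; the folder path and question are placeholders.

import subprocess
from pathlib import Path

question = "the unusual object in the image."   # placeholder question
for image_path in sorted(Path("your_image_folder").glob("*.jpg")):
    # each call prints the reasoning chain and saves the mask in the inference_scripts folder
    subprocess.run(
        ["python", "inference_scripts/infer.py",
         "--image_path", str(image_path),
         "--text", question],
        check=True,
    )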