--- license: apache-2.0 tags: - vision inference: false --- # SegGPT model The SegGPT model was proposed in [SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284) by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang. ## Model description SegGPT employs a decoder-only (GPT-like) Transformer that can generate a segmentation mask given an input image, a prompt image and its corresponding prompt mask. The model achieves remarkable one-shot results with 56.1 mIoU on COCO-20 and 85.6 mIoU on FSS-1000. ## Intended uses & limitations You can use the raw model for one-shot image segmentation. ### How to use Here's how to use the model for one-shot semantic segmentation: ```python import torch from datasets import load_dataset from transformers import SegGptImageProcessor, SegGptForImageSegmentation model_id = "BAAI/seggpt-vit-large" image_processor = SegGptImageProcessor.from_pretrained(checkpoint) model = SegGptForImageSegmentation.from_pretrained(checkpoint) dataset_id = "EduardoPacheco/FoodSeg103" ds = load_dataset(dataset_id, split="train") # Number of labels in FoodSeg103 (not including background) num_labels = 103 image_input = ds[4]["image"] ground_truth = ds[4]["label"] image_prompt = ds[29]["image"] mask_prompt = ds[29]["label"] inputs = image_processor( images=image_input, prompt_images=image_prompt, prompt_masks=mask_prompt, num_labels=num_labels, return_tensors="pt" ) with torch.no_grad(): outputs = model(**inputs) target_sizes = [image_input.size[::-1]] mask = image_processor.post_process_semantic_segmentation(outputs, target_sizes, num_labels=num_labels)[0] ``` ### BibTeX entry and citation info ```bibtex @misc{wang2023seggpt, title={SegGPT: Segmenting Everything In Context}, author={Xinlong Wang and Xiaosong Zhang and Yue Cao and Wen Wang and Chunhua Shen and Tiejun Huang}, year={2023}, eprint={2304.03284}, archivePrefix={arXiv}, primaryClass={cs.CV} } ```