CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation

Official PyTorch Implementation

This is a PyTorch/GPU implementation of the paper CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation

@article{ahmed2025cam,
  title={CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation},
  author={Ahmed, Masud and Hasan, Zahid and Haque, Syed Arefinul and Faridee, Abu Zaher Md and Purushotham, Sanjay and You, Suya and Roy, Nirmalya},
  journal={arXiv preprint arXiv:2503.15617},
  year={2025}
}

GitHub Repo: https://github.com/mahmed10/CAMSS

Abstract

Traditional transformer-based semantic segmentation relies on quantized embeddings. However, our analysis reveals that autoencoder accuracy on segmentation masks with quantized embeddings (e.g., VQ-VAE) is 8% lower than with continuous-valued embeddings (e.g., KL-VAE). Motivated by this, we propose a continuous-valued embedding framework for semantic segmentation. By reformulating semantic mask generation as a continuous image-to-embedding diffusion process, our approach eliminates the need for discrete latent representations while preserving fine-grained spatial and semantic details. Our key contribution is a diffusion-guided autoregressive transformer that learns a continuous semantic embedding space by modeling long-range dependencies in image features. Our framework contains a unified architecture combining a VAE encoder for continuous feature extraction, a diffusion-guided transformer for conditioned embedding generation, and a VAE decoder for semantic mask reconstruction. Our setting facilitates zero-shot domain adaptation capabilities enabled by the continuity of the embedding space. Experiments across diverse datasets (e.g., Cityscapes and domain-shifted variants) demonstrate state-of-the-art robustness to distribution shifts, including adverse weather (e.g., fog, snow) and viewpoint variations. Our model also exhibits strong noise resilience, achieving robust performance ($\approx$ 95% AP compared to baseline) under Gaussian noise, moderate motion blur, and moderate brightness/contrast variations, while experiencing only a moderate impact ($\approx$ 90% AP compared to baseline) from 50% salt-and-pepper noise and from saturation and hue shifts.

Results

Trained on the Cityscapes dataset and tested on the SemanticKITTI, ACDC, and CAD-EdgeTune datasets.

Quantitative results of semantic segmentation under various noise conditions (figure panels: Salt & Pepper Noise, Motion Blur, Gaussian Noise, Gaussian Blur, Brightness Variation, Contrast Variation, Saturation Variation, Hue Variation).

Prerequisites

To set up the Docker environment, first edit docker_env/Makefile:

IMAGE=img_name/dl-aio
CONTAINER=container_name
AVAILABLE_GPUS='0,1,2,3'
LOCAL_JUPYTER_PORT=18888
LOCAL_TENSORBOARD_PORT=18006
PASSWORD=yourpassword
WORKSPACE=workspace_directory
  • Edit the img_name, container_name, available_gpus, jupyter_port, tensorboard_port, password, and workspace_directory values.
  1. The first time, run the following commands in a terminal:
cd docker_env
make docker-build
make docker-run
  2. For further use of the Docker environment:
  • To stop the environment: make docker-stop
  • To resume the environment: make docker-resume

For coding, open a web browser at ip_address:jupyter_port, e.g., http://localhost:18888
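
To verify that the container can see the GPUs listed in AVAILABLE_GPUS, a quick sanity check from a notebook cell (a minimal sketch, assuming PyTorch is installed in the image) is:

import torch

# List the GPUs visible inside the container
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")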

Dataset

Four datasets are used in this work:

  1. Cityscapes Dataset
  2. KITTI Dataset
  3. ACDC Dataset
  4. CAD-EdgeTune Dataset

Modify the trainlist.txt and vallist.txt files to edit the train and test splits.
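
The list files appear to hold one image path per line (an assumption based on the loaders used in this repo; verify against your copy of the dataset). A minimal sketch for regenerating a custom split, with an illustrative glob pattern and 90/10 ratio:

import glob
import random

# Hypothetical example: rebuild the Cityscapes train/val lists.
# The glob pattern and the 90/10 split are illustrative assumptions,
# not the repository's official split.
images = sorted(glob.glob('dataset/CityScapes/leftImg8bit/train/*/*.png'))
random.seed(36)
random.shuffle(images)
cut = int(0.9 * len(images))

with open('dataset/CityScapes/trainlist.txt', 'w') as f:
    f.write('\n'.join(images[:cut]))
with open('dataset/CityScapes/vallist.txt', 'w') as f:
    f.write('\n'.join(images[cut:]))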

Dataset structure

  • Cityscapes Dataset
|-CityScapes
|----leftImg8bit 
|--------train
|------------aachen #contains the RGB images
|------------bochum #contains the RGB images
|................
|------------zurich #contains the RGB images
|--------val
|................
|----gtFine 
|--------train
|------------aachen #contains semantic segmentation labels
|------------bochum #contains semantic segmentation labels
|................
|------------zurich #contains semantic segmentation labels
|--------val
|................
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----cityscape.yaml #configuration file for CityScapes dataset
  • ACDC Dataset
|-ACDC
|----rgb_anon 
|--------fog
|------------train
|----------------GOPR0475 #contains the RGB images
|----------------GOPR0476 #contains the RGB images
|................
|----------------GP020478 #contains the RGB images
|------------val
|................
|--------rain
|................
|--------snow
|................
|----gt 
|--------fog
|------------train
|----------------GOPR0475 #contains semantic segmentation labels
|----------------GOPR0476 #contains semantic segmentation labels
|................
|----------------GP020478 #contains semantic segmentation labels
|------------val
|................
|--------rain
|................
|--------snow
|................
|----vallist_fog.txt #image list used for testing fog data
|----vallist_rain.txt #image list used for testing rain data
|----vallist_snow.txt #image list used for testing snow data
|----acdc.yaml #configuration file for ACDC dataset
  • SemanticKitti Dataset
|-SemanticKitti
|----training 
|--------image_02
|------------0000 #contains the RGB images
|------------0001 #contains the RGB images
|................
|------------0020 #contains the RGB images
|----kitti-step
|--------panoptic_maps
|------------train
|----------------0000 #contains semantic segmentation labels
|----------------0001 #contains semantic segmentation labels
|................
|----------------0020 #contains semantic segmentation labels
|------------val
|................
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----semantickitti.yaml #configuration file for SemanticKitti dataset
  • CADEdgeTune Dataset
|-CADEdgeTune
|----SEQ1
|--------Images #contains the RGB images
|--------LabelMasks #contains semantic segmentation labels
|----SEQ2
|--------Images #contains the RGB images
|--------LabelMasks #contains semantic segmentation labels
|................
|----SEQ17
|----all.txt #complete image list
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----cadedgetune.yaml #configuration file for CADEdgeTune dataset
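
After arranging a dataset as above, it can help to confirm that every entry in a list file exists on disk. The sketch below is a hedged utility that assumes list files store paths relative to the dataset root, which may differ per dataset:

import os

def check_list(list_file, root):
    # Report entries in a train/val list whose files are missing on disk
    missing = []
    with open(list_file) as f:
        for line in f:
            rel = line.strip()
            if rel and not os.path.exists(os.path.join(root, rel)):
                missing.append(rel)
    print(f"{list_file}: {len(missing)} missing entries")
    return missing

# Example usage, assuming the CityScapes layout shown above
check_list('dataset/CityScapes/vallist.txt', 'dataset/CityScapes')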

Weights

To download the pretrained weights, please visit the Hugging Face Repo.

  • LDM model: the pretrained model from Rombach et al.'s Latent Diffusion Models is used (Link)
  • MAR model: the following MAR model is used:

Training Data | Model    | Params | Link
Cityscapes    | MAR-Base | 217M   | link

Download these weight files and organize them as follows:

|-pretrained_models
|----mar
|--------city768.16.pth
|----vae
|--------modelf16.ckpt

Alternatively, the following code automatically downloads the pretrained weights:

import os
import requests

# Define URLs and file paths
files_to_download = {
    "https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/vae/modelf16.ckpt":
        "pretrained_models/vae/modelf16.ckpt",
    "https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/mar/city768.16.pth":
        "pretrained_models/mar/city768.16.pth"
}

for url, path in files_to_download.items():
    os.makedirs(os.path.dirname(path), exist_ok=True)

    print(f"Downloading from {url}...")
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Saved to {path}")
    else:
        print(f"Failed to download from {url}, status code {response.status_code}")

Validation

Open the validation.ipynb file.

Edit Block 6 to select which dataset to use for validation:

dataset_train = cityscapes.CityScapes('dataset/CityScapes/vallist.txt', data_set='val', transform=transform_train, seed=36, img_size=768)
# dataset_train = umbc.UMBC('dataset/UMBC/all.txt', data_set='val', transform=transform_train, seed=36, img_size=768)
# dataset_train = acdc.ACDC('dataset/ACDC/vallist_fog.txt', data_set='val', transform=transform_train, seed=36, img_size=768)
# dataset_train = semantickitti.SemanticKITTI('dataset/SemanticKitti/vallist.txt', data_set='val', transform=transform_train, seed=36, img_size=768)

Run all the blocks

Training

From Scratch

Run the following command in a terminal:

torchrun --nproc_per_node=4 train.py
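
The --nproc_per_node value is the number of GPUs used for training; set it to match the GPUs made available to the container (e.g., AVAILABLE_GPUS in the Makefile).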

It will save checkpoints in an output_dir/year.month.day.hour.min folder, e.g., output_dir/2025.05.09.02.27.
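
Because the run folders use zero-padded timestamps, the most recent run is simply the lexicographically largest name. A minimal sketch for locating it, assuming the output_dir layout above:

import os

# Zero-padded year.month.day.hour.min names sort chronologically as strings,
# so the most recent run folder is the lexicographic maximum.
runs = sorted(d for d in os.listdir('output_dir')
              if os.path.isdir(os.path.join('output_dir', d)))
latest = runs[-1]
print(f"Resume with: torchrun --nproc_per_node=4 train.py --resume {latest}")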

Resume Training

Run the following command in a terminal:

torchrun --nproc_per_node=4 train.py --resume year.month.day.hour.min

Here is an example:

torchrun --nproc_per_node=4 train.py --resume 2025.05.09.02.27

Acknowledgement

The code is developed on top of the following codebases:

  1. latent-diffusion
  2. mar