Model Card for Point Transformer V3 Lane Detection

This model performs semantic segmentation of lane lines on LiDAR point cloud data to detect and segment lane markings for autonomous vehicle navigation.

Model Details

Model Description

A Point Transformer V3 model adapted for lane detection from LiDAR point clouds, featuring a hierarchical encoder-decoder architecture with self-attention mechanisms for point cloud processing.

  • Developed by: Bryan Chang
  • Model type: Point Transformer V3 (PT-v3m1)
  • License: MIT
  • Finetuned from model: nuScenes-pretrained model

Model Sources

Uses

Direct Use

The model can be directly used for:

  • Lane detection from LiDAR point cloud data (Ouster LiDAR with signal attribute)
  • Semantic segmentation of road surfaces
  • Real-time autonomous navigation systems

Downstream Use

Can be integrated into:

  • Autonomous vehicle navigation systems
  • Road infrastructure mapping
  • Traffic monitoring systems
  • Path planning algorithms

Out-of-Scope Use

This model should not be used for:

  • Non-LiDAR point cloud data
  • Indoor navigation
  • Object detection tasks
  • High-speed autonomous driving without additional safety systems

Bias, Risks, and Limitations

  • Performance may degrade in adverse weather conditions
  • Requires high-quality LiDAR data
  • Limited to ground-level lane markings
  • May struggle with unusual road geometries
  • Real-time performance depends on hardware capabilities

Recommendations

Users should:

  • Validate model performance in their specific deployment environment
  • Implement appropriate safety fallbacks
  • Consider sensor fusion for robust operation
  • Monitor inference time for real-time applications
  • Regularly evaluate model performance on new data

How to Get Started with the Model

Refer to src/pointcept151/inference_ros_filter.py in the repository for the full implementation; a minimal inference sketch is shown below.
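The snippet below is a hypothetical single-frame inference sketch: it grid-samples a 4-channel cloud and assembles the input dictionary a Pointcept segmentor typically expects. The input keys (`coord`, `grid_coord`, `feat`, `offset`), the output key (`seg_logits`), and the `run_inference` helper are assumptions to be verified against the actual script; the model is assumed to be built and its checkpoint loaded as in the repository.

```python
# Hypothetical single-frame inference sketch; `model` is assumed to be built and
# its checkpoint loaded as done in src/pointcept151/inference_ros_filter.py.
import numpy as np
import torch

GRID_SIZE = 0.05  # matches the grid-sampling size used during training


def grid_sample(points: np.ndarray, grid_size: float) -> np.ndarray:
    """Keep one point per voxel of side `grid_size` (simple stand-in for
    Pointcept's GridSample transform)."""
    voxel = np.floor(points[:, :3] / grid_size).astype(np.int64)
    _, keep = np.unique(voxel, axis=0, return_index=True)
    return points[np.sort(keep)]


def run_inference(model: torch.nn.Module, cloud: np.ndarray, device: str = "cuda") -> np.ndarray:
    """`cloud` is an (N, 4) array of x, y, z, signal; returns per-point labels
    (0 = background, 1 = lane)."""
    cloud = grid_sample(cloud, GRID_SIZE)
    coord = torch.from_numpy(cloud[:, :3]).float().to(device)
    input_dict = {
        "coord": coord,
        "grid_coord": torch.floor(coord / GRID_SIZE).int(),
        "feat": torch.from_numpy(cloud).float().to(device),       # x, y, z, signal as features
        "offset": torch.tensor([coord.shape[0]], device=device),  # single-frame batch
    }
    model.eval()
    with torch.no_grad():
        output = model(input_dict)                                # key names assumed, verify against the repo
    logits = output["seg_logits"] if isinstance(output, dict) else output
    return logits.argmax(dim=1).cpu().numpy()
```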

Training Details

Training Data

  • Based on SemanticKITTI dataset format
  • Binary classification: background (0) and lane (1)
  • Point cloud data with 4 channels: x, y, z, intensity (signal); a loading sketch follows this list
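The sketch below loads one SemanticKITTI-format frame as described above. The file names are hypothetical; the bit masking follows the standard SemanticKITTI label layout.

```python
# Hedged loading sketch for one SemanticKITTI-format frame; file names are hypothetical.
import numpy as np

def load_frame(bin_path: str, label_path: str):
    # Points are stored as float32 quadruples: x, y, z, intensity (signal)
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # SemanticKITTI-style .label files keep the semantic class in the lower 16 bits
    labels = np.fromfile(label_path, dtype=np.uint32) & 0xFFFF  # 0 = background, 1 = lane
    return points, labels

points, labels = load_frame("000000.bin", "000000.label")
assert points.shape[0] == labels.shape[0]
```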

Training Procedure

Preprocessing

  • Grid sampling with size 0.05
  • Random rotation, scaling, and flipping augmentations
  • Random jittering (σ=0.005, clip=0.02); the pipeline is sketched below
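The steps above can be written as a Pointcept-style transform list. This is a hedged sketch: the transform names follow Pointcept's conventions, but arguments not stated on this card (rotation range, scale range, collected keys) are assumptions and may differ from the actual training config.

```python
# Hedged Pointcept-style sketch of the augmentation pipeline above; unstated
# arguments (rotation range, scale range, collected keys) are assumptions.
transform = [
    dict(type="RandomRotate", angle=[-1, 1], axis="z", p=0.5),  # random rotation about z
    dict(type="RandomScale", scale=[0.9, 1.1]),                 # random scaling
    dict(type="RandomFlip", p=0.5),                             # random flipping
    dict(type="RandomJitter", sigma=0.005, clip=0.02),          # jitter values from this card
    dict(
        type="GridSample",
        grid_size=0.05,                                         # grid sampling size from this card
        hash_type="fnv",
        mode="train",
        keys=("coord", "strength", "segment"),
        return_grid_coord=True,
    ),
    dict(type="ToTensor"),
    dict(type="Collect", keys=("coord", "grid_coord", "segment"), feat_keys=("coord", "strength")),
]
```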

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Batch size: 4
  • Epochs: 50
  • Optimizer: AdamW (lr=0.004, weight_decay=0.005)
  • Scheduler: OneCycleLR
  • Loss functions: CrossEntropy + Lovasz Loss (the setup is sketched below)
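A minimal plain-PyTorch sketch of this optimization setup follows; the actual run uses Pointcept's trainer. `model`, `train_loader`, `steps_per_epoch`, and the batch layout are assumptions, and the Lovasz term is indicated only as a comment.

```python
# Plain-PyTorch sketch of the optimization setup; `model`, `train_loader`,
# `steps_per_epoch`, and the batch layout are assumptions (the actual run uses
# Pointcept's trainer).
import torch

EPOCHS = 50
optimizer = torch.optim.AdamW(model.parameters(), lr=0.004, weight_decay=0.005)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.004, epochs=EPOCHS, steps_per_epoch=steps_per_epoch
)
scaler = torch.cuda.amp.GradScaler()             # fp16 mixed precision
ce_loss = torch.nn.CrossEntropyLoss()            # paired with a Lovasz term in the real setup

for epoch in range(EPOCHS):
    for batch in train_loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            logits = model(batch["input"])            # hypothetical batch layout
            loss = ce_loss(logits, batch["segment"])  # + lovasz_softmax(...) in training
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
```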

Speeds, Sizes, Times

  • Inference time: 300-400 ms per frame on an RTX A4000 (a timing sketch follows this list)
  • Model size: ~500MB
  • Training time: ~24 hours on single GPU
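A simple way to reproduce the per-frame latency figure above is to time warmed-up forward passes with CUDA synchronization; `model` and `input_dict` are assumed to be prepared as in the inference sketch earlier.

```python
# Latency check corresponding to the figures above; `model` and `input_dict`
# are assumed to be prepared as in the inference sketch.
import time
import torch

def time_inference(model, input_dict, warmup: int = 5, iters: int = 20) -> float:
    """Returns the mean forward-pass time in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):                 # warm-up passes, excluded from timing
            model(input_dict)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(input_dict)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0
```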

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Custom-labeled high-bay dataset (UIUC testing facility)
  • Test split from training data

Factors

  • Time of day
  • Weather conditions
  • Road surface types
  • Lane marking visibility

Metrics

  • Mean IoU (computation sketched below)
  • Per-class accuracy
  • Inference time
  • Memory usage
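The sketch below shows how the IoU and accuracy metrics can be computed for the binary background/lane task from per-point predictions and ground-truth labels; the helper name is hypothetical.

```python
# Hedged sketch of the segmentation metrics for the binary background/lane task;
# `pred` and `gt` are integer per-point arrays of equal length.
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2):
    conf = np.bincount(gt.astype(np.int64) * num_classes + pred.astype(np.int64),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    iou = tp / (conf.sum(0) + conf.sum(1) - tp + 1e-9)  # per-class IoU
    acc = tp / (conf.sum(1) + 1e-9)                     # per-class accuracy (recall)
    return iou.mean(), iou, acc                         # mean IoU, per-class IoU, per-class accuracy
```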

Results

Performance metrics on test set:

  • Mean IoU: [Pending final evaluation]
  • Background accuracy: [Pending final evaluation]
  • Lane accuracy: [Pending final evaluation]

Environmental Impact

  • Hardware Type: NVIDIA RTX A4000
  • Hours used: ~24 for training
  • Cloud Provider: Local computation
  • Carbon Emitted: [To be calculated]

Technical Specifications

Model Architecture and Objective

Detailed in the model configuration (a config sketch follows this list):

  • Encoder depths: (2, 2, 2, 6, 2)
  • Encoder channels: (32, 64, 128, 256, 512)
  • Decoder depths: (2, 2, 2, 2)
  • MLP ratio: 4
  • Attention heads: Varies by layer
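A hedged sketch of this architecture in Pointcept's config style. The depths, channels, and MLP ratio come from the list above; head counts, decoder channels, and the segmentor/criteria wrappers follow Pointcept's stock PT-v3m1 defaults and may differ from the actual configuration.

```python
# Hedged Pointcept-style config sketch; head counts, decoder channels, and the
# segmentor/criteria wrappers follow Pointcept's stock PT-v3m1 defaults and may
# differ from the actual training config.
model = dict(
    type="DefaultSegmentorV2",
    num_classes=2,                           # background, lane
    backbone_out_channels=64,
    backbone=dict(
        type="PT-v3m1",
        in_channels=4,                       # x, y, z, signal
        enc_depths=(2, 2, 2, 6, 2),
        enc_channels=(32, 64, 128, 256, 512),
        enc_num_head=(2, 4, 8, 16, 32),      # attention heads vary by stage
        dec_depths=(2, 2, 2, 2),
        dec_channels=(64, 64, 128, 256),
        dec_num_head=(4, 4, 8, 16),
        mlp_ratio=4,
    ),
    criteria=[
        dict(type="CrossEntropyLoss", loss_weight=1.0, ignore_index=-1),
        dict(type="LovaszLoss", mode="multiclass", loss_weight=1.0, ignore_index=-1),
    ],
)
```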

Compute Infrastructure

Hardware

  • NVIDIA RTX A4000 (16GB VRAM)
  • 32GB RAM minimum
  • Multi-core CPU

Software

  • Python 3.8+
  • PyTorch 1.10+
  • CUDA 11.3+
  • ROS Noetic
  • Pointcept framework

Model Card Authors

Bryan Chang

Model Card Contact

[email protected]
