Model Card for Point Transformer V3 Lane Detection
This model performs semantic segmentation of lane lines on LiDAR point cloud data, detecting and segmenting lane markings for autonomous vehicle navigation.
Model Details
Model Description
A Point Transformer V3 model adapted for lane detection from LiDAR point clouds, featuring a hierarchical encoder-decoder architecture with self-attention mechanisms for point cloud processing.
- Developed by: Bryan Chang
- Model type: Point Transformer V3 (PT-v3m1)
- License: MIT
- Finetuned from model: nuScenes-pretrained model
Model Sources
- Repository: https://github.com/Bryan1203/LiDAR-Based-Lane-Navigation
- Demo: https://www.youtube.com/watch?v=cCTi2zFftlY
Uses
Direct Use
The model can be directly used for:
- Lane detection from LiDAR point cloud data (Ouster LiDAR with the signal attribute)
- Semantic segmentation of road surfaces
- Real-time autonomous navigation systems
Downstream Use
Can be integrated into:
- Autonomous vehicle navigation systems
- Road infrastructure mapping
- Traffic monitoring systems
- Path planning algorithms
Out-of-Scope Use
This model should not be used for:
- Non-LiDAR point cloud data
- Indoor navigation
- Object detection tasks
- High-speed autonomous driving without additional safety systems
Bias, Risks, and Limitations
- Performance may degrade in adverse weather conditions
- Requires high-quality LiDAR data
- Limited to ground-level lane markings
- May struggle with unusual road geometries
- Real-time performance depends on hardware capabilities
Recommendations
Users should:
- Validate model performance in their specific deployment environment
- Implement appropriate safety fallbacks
- Consider sensor fusion for robust operation
- Monitor inference time for real-time applications
- Regularly evaluate model performance on new data
How to Get Started with the Model
Refer to src/pointcept151/inference_ros_filter.py in the repository for the reference implementation.
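Below is a minimal single-scan inference sketch, not the repository script. It assumes the checkpoint has already been loaded into `model`, that the model returns per-point logits, and that the input-dict keys (`coord`, `feat`, `offset`, `grid_size`) follow common Pointcept conventions; the actual inference_ros_filter.py may differ.

```python
import numpy as np
import torch

def segment_lanes(model: torch.nn.Module, points: np.ndarray) -> np.ndarray:
    """points: (N, 4) array of x, y, z, signal; returns per-point labels (0/1)."""
    coord = torch.from_numpy(points[:, :3]).float().cuda()
    feat = torch.from_numpy(points).float().cuda()            # all 4 channels as features
    offset = torch.tensor([coord.shape[0]], device="cuda")    # single scan in the batch
    with torch.no_grad():
        logits = model({"coord": coord, "feat": feat,
                        "offset": offset, "grid_size": 0.05})  # assumed dict keys
        pred = logits.argmax(dim=-1)                           # 0 = background, 1 = lane
    return pred.cpu().numpy()
```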
Training Details
Training Data
- Based on SemanticKITTI dataset format
- Binary classification: background (0) and lane (1)
- Point cloud data with 4 channels: x, y, z, intensity (signal)
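As an illustration, a SemanticKITTI-format scan and its labels could be read as follows. The file paths are hypothetical; only the x, y, z, intensity layout and the 0/1 label set are taken from this card.

```python
import numpy as np

def load_scan(bin_path: str, label_path: str):
    # Points: flat float32 buffer reshaped to (N, 4) -> x, y, z, intensity (signal)
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # Labels: SemanticKITTI stores one uint32 per point; the lower 16 bits hold
    # the semantic class, which here is only 0 (background) or 1 (lane).
    labels = np.fromfile(label_path, dtype=np.uint32) & 0xFFFF
    return points, labels

points, labels = load_scan("sequences/00/velodyne/000000.bin",   # hypothetical paths
                           "sequences/00/labels/000000.label")
print(points.shape, np.unique(labels))
```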
Training Procedure
Preprocessing
- Grid sampling with size 0.05
- Random rotation, scaling, and flipping augmentations
- Random jittering (σ=0.005, clip=0.02)
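A sketch of how these augmentations might appear as a Pointcept-style transform pipeline. The grid size, jitter sigma, and clip come from this card; the rotation range, scale range, flip probability, and collected keys are illustrative assumptions to be checked against the repository config.

```python
transform = [
    dict(type="RandomRotate", angle=[-1, 1], axis="z", p=0.5),   # assumed range
    dict(type="RandomScale", scale=[0.9, 1.1]),                  # assumed range
    dict(type="RandomFlip", p=0.5),                              # assumed probability
    dict(type="RandomJitter", sigma=0.005, clip=0.02),           # from this card
    dict(type="GridSample", grid_size=0.05, hash_type="fnv",
         mode="train", return_grid_coord=True),                  # grid size from this card
    dict(type="ToTensor"),
    dict(type="Collect", keys=("coord", "grid_coord", "segment"),
         feat_keys=("coord", "strength")),
]
```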
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Batch size: 4
- Epochs: 50
- Optimizer: AdamW (lr=0.004, weight_decay=0.005)
- Scheduler: OneCycleLR
- Loss functions: CrossEntropy + Lovasz Loss
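The same hyperparameters expressed as a Pointcept-style config fragment. The epoch count, batch size, learning rate, weight decay, precision, and loss choices come from this card; the remaining keyword values are illustrative assumptions.

```python
epoch = 50
batch_size = 4
enable_amp = True  # mixed precision (fp16)

optimizer = dict(type="AdamW", lr=0.004, weight_decay=0.005)
scheduler = dict(type="OneCycleLR", max_lr=0.004, pct_start=0.04,
                 anneal_strategy="cos", div_factor=10.0,
                 final_div_factor=100.0)                          # assumed shape parameters
criteria = [
    dict(type="CrossEntropyLoss", loss_weight=1.0, ignore_index=-1),
    dict(type="LovaszLoss", mode="multiclass", loss_weight=1.0, ignore_index=-1),
]
```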
Speeds, Sizes, Times
- Inference time: 300-400ms per frame on RTX A4000
- Model size: ~500MB
- Training time: ~24 hours on single GPU
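A simple way to reproduce the per-frame latency figure, assuming `model` and the input dict are prepared as in the inference sketch above; timings on other GPUs will differ from the RTX A4000 numbers quoted here.

```python
import time
import torch

torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    model(input_dict)          # input_dict built as in the inference sketch
torch.cuda.synchronize()
print(f"inference: {(time.perf_counter() - start) * 1e3:.1f} ms")
```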
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Custom-labeled high-bay dataset (UIUC testing facility)
- Test split from training data
Factors
- Time of day
- Weather conditions
- Road surface types
- Lane marking visibility
Metrics
- Mean IoU
- Per-class accuracy
- Inference time
- Memory usage
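For reference, per-class IoU, mean IoU, and per-class accuracy for the binary task can be computed from flattened prediction and ground-truth label arrays as in this minimal sketch:

```python
import numpy as np

def evaluate(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2):
    """Returns (mean IoU, per-class IoU list, per-class accuracy list)."""
    ious, accs = [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        ious.append(tp / max(tp + fp + fn, 1))   # intersection over union
        accs.append(tp / max(tp + fn, 1))        # recall-style per-class accuracy
    return float(np.mean(ious)), ious, accs
```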
Results
Performance metrics on test set:
- Mean IoU: [Pending final evaluation]
- Background accuracy: [Pending final evaluation]
- Lane accuracy: [Pending final evaluation]
Environmental Impact
- Hardware Type: NVIDIA RTX A4000
- Hours used: ~24 for training
- Cloud Provider: None (local computation)
- Carbon Emitted: [To be calculated]
Technical Specifications
Model Architecture and Objective
Detailed in configuration:
- Encoder depths: (2, 2, 2, 6, 2)
- Encoder channels: (32, 64, 128, 256, 512)
- Decoder depths: (2, 2, 2, 2)
- MLP ratio: 4
- Attention heads: Varies by layer
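A backbone configuration sketch in Pointcept style. The encoder/decoder depths, encoder channels, and MLP ratio come from this card; the head counts, decoder channels, and segmentor wrapper are assumptions that should be checked against the repository config.

```python
model = dict(
    type="DefaultSegmentor",
    num_classes=2,                          # background, lane
    backbone=dict(
        type="PT-v3m1",
        in_channels=4,                      # x, y, z, signal
        enc_depths=(2, 2, 2, 6, 2),
        enc_channels=(32, 64, 128, 256, 512),
        enc_num_head=(2, 4, 8, 16, 32),     # assumed: heads vary by layer
        dec_depths=(2, 2, 2, 2),
        dec_channels=(64, 64, 128, 256),    # assumed
        dec_num_head=(4, 4, 8, 16),         # assumed
        mlp_ratio=4,
    ),
    criteria=[dict(type="CrossEntropyLoss"),
              dict(type="LovaszLoss", mode="multiclass")],
)
```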
Compute Infrastructure
Hardware
- NVIDIA RTX A4000 (16GB VRAM)
- 32GB RAM minimum
- Multi-core CPU
Software
- Python 3.8+
- PyTorch 1.10+
- CUDA 11.3+
- ROS Noetic
- Pointcept framework
Model Card Authors
Bryan Chang