Model Card for Point Transformer V3 Lane Detection
This model performs semantic segmentation of lane lines on LiDAR point cloud data, detecting and segmenting lane markings for autonomous vehicle navigation.
Model Details
Model Description
A Point Transformer V3 model adapted for lane detection from LiDAR point clouds, featuring a hierarchical encoder-decoder architecture with self-attention mechanisms for point cloud processing.
- Developed by: Bryan Chang
- Model type: Point Transformer V3 (PT-v3m1)
- License: MIT
- Finetuned from model: nuScenes-pretrained model
Model Sources
- Repository: https://github.com/Bryan1203/LiDAR-Based-Lane-Navigation
- Demo: https://www.youtube.com/watch?v=cCTi2zFftlY
Uses
Direct Use
The model can be directly used for:
- Lane detection from LiDAR point cloud data (Ouster LiDAR with the signal attribute)
- Semantic segmentation of road surfaces
- Real-time autonomous navigation systems
Downstream Use
Can be integrated into:
- Autonomous vehicle navigation systems
- Road infrastructure mapping
- Traffic monitoring systems
- Path planning algorithms
Out-of-Scope Use
This model should not be used for:
- Non-LiDAR point cloud data
- Indoor navigation
- Object detection tasks
- High-speed autonomous driving without additional safety systems
Bias, Risks, and Limitations
- Performance may degrade in adverse weather conditions
- Requires high-quality LiDAR data
- Limited to ground-level lane markings
- May struggle with unusual road geometries
- Real-time performance depends on hardware capabilities
Recommendations
Users should:
- Validate model performance in their specific deployment environment
- Implement appropriate safety fallbacks
- Consider sensor fusion for robust operation
- Monitor inference time for real-time applications
- Regularly evaluate model performance on new data
How to Get Started with the Model
Refer to src/pointcept151/inference_ros_filter.py in the repository for the reference implementation.
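Below is a minimal single-scan inference sketch, not the repository script. It assumes the checkpoint has already been loaded into `model`, that the model returns per-point logits, and that the input-dict keys (`coord`, `feat`, `offset`, `grid_size`) follow common Pointcept conventions; the actual inference_ros_filter.py may differ.

```python
import numpy as np
import torch

def segment_lanes(model: torch.nn.Module, points: np.ndarray) -> np.ndarray:
    """points: (N, 4) array of x, y, z, signal; returns per-point labels (0/1)."""
    coord = torch.from_numpy(points[:, :3]).float().cuda()
    feat = torch.from_numpy(points).float().cuda()            # all 4 channels as features
    offset = torch.tensor([coord.shape[0]], device="cuda")    # single scan in the batch
    with torch.no_grad():
        logits = model({"coord": coord, "feat": feat,
                        "offset": offset, "grid_size": 0.05})  # assumed dict keys
        pred = logits.argmax(dim=-1)                           # 0 = background, 1 = lane
    return pred.cpu().numpy()
```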
Training Details
Training Data
- Based on SemanticKITTI dataset format
- Binary classification: background (0) and lane (1)
- Point cloud data with 4 channels: x, y, z, intensity (signal)
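As an illustration, a SemanticKITTI-format scan and its labels could be read as follows. The file paths are hypothetical; only the x, y, z, intensity layout and the 0/1 label set are taken from this card.

```python
import numpy as np

def load_scan(bin_path: str, label_path: str):
    # Points: flat float32 buffer reshaped to (N, 4) -> x, y, z, intensity (signal)
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # Labels: SemanticKITTI stores one uint32 per point; the lower 16 bits hold
    # the semantic class, which here is only 0 (background) or 1 (lane).
    labels = np.fromfile(label_path, dtype=np.uint32) & 0xFFFF
    return points, labels

points, labels = load_scan("sequences/00/velodyne/000000.bin",   # hypothetical paths
                           "sequences/00/labels/000000.label")
print(points.shape, np.unique(labels))
```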
Training Procedure
Preprocessing
- Grid sampling with size 0.05
- Random rotation, scaling, and flipping augmentations
- Random jittering (σ=0.005, clip=0.02)
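A sketch of how these augmentations might appear as a Pointcept-style transform pipeline. The grid size, jitter sigma, and clip come from this card; the rotation range, scale range, flip probability, and collected keys are illustrative assumptions to be checked against the repository config.

```python
transform = [
    dict(type="RandomRotate", angle=[-1, 1], axis="z", p=0.5),   # assumed range
    dict(type="RandomScale", scale=[0.9, 1.1]),                  # assumed range
    dict(type="RandomFlip", p=0.5),                              # assumed probability
    dict(type="RandomJitter", sigma=0.005, clip=0.02),           # from this card
    dict(type="GridSample", grid_size=0.05, hash_type="fnv",
         mode="train", return_grid_coord=True),                  # grid size from this card
    dict(type="ToTensor"),
    dict(type="Collect", keys=("coord", "grid_coord", "segment"),
         feat_keys=("coord", "strength")),
]
```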
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Batch size: 4
- Epochs: 50
- Optimizer: AdamW (lr=0.004, weight_decay=0.005)
- Scheduler: OneCycleLR
- Loss functions: CrossEntropy + Lovasz Loss
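The same hyperparameters expressed as a Pointcept-style config fragment. The epoch count, batch size, learning rate, weight decay, precision, and loss choices come from this card; the remaining keyword values are illustrative assumptions.

```python
epoch = 50
batch_size = 4
enable_amp = True  # mixed precision (fp16)

optimizer = dict(type="AdamW", lr=0.004, weight_decay=0.005)
scheduler = dict(type="OneCycleLR", max_lr=0.004, pct_start=0.04,
                 anneal_strategy="cos", div_factor=10.0,
                 final_div_factor=100.0)                          # assumed shape parameters
criteria = [
    dict(type="CrossEntropyLoss", loss_weight=1.0, ignore_index=-1),
    dict(type="LovaszLoss", mode="multiclass", loss_weight=1.0, ignore_index=-1),
]
```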
Speeds, Sizes, Times
- Inference time: 300-400ms per frame on RTX A4000
- Model size: ~500MB
- Training time: ~24 hours on single GPU
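A simple way to reproduce the per-frame latency figure, assuming `model` and the input dict are prepared as in the inference sketch above; timings on other GPUs will differ from the RTX A4000 numbers quoted here.

```python
import time
import torch

torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    model(input_dict)          # input_dict built as in the inference sketch
torch.cuda.synchronize()
print(f"inference: {(time.perf_counter() - start) * 1e3:.1f} ms")
```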
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Custom-labeled high-bay dataset (UIUC testing facility)
- Test split from training data
Factors
- Time of day
- Weather conditions
- Road surface types
- Lane marking visibility
Metrics
- Mean IoU
- Per-class accuracy
- Inference time
- Memory usage
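For reference, per-class IoU, mean IoU, and per-class accuracy for the binary task can be computed from flattened prediction and ground-truth label arrays as in this minimal sketch:

```python
import numpy as np

def evaluate(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2):
    """Returns (mean IoU, per-class IoU list, per-class accuracy list)."""
    ious, accs = [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        ious.append(tp / max(tp + fp + fn, 1))   # intersection over union
        accs.append(tp / max(tp + fn, 1))        # recall-style per-class accuracy
    return float(np.mean(ious)), ious, accs
```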
Results
Performance metrics on test set:
- Mean IoU: [Pending final evaluation]
- Background accuracy: [Pending final evaluation]
- Lane accuracy: [Pending final evaluation]
Environmental Impact
- Hardware Type: NVIDIA RTX A4000
- Hours used: ~24 for training
- Cloud Provider: None (local computation)
- Carbon Emitted: [To be calculated]
Technical Specifications
Model Architecture and Objective
Detailed in configuration:
- Encoder depths: (2, 2, 2, 6, 2)
- Encoder channels: (32, 64, 128, 256, 512)
- Decoder depths: (2, 2, 2, 2)
- MLP ratio: 4
- Attention heads: Varies by layer
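A backbone configuration sketch in Pointcept style. The encoder/decoder depths, encoder channels, and MLP ratio come from this card; the head counts, decoder channels, and segmentor wrapper are assumptions that should be checked against the repository config.

```python
model = dict(
    type="DefaultSegmentor",
    num_classes=2,                          # background, lane
    backbone=dict(
        type="PT-v3m1",
        in_channels=4,                      # x, y, z, signal
        enc_depths=(2, 2, 2, 6, 2),
        enc_channels=(32, 64, 128, 256, 512),
        enc_num_head=(2, 4, 8, 16, 32),     # assumed: heads vary by layer
        dec_depths=(2, 2, 2, 2),
        dec_channels=(64, 64, 128, 256),    # assumed
        dec_num_head=(4, 4, 8, 16),         # assumed
        mlp_ratio=4,
    ),
    criteria=[dict(type="CrossEntropyLoss"),
              dict(type="LovaszLoss", mode="multiclass")],
)
```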
Compute Infrastructure
Hardware
- NVIDIA RTX A4000 (16GB VRAM)
- 32GB RAM minimum
- Multi-core CPU
Software
- Python 3.8+
- PyTorch 1.10+
- CUDA 11.3+
- ROS Noetic
- Pointcept framework
Model Card Authors
Bryan Chang