3D Computer Vision, Semantic Understanding, SLAM, Multi-modal Interactions, Spatiotemporal Reasoning, VLMs/LLMs