arxiv:2503.16776

OpenCity3D: What do Vision-Language Models know about Urban Environments?

Published on Mar 21

Submitted by LUC1O (paper author) on Mar 26

Authors: Valentin Bieri, Qingxuan Chen

Abstract

Vision-language models (VLMs) show great promise for 3D scene understanding, but they have mainly been applied to indoor spaces or autonomous driving, with a focus on low-level tasks such as segmentation. This work extends their use to urban-scale environments by leveraging 3D reconstructions from multi-view aerial imagery. We propose OpenCity3D, an approach that addresses high-level tasks such as population density estimation, building age classification, property price prediction, crime rate assessment, and noise pollution evaluation. Our findings highlight OpenCity3D's strong zero-shot and few-shot capabilities and its adaptability to new contexts. This research establishes a new paradigm for language-driven urban analytics, enabling applications in planning, policy, and environmental monitoring. See our project page: opencity3d.github.io
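The abstract does not describe the mechanism behind its zero-shot and few-shot queries. One common pattern for language-driven scoring with VLMs, assumed here and not confirmed by the text above, is to compare a per-building image embedding with the embedding of a text prompt, then calibrate those similarity scores against a few labeled buildings. The minimal sketch below assumes the embeddings were already produced by some CLIP-style encoder. The function names (cosine_similarity, zero_shot_scores, fit_few_shot), the toy 2-D vectors, the example prompt, and the least-squares calibration are all hypothetical illustrations, not OpenCity3D's actual implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_scores(building_embeddings: Sequence[Sequence[float]],
                     text_embedding: Sequence[float]) -> List[float]:
    """Hypothetical zero-shot step: score every building against one text prompt."""
    return [cosine_similarity(e, text_embedding) for e in building_embeddings]


def fit_few_shot(scores: Sequence[float],
                 targets: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical few-shot step: least-squares fit of target ~ w * score + b."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_t = sum(targets) / n
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_s == 0.0:
        return 0.0, mean_t  # no spread in scores: fall back to predicting the mean
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, targets))
    w = cov / var_s
    return w, mean_t - w * mean_s


# Toy 2-D vectors stand in for real, high-dimensional VLM embeddings (made-up values).
buildings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
prompt = [1.0, 0.0]  # stands in for the embedding of a prompt like "an expensive building"

scores = zero_shot_scores(buildings, prompt)  # roughly [1.0, 0.6, 0.0]
# Rank buildings by similarity to the prompt, best match first (no labels needed).
ranking = sorted(range(len(buildings)), key=lambda i: scores[i], reverse=True)

# Calibrate with made-up labels for buildings 0 and 2, then predict building 1.
w, b = fit_few_shot([scores[0], scores[2]], [500.0, 100.0])
predicted = w * scores[1] + b  # about 340 under this toy calibration
```

Ranking by raw similarity needs no labels, which is what makes it zero-shot; the linear calibration only maps scores onto a target scale, such as a price, once a few ground-truth values are known. A realistic setup would use far higher-dimensional embeddings and more than two labeled examples.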
