intro = '''Remote sensing images from NASA's fleet of Earth-observing satellites are pivotal for applications as diverse as land cover mapping,
disaster monitoring, urban planning, and environmental analysis. The potential of AI-based geospatial foundation models for performing
visual analysis tasks on these remote sensing images has garnered significant attention. To realize that potential, the crucial first
step is to develop foundation models – computer models that acquire competence in a broad range of tasks, which can then be specialized
with further training for specific applications. In this case, the foundation model is based on a large-scale vision transformer model
trained with satellite imagery.
Vision transformers are deep learning models that can be fine-tuned to answer specific science questions. Through training
on extensive remote sensing datasets, vision transformers can learn general relationships between the spectral data given as inputs,
as well as capture high-level visual patterns, semantics, and spatial relationships that can be leveraged for a wide range of analysis tasks.
Trained vision transformers can handle large-scale, high-resolution data; learn global representations; extract robust features; and support
multi-modal data fusion – often with better performance than task-specific models trained from scratch.
The Data Science Group at NASA Goddard Space Flight Center's Computational and Information Sciences and Technology Office (CISTO)
has implemented an end-to-end workflow to generate a pre-trained vision transformer that could evolve into a foundation model.
A training dataset of over 2 million 128x128 pixel “chips” has been created from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS)
surface reflectance products (MOD09). These data were used to train a SwinV2 vision transformer that we call SatVision.
'''
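
# Illustrative sketch of the chipping step described in the intro above: cut a
# surface-reflectance raster into non-overlapping 128x128-pixel chips. File
# paths, the .npy output format, and the nodata filter are hypothetical
# placeholders, not the actual SatVision pre-processing pipeline.
import numpy as np
import rasterio
from rasterio.windows import Window

CHIP_SIZE = 128  # chip edge length in pixels

def chip_granule(raster_path, out_dir):
    """Write each full 128x128 window of the raster to out_dir as a .npy chip."""
    n_chips = 0
    with rasterio.open(raster_path) as src:
        for row in range(0, src.height - CHIP_SIZE + 1, CHIP_SIZE):
            for col in range(0, src.width - CHIP_SIZE + 1, CHIP_SIZE):
                chip = src.read(window=Window(col, row, CHIP_SIZE, CHIP_SIZE))
                # Simple quality filter: skip chips dominated by fill values.
                if src.nodata is not None and \
                        np.count_nonzero(chip == src.nodata) > 0.1 * chip.size:
                    continue
                np.save(f"{out_dir}/chip_{row}_{col}.npy", chip)
                n_chips += 1
    return n_chips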
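
# Illustrative sketch of adapting a pre-trained vision transformer to a specific
# science question, as discussed in the intro: freeze the backbone and fine-tune
# a new classification head. The public `timm` SwinV2 weights, model name, class
# count, and 256x256 input size are stand-in assumptions, not the SatVision
# configuration.
import timm
import torch

NUM_CLASSES = 10  # hypothetical number of land-cover classes

# Pre-trained SwinV2 backbone with a freshly initialized classification head.
model = timm.create_model("swinv2_tiny_window8_256",
                          pretrained=True,
                          num_classes=NUM_CLASSES)

# Freeze everything except the new head so fine-tuning only adapts the head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("head")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def fine_tune_step(images, labels):
    """Run one optimization step on a batch of (B, 3, 256, 256) image tensors."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()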