license: mit
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities (ArXiv 2024)
Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu
Official models for AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Abstract
We introduce AnySat: a JEPA-based multimodal Earth Observation model that trains simultaneously on diverse datasets with different scales, resolutions (spatial, spectral, temporal), and modality combinations.
For more details and results, please check out our GitHub and project page.
Inference
In order to load our pretrained models, you can run:
from models.huggingface import AnySat
# Load the pretrained weights
model = AnySat(size="base", pretrained=True)  # "small" and "tiny" are also available
To get features from an observation or a batch of observations, you need to provide the model with a dictionary whose keys are taken from the following list:
Dataset | Description | Tensor Size | Channels | Resolution |
---|---|---|---|---|
aerial | Single date tensor | Bx4xHxW | RGB, NiR | 0.2m |
aerial-flair | Single date tensor | Bx5xHxW | RGB, NiR, Elevation | 0.2m |
spot | Single date tensor | Bx3xHxW | RGB | 1m |
naip | Single date tensor | Bx4xHxW | RGB, NiR | 1.25m |
s2 | Time series tensor | BxTx10xHxW | B2, B3, B4, B5, B6, B7, B8, B8a, B11, B12 | 10m |
s1-asc | Time series tensor | BxTx2xHxW | VV, VH | 10m |
s1 | Time series tensor | BxTx3xHxW | VV, VH, Ratio | 10m |
alos | Time series tensor | BxTx3xHxW | HH, HV, Ratio | 30m |
l7 | Time series tensor | BxTx6xHxW | B1, B2, B3, B4, B5, B7 | 30m |
l8 | Time series tensor | BxTx11xHxW | B8, B1, B2, B3, B4, B5, B6, B7, B9, B10, B11 | 10m |
modis | Time series tensor | BxTx7xHxW | B1, B2, B3, B4, B5, B6, B7 | 250m |
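As a quick sanity check before calling the model, the channel counts in the table can be encoded as a lookup. This is only a sketch: `MODALITIES` and `check_shape` are hypothetical names that are not part of the AnySat code, and the check only covers the number of dimensions and the channel axis.

```python
# Channel counts copied from the table above; True marks time series keys.
MODALITIES = {
    "aerial": (4, False),
    "aerial-flair": (5, False),
    "spot": (3, False),
    "naip": (4, False),
    "s2": (10, True),
    "s1-asc": (2, True),
    "s1": (3, True),
    "alos": (3, True),
    "l7": (6, True),
    "l8": (11, True),
    "modis": (7, True),
}


def check_shape(key, shape):
    """Raise ValueError if `shape` does not match the table entry for `key`."""
    if key not in MODALITIES:
        raise ValueError(f"unknown modality key: {key!r}")
    channels, is_series = MODALITIES[key]
    # Single date tensors are B x C x H x W, time series are B x T x C x H x W.
    expected_rank = 5 if is_series else 4
    if len(shape) != expected_rank:
        raise ValueError(f"{key}: expected {expected_rank} dimensions, got {len(shape)}")
    channel_axis = 2 if is_series else 1
    if shape[channel_axis] != channels:
        raise ValueError(f"{key}: expected {channels} channels, got {shape[channel_axis]}")


check_shape("aerial", (8, 4, 300, 300))  # B x 4 x H x W
check_shape("s2", (8, 12, 10, 6, 6))     # B x T x 10 x H x W
```

With PyTorch tensors, the same check could be applied to every entry of the input dictionary through `check_shape(key, tuple(tensor.shape))`.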
Time series keys require a companion "{key}_dates" tensor (for example "s2_dates") of size BxT whose values are integers representing the day of the year. You then have to choose the scale at which you want to produce features. The scale argument is in meters and sets the side length of the desired patches. The output is the concatenation of a class token and a flattened feature map in which each feature encodes a scale x scale zone. The scale should divide the spatial cover of all modalities and be a multiple of 10. Then, you can run:
features = model(data, scale=scale)  # scale is the patch size in meters
And then you can apply those features to the desired downstream task!
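The two preparation steps described above, building the dates and choosing a valid scale, can be sketched with the standard library alone. The helper names (`day_of_year`, `extent_cm`, `num_tokens`) are hypothetical and not part of the AnySat code. Two assumptions are made here: January 1st maps to day 1 (check the repository for the exact convention), and all modalities cover the same square tile. The token count follows from the description above: one class token plus one feature per scale x scale zone.

```python
from datetime import date


def day_of_year(d):
    """Day of the year as an integer, with 1 for January 1st (assumed convention)."""
    return d.timetuple().tm_yday


def extent_cm(pixels, resolution_m):
    """Side length covered by a tile, in centimeters (integers avoid float error)."""
    return pixels * round(resolution_m * 100)


def num_tokens(scale_m, extents_cm):
    """Class token plus one feature per scale x scale zone of a square tile."""
    if scale_m <= 0 or scale_m % 10 != 0:
        raise ValueError("scale should be a positive multiple of 10")
    if len(set(extents_cm)) != 1:
        raise ValueError("modalities are assumed to cover the same extent")
    scale_cm = scale_m * 100
    side = extents_cm[0]
    if side % scale_cm != 0:
        raise ValueError("scale should divide the spatial cover")
    patches_per_side = side // scale_cm
    return 1 + patches_per_side ** 2


# One row of an "s2_dates" tensor (size T) for three acquisitions in 2024.
s2_dates = [day_of_year(d) for d in (date(2024, 1, 1), date(2024, 3, 1), date(2024, 12, 31))]
print(s2_dates)  # [1, 61, 366]

# A 300 x 300 aerial tile at 0.2m and a 6 x 6 s2 tile at 10m both cover 60m.
extents = [extent_cm(300, 0.2), extent_cm(6, 10)]
print(num_tokens(10, extents))  # 37
print(num_tokens(20, extents))  # 10
```

In this example a scale of 40 would be rejected, since 40m does not divide the 60m tile.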
If you want to get a feature map at the density of a specific modality you can specify:
features = model(data, scale=scale, keep_subpatch=True, modality_keep=modality)  # modality is the name of the desired modality
Note that the features will be of size 2*D. If several modalities share the desired resolution, you should pick the most informative one (or modify the code to also concatenate the other modalities).
To reproduce results, add new modalities, or run more experiments, see the full code on GitHub.
Citing