---
license: cc-by-4.0
---
# Model Card for DOFA
DOFA is a foundation model for Earth observation. This card summarizes its linear probing and partial fine-tuning results on classification and segmentation benchmarks; see the [DOFA GitHub repository](https://github.com/zhu-xlab/DOFA) for code and usage details.
## Model Details
### Model Description
- **Developed by:** [More Information Needed]
- **Funded by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Vision foundation model for Earth observation (ViT-B / ViT-L backbones)
- **Language(s) (NLP):** [More Information Needed]
- **License:** cc-by-4.0
- **Finetuned from model:** [More Information Needed]
### Model Sources
- **Repository:** [https://github.com/zhu-xlab/DOFA](https://github.com/zhu-xlab/DOFA)
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]
---
**Table 1:** Linear probing results on six classification tasks. All models are trained for 50 epochs; the reported numbers are top-1 overall accuracy (OA). Missing values indicate that a model could not be adapted to that domain.
| Method | Backbone | m-bigearthnet | m-forestnet | m-brick-kiln | m-pv4ger | m-so2sat | m-eurosat |
|--------------------|-------------|---------------|-------------|--------------|----------|----------|-----------|
| **Fully Trained** | ViT-S | 66.0 | 53.8 | 98.1 | 97.6 | 57.5 | 97.3 |
| **Fully Trained** | SwinV2-T | 70.0 | 58.0 | 98.7 | 98.0 | 56.1 | 97.4 |
| **Fully Trained**  | ViT-B ConvNeXt-B | 69.1 | 56.8 | 98.9 | 98.0 | 58.1 | 97.7 |
| **rand. init.** | ViT-B | 52.9 | 41.5 | 84.5 | 91.3 | 38.3 | 85.7 |
| **MAE_Single [44]**| ViT-B | 63.6 | - | 88.9 | 92.2 | 50.0 | 88.9 |
| **OFA-Net [43]** | ViT-B | 65.0 | - | 94.7 | 93.2 | 49.4 | 91.9 |
| **SatMAE [25]** | ViT-B | 62.1 | - | 93.9 | - | 46.9 | 86.4 |
| **Scale-MAE [22]** | ViT-L | - | - | - | 96.9 | - | - |
| **GFM [21]** | Swin-B | - | - | - | 96.8 | - | - |
| **Cross-Scale MAE [23]** | ViT-B | - | - | - | 93.1 | - | - |
| **FG-MAE [24]** | ViT-B | 63.0 | - | 94.7 | - | 51.4 | 87.0 |
| **CROMA [27]** | ViT-B | 67.4 | - | 91.0 | - | 49.2 | 90.1 |
| **DOFA** | ViT-B | 65.7 | 50.9 | 95.8 | 96.9 | 55.1 | 93.9 |
| **DOFA** | ViT-L | **67.5** | **54.6** | **96.9** | **97.3** | **60.1** | **97.1** |
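
Linear probing here means the pre-trained backbone is kept frozen and only a linear classifier is trained on its features. The snippet below is a minimal illustration of that setup in PyTorch, not the exact evaluation code behind Table 1; the backbone stand-in, feature dimension, and optimizer settings are placeholder assumptions.

```python
# Illustrative linear-probing setup: freeze the backbone, train only a linear head.
# The backbone stand-in, feature dim, and hyperparameters are placeholders.
import torch
import torch.nn as nn

def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int):
    for p in backbone.parameters():          # freeze every backbone parameter
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)  # only the head is trained
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer

backbone = nn.Identity()                      # stand-in for the frozen DOFA encoder
head, optimizer = build_linear_probe(backbone, feat_dim=768, num_classes=10)

features = torch.randn(4, 768)                # stand-in for backbone(images)
loss = nn.functional.cross_entropy(head(features), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```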
**Table 2:** Partial fine-tuning results on six segmentation tasks. All models are trained with a frozen backbone for 20 epochs; the reported numbers are mean intersection over union (mIoU). Missing values indicate that a model could not be adapted to that domain.
| Method | Backbone | m-pv4ger-seg | m-nz-cattle | m-NeonTree | m-cashew-plant | m-SA-crop | m-chesapeake |
|--------------------|-------------|--------------|-------------|------------|----------------|-----------|--------------|
| **DeepLabv3** | ResNet101 | 93.4 | 67.6 | 53.9 | 48.6 | 30.4 | 62.1 |
| **U-Net** | ResNet101 | 94.1 | 80.5 | 56.6 | 46.6 | 29.9 | 70.8 |
| **rand. init.** | ViT-B | 81.7 | 74.1 | 51.7 | 32.4 | 29.0 | 47.1 |
| **MAE_Single [44]**| ViT-B | 88.4 | 76.4 | 53.0 | 40.7 | 30.7 | 51.9 |
| **OFA-Net [43]** | ViT-B | 89.4 | 77.6 | 53.3 | 47.9 | 31.9 | 54.5 |
| **Scale-MAE [22]** | ViT-L | 83.5 | 76.5 | 51.0 | - | - | 61.0 |
| **GFM [21]** | Swin-B | 92.0 | 75.0 | 51.1 | - | - | 63.8 |
| **Cross-Scale MAE [23]** | ViT-B | 83.2 | 77.9 | 52.1 | - | - | 52.3 |
| **CROMA [27]** | ViT-B | - | - | - | 30.1 | 31.4 | - |
| **FG-MAE [24]** | ViT-B | - | - | - | 40.8 | 30.6 | - |
| **DOFA** | ViT-B | 94.5 | 81.4 | 58.8 | 51.5 | **33.0** | 65.3 |
| **DOFA** | ViT-L | **95.0** | **81.8** | **59.4** | **56.9** | **32.1** | **66.3** |
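
The segmentation scores above are mean intersection over union (mIoU): per-class IoU averaged over the classes present. A minimal reference implementation is sketched below; the actual benchmark code may handle void labels or class weighting differently.

```python
# Minimal mIoU over integer prediction/target label maps.
# Real benchmark code may additionally ignore void labels.
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:
            continue                          # class absent from both maps
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = torch.randint(0, 6, (2, 64, 64))       # toy predictions, 6 classes
target = torch.randint(0, 6, (2, 64, 64))     # toy ground truth
print(f"mIoU: {mean_iou(pred, target, num_classes=6):.3f}")
```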
---
## Uses
Please refer to the GitHub repo [DOFA](https://github.com/zhu-xlab/DOFA) for more details.
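
As a starting point, the sketch below downloads a checkpoint file from this repository with `huggingface_hub` and inspects it with PyTorch. The repository id and filename are assumptions for illustration; check this repo's file listing and the loading code in the DOFA GitHub repository for the actual names and model construction.

```python
# Sketch only: fetch a DOFA checkpoint and inspect its contents.
# repo_id and filename are placeholder assumptions -- adjust to this repo's files.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="XShadow/DOFA",               # placeholder repo id
    filename="DOFA_ViT_base_e100.pth",    # placeholder checkpoint filename
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
# Assuming a state-dict-style checkpoint, list the first few parameter names.
print(list(checkpoint)[:5] if isinstance(checkpoint, dict) else type(checkpoint))
```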