---
license: cc-by-4.0
---
# Model Card for DOFA
DOFA is a foundation model for Earth observation. This card summarizes its linear probing and partial fine-tuning results on classification and segmentation benchmarks; see the [DOFA GitHub repository](https://github.com/zhu-xlab/DOFA) for code and usage details.
## Model Details
### Model Description
- **Developed by:** [More Information Needed]
- **Funded by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Vision foundation model for Earth observation (ViT-B / ViT-L backbones)
- **Language(s) (NLP):** [More Information Needed]
- **License:** cc-by-4.0
- **Finetuned from model:** [More Information Needed]
### Model Sources
- **Repository:** [https://github.com/zhu-xlab/DOFA](https://github.com/zhu-xlab/DOFA)
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]
---
**Table 1:** Linear probing results on six classification tasks. All models are trained for 50 epochs; the reported numbers are top-1 overall accuracy (OA). Missing values indicate that a model could not be adapted to that domain.
| Method | Backbone | m-bigearthnet | m-forestnet | m-brick-kiln | m-pv4ger | m-so2sat | m-eurosat |
|--------------------|-------------|---------------|-------------|--------------|----------|----------|-----------|
| **Fully Trained** | ViT-S | 66.0 | 53.8 | 98.1 | 97.6 | 57.5 | 97.3 |
| **Fully Trained** | SwinV2-T | 70.0 | 58.0 | 98.7 | 98.0 | 56.1 | 97.4 |
| **Fully Trained**  | ViT-B ConvNeXt-B | 69.1 | 56.8 | 98.9 | 98.0 | 58.1 | 97.7 |
| **rand. init.** | ViT-B | 52.9 | 41.5 | 84.5 | 91.3 | 38.3 | 85.7 |
| **MAE_Single [44]**| ViT-B | 63.6 | - | 88.9 | 92.2 | 50.0 | 88.9 |
| **OFA-Net [43]** | ViT-B | 65.0 | - | 94.7 | 93.2 | 49.4 | 91.9 |
| **SatMAE [25]** | ViT-B | 62.1 | - | 93.9 | - | 46.9 | 86.4 |
| **Scale-MAE [22]** | ViT-L | - | - | - | 96.9 | - | - |
| **GFM [21]** | Swin-B | - | - | - | 96.8 | - | - |
| **Cross-Scale MAE [23]** | ViT-B | - | - | - | 93.1 | - | - |
| **FG-MAE [24]** | ViT-B | 63.0 | - | 94.7 | - | 51.4 | 87.0 |
| **CROMA [27]** | ViT-B | 67.4 | - | 91.0 | - | 49.2 | 90.1 |
| **DOFA** | ViT-B | 65.7 | 50.9 | 95.8 | 96.9 | 55.1 | 93.9 |
| **DOFA** | ViT-L | **67.5** | **54.6** | **96.9** | **97.3** | **60.1** | **97.1** |
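
Linear probing here means the pre-trained backbone is kept frozen and only a linear classifier is trained on its features. The snippet below is a minimal illustration of that setup in PyTorch, not the exact evaluation code behind Table 1; the backbone stand-in, feature dimension, and optimizer settings are placeholder assumptions.

```python
# Illustrative linear-probing setup: freeze the backbone, train only a linear head.
# The backbone stand-in, feature dim, and hyperparameters are placeholders.
import torch
import torch.nn as nn

def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int):
    for p in backbone.parameters():          # freeze every backbone parameter
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)  # only the head is trained
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer

backbone = nn.Identity()                      # stand-in for the frozen DOFA encoder
head, optimizer = build_linear_probe(backbone, feat_dim=768, num_classes=10)

features = torch.randn(4, 768)                # stand-in for backbone(images)
loss = nn.functional.cross_entropy(head(features), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```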
**Table 2:** Partial fine-tuning results on six segmentation tasks. All models are trained with a frozen backbone for 20 epochs; the reported numbers are mean intersection over union (mIoU). Missing values indicate that a model could not be adapted to that domain.
| Method | Backbone | m-pv4ger-seg | m-nz-cattle | m-NeonTree | m-cashew-plant | m-SA-crop | m-chesapeake |
|--------------------|-------------|--------------|-------------|------------|----------------|-----------|--------------|
| **DeepLabv3** | ResNet101 | 93.4 | 67.6 | 53.9 | 48.6 | 30.4 | 62.1 |
| **U-Net** | ResNet101 | 94.1 | 80.5 | 56.6 | 46.6 | 29.9 | 70.8 |
| **rand. init.** | ViT-B | 81.7 | 74.1 | 51.7 | 32.4 | 29.0 | 47.1 |
| **MAE_Single [44]**| ViT-B | 88.4 | 76.4 | 53.0 | 40.7 | 30.7 | 51.9 |
| **OFA-Net [43]** | ViT-B | 89.4 | 77.6 | 53.3 | 47.9 | 31.9 | 54.5 |
| **Scale-MAE [22]** | ViT-L | 83.5 | 76.5 | 51.0 | - | - | 61.0 |
| **GFM [21]** | Swin-B | 92.0 | 75.0 | 51.1 | - | - | 63.8 |
| **Cross-Scale MAE [23]** | ViT-B | 83.2 | 77.9 | 52.1 | - | - | 52.3 |
| **CROMA [27]** | ViT-B | - | - | - | 30.1 | 31.4 | - |
| **FG-MAE [24]** | ViT-B | - | - | - | 40.8 | 30.6 | - |
| **DOFA** | ViT-B | 94.5 | 81.4 | 58.8 | 51.5 | **33.0** | 65.3 |
| **DOFA** | ViT-L | **95.0** | **81.8** | **59.4** | **56.9** | **32.1** | **66.3** |
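
The segmentation scores above are mean intersection over union (mIoU): per-class IoU averaged over the classes present. A minimal reference implementation is sketched below; the actual benchmark code may handle void labels or class weighting differently.

```python
# Minimal mIoU over integer prediction/target label maps.
# Real benchmark code may additionally ignore void labels.
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:
            continue                          # class absent from both maps
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = torch.randint(0, 6, (2, 64, 64))       # toy predictions, 6 classes
target = torch.randint(0, 6, (2, 64, 64))     # toy ground truth
print(f"mIoU: {mean_iou(pred, target, num_classes=6):.3f}")
```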
---
## Uses
Please refer to the GitHub repo [DOFA](https://github.com/zhu-xlab/DOFA) for more details.
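
As a starting point, the sketch below downloads a checkpoint file from this repository with `huggingface_hub` and inspects it with PyTorch. The repository id and filename are assumptions for illustration; check this repo's file listing and the loading code in the DOFA GitHub repository for the actual names and model construction.

```python
# Sketch only: fetch a DOFA checkpoint and inspect its contents.
# repo_id and filename are placeholder assumptions -- adjust to this repo's files.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="XShadow/DOFA",               # placeholder repo id
    filename="DOFA_ViT_base_e100.pth",    # placeholder checkpoint filename
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
# Assuming a state-dict-style checkpoint, list the first few parameter names.
print(list(checkpoint)[:5] if isinstance(checkpoint, dict) else type(checkpoint))
```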