---
library_name: keras-hub
---
### Model Overview
A Keras model implementing the MixTransformer (MiT) architecture, intended for use as the backbone of the SegFormer semantic segmentation architecture. This model is supported in both KerasCV and KerasHub. KerasCV is no longer actively developed, so we recommend using KerasHub.
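
As a backbone, MiT is hierarchical: according to the SegFormer paper, its four stages produce feature maps at roughly 1/4, 1/8, 1/16 and 1/32 of the input resolution, using overlapping patch merging with kernel 7 / stride 4 / padding 3 for the first stage and kernel 3 / stride 2 / padding 1 for the rest. The sketch below computes those per-stage sizes with the standard convolution output-size formula. The function name `mit_stage_resolutions` is hypothetical, and the Keras implementation may pad differently, so treat results for input sizes that are not multiples of 32 as approximate.

```python
def mit_stage_resolutions(height, width):
    """Hypothetical helper: (H, W) of each MiT stage's output feature map.

    Uses the patch-merging settings reported in the SegFormer paper. The
    KerasHub implementation may pad differently for odd input sizes.
    """
    # (kernel, stride, padding) for stages 1-4.
    patch_settings = [(7, 4, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
    sizes = []
    h, w = height, width
    for kernel, stride, pad in patch_settings:
        # Standard convolution output-size formula.
        h = (h + 2 * pad - kernel) // stride + 1
        w = (w + 2 * pad - kernel) // stride + 1
        sizes.append((h, w))
    return sizes


print(mit_stage_resolutions(512, 512))
# [(128, 128), (64, 64), (32, 32), (16, 16)]
```

For a 512x512 ADE20K preset, the deepest feature map is therefore 16x16, which is why a segmentation head must upsample by a factor of 32 to return to the input resolution.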

References:
- [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203)
- [Based on the TensorFlow implementation from DeepVision](https://github.com/DavidLandup0/deepvision/tree/main/deepvision/models/classification/mix_transformer)

## Links
* [MiT Quickstart Notebook: coming soon]()
* [MiT API Documentation: coming soon]()

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q "keras>=3"
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
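
Keras 3 selects its backend from the `KERAS_BACKEND` environment variable, which must be set before Keras (or KerasHub) is imported for the first time. A minimal sketch, assuming you want the JAX backend:

```python
import os

# Must run before the first `import keras` or `import keras_hub`.
# Accepted values include "jax", "tensorflow" and "torch".
os.environ["KERAS_BACKEND"] = "jax"

# import keras_hub  # import only after the backend has been chosen
```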

## Presets

The following model checkpoints are provided by the Keras team. Weights have been ported from https://dl.fbaipublicfiles.com/segment_anything/. Code examples are available below.

| Preset name              | Parameters | Description                                                                                      |
|--------------------------|------------|--------------------------------------------------------------------------------------------------|
| mit_b0_ade20k_512        | 3.32M      | MiT (MixTransformer) model with 8 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
| mit_b1_ade20k_512        | 13.16M     | MiT (MixTransformer) model with 8 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
| mit_b2_ade20k_512        | 24.20M     | MiT (MixTransformer) model with 16 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
| mit_b3_ade20k_512        | 44.08M     | MiT (MixTransformer) model with 28 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
| mit_b4_ade20k_512        | 60.85M     | MiT (MixTransformer) model with 41 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
| mit_b5_ade20k_640        | 81.45M     | MiT (MixTransformer) model with 52 transformer blocks, trained on the ADE20K dataset with an input resolution of 640x640 pixels. |
| mit_b0_cityscapes_1024   | 3.32M      | MiT (MixTransformer) model with 8 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
| mit_b1_cityscapes_1024   | 13.16M     | MiT (MixTransformer) model with 8 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
| mit_b2_cityscapes_1024   | 24.20M     | MiT (MixTransformer) model with 16 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
| mit_b3_cityscapes_1024   | 44.08M     | MiT (MixTransformer) model with 28 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
| mit_b4_cityscapes_1024   | 60.85M     | MiT (MixTransformer) model with 41 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
| mit_b5_cityscapes_1024   | 81.45M     | MiT (MixTransformer) model with 52 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
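
The "transformer blocks" counts above are the sums of MiT's four per-stage encoder depths, as reported in the SegFormer paper for variants B0 to B5. The sketch below reproduces that column from a preset name. `MIT_STAGE_DEPTHS` and `total_blocks` are hypothetical names for illustration, not part of the KerasHub API, and the helper assumes every preset name follows the `mit_<variant>_<dataset>_<resolution>` pattern used in the table.

```python
# Per-stage encoder depths for MiT-B0..B5 (SegFormer paper).
MIT_STAGE_DEPTHS = {
    "b0": [2, 2, 2, 2],
    "b1": [2, 2, 2, 2],
    "b2": [3, 4, 6, 3],
    "b3": [3, 4, 18, 3],
    "b4": [3, 8, 27, 3],
    "b5": [3, 6, 40, 3],
}


def total_blocks(preset_name):
    """Hypothetical helper: total transformer blocks for a preset name."""
    # "mit_b3_ade20k_512" -> "b3"
    variant = preset_name.split("_")[1]
    return sum(MIT_STAGE_DEPTHS[variant])


for name in ["mit_b0_ade20k_512", "mit_b2_ade20k_512", "mit_b5_cityscapes_1024"]:
    print(name, total_blocks(name))
```

B0 and B1 share the same depths; they differ in hidden width, which is why their parameter counts differ while their block counts match.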

## Example Usage
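Load the backbone from a preset and attach a minimal segmentation head for training:

```
import keras
import keras_hub
import numpy as np

backbone = keras_hub.models.MiTBackbone.from_preset("mit_b3_ade20k_512")

# Dummy batch matching the input resolution stored in the preset.
_, height, width, _ = backbone.input.shape
images = np.ones(shape=(1, height, width, 3))
labels = np.zeros(shape=(1, height, width, 1))

# The backbone outputs stride-32 features. This minimal, illustrative head
# (not the full SegFormer decoder) projects them to one channel and
# upsamples back to the input resolution.
x = keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid")(backbone.output)
outputs = keras.layers.UpSampling2D(size=32, interpolation="bilinear")(x)
model = keras.Model(backbone.input, outputs)

# Evaluate model
model(images)

# Train model
model.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
    metrics=["accuracy"],
)
model.fit(images, labels, epochs=3)
```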

## Example Usage with Hugging Face URI

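Load the backbone from the Hugging Face Hub and attach a minimal segmentation head for training:

```
import keras
import keras_hub
import numpy as np

backbone = keras_hub.models.MiTBackbone.from_preset("hf://keras/mit_b3_ade20k_512")

# Dummy batch matching the input resolution stored in the preset.
_, height, width, _ = backbone.input.shape
images = np.ones(shape=(1, height, width, 3))
labels = np.zeros(shape=(1, height, width, 1))

# The backbone outputs stride-32 features. This minimal, illustrative head
# (not the full SegFormer decoder) projects them to one channel and
# upsamples back to the input resolution.
x = keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid")(backbone.output)
outputs = keras.layers.UpSampling2D(size=32, interpolation="bilinear")(x)
model = keras.Model(backbone.input, outputs)

# Evaluate model
model(images)

# Train model
model.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
    metrics=["accuracy"],
)
model.fit(images, labels, epochs=3)
```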
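
The Hugging Face URI used above is `hf://keras/` followed by the preset name. If you switch between presets programmatically, a small helper can build the URI. This is a sketch under the assumption that every preset in the table is hosted under the `keras` organization with the same name, which this card only shows for `mit_b3_ade20k_512`; `hf_preset_uri` is a hypothetical helper, not a KerasHub function.

```python
def hf_preset_uri(preset_name, org="keras"):
    """Hypothetical helper: build an hf:// URI for a KerasHub preset.

    Assumes the preset is published on the Hugging Face Hub as
    <org>/<preset_name>.
    """
    return f"hf://{org}/{preset_name}"


print(hf_preset_uri("mit_b3_ade20k_512"))  # hf://keras/mit_b3_ade20k_512
```

The returned string can be passed to `keras_hub.models.MiTBackbone.from_preset` in place of the bare preset name.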