Image Classification

MobileNet v2

Use case: Image classification

Model description

MobileNet v2 is very similar to the original MobileNet, except that it uses inverted residual blocks with bottlenecking features.

It has a drastically lower parameter count than the original MobileNet.

MobileNet models support any input size greater than 32 x 32, with larger image sizes offering better performance. The alpha parameter, a float greater than zero, controls the width of the network. It is known as the width multiplier in the MobileNetV2 paper, but the name alpha is kept for consistency with keras.applications.

If alpha < 1.0, proportionally decreases the number of filters in each layer.

If alpha > 1.0, proportionally increases the number of filters in each layer.

If alpha = 1.0, default number of filters from the paper are used at each layer.

(source: https://keras.io/api/applications/mobilenet/)
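
For illustration, a MobileNet v2 with a given alpha can be instantiated through the Keras applications API. The following is a minimal sketch; the 128x128 input size and 101-class head are arbitrary examples, not the configuration of the pre-trained models below:

```python
import tensorflow as tf

# alpha=0.35 shrinks the number of filters in each layer to 35 % of the
# paper's defaults; any input size of at least 32x32 is accepted.
model = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3),
    alpha=0.35,
    weights=None,  # random initialization; use weights="imagenet" for pre-trained
    classes=101,   # illustrative head size, e.g. the Food-101 classes
)
model.summary()
```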

The model is quantized to int8 using the TensorFlow Lite converter.
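
A minimal sketch of such a post-training int8 quantization, assuming a trained Keras `model` like the one above; the calibration data and output file name are illustrative, not the exact conversion script used for these models:

```python
import tensorflow as tf

def representative_dataset():
    # In practice, yield a few hundred preprocessed training images for
    # activation calibration; random tensors keep this sketch self-contained.
    for _ in range(100):
        yield [tf.random.uniform((1, 128, 128, 3), 0.0, 1.0)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # trained Keras model
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8     # UINT8 image input (see below)
converter.inference_output_type = tf.float32  # FLOAT32 per-class output

with open("mobilenet_v2_int8.tflite", "wb") as f:
    f.write(converter.convert())
```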

Network information

| Network Information | Value |
|---------------------|-------|
| Framework | TensorFlow Lite |
| MParams (alpha=0.35) | 1.66 M |
| Quantization | int8 |
| Provenance | https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v2 |
| Paper | https://arxiv.org/pdf/1801.04381.pdf |

Network inputs / outputs

For an image resolution of NxM and P classes:

| Input Shape | Description |
|-------------|-------------|
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description |
|--------------|-------------|
| (1, P) | Per-class confidence for P classes in FLOAT32 |
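
These shapes translate directly into TFLite interpreter calls. A minimal inference sketch, reusing the illustrative file name from the quantization sketch above, with a random image standing in for real input:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_v2_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]   # shape (1, N, M, 3), dtype uint8
out = interpreter.get_output_details()[0]  # shape (1, P), dtype float32

image = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]  # per-class confidences
print("predicted class:", int(np.argmax(scores)))
```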

Recommended platforms

| Platform | Supported | Recommended |
|----------|-----------|-------------|
| STM32L0  | []  | []  |
| STM32L4  | [x] | []  |
| STM32U5  | [x] | []  |
| STM32H7  | [x] | [x] |
| STM32MP1 | [x] | [x] |
| STM32MP2 | [x] | [x] |
| STM32N6  | [x] | [x] |

Performance

Metrics

  • Measurements are done with the default STM32Cube.AI configuration, with the input/output allocated option enabled.
  • tfs stands for "training from scratch": the model weights were randomly initialized before training.
  • tl stands for "transfer learning": the model backbone weights were initialized from a pre-trained model, and only the last layer was unfrozen during training.
  • fft stands for "full fine-tuning": the full model weights were initialized from a transfer-learning pre-trained model, and all layers were unfrozen during training (the tl and fft setups are sketched in the example after this list).
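
In Keras terms, the difference between tl and fft boils down to whether the backbone is frozen. A minimal sketch, assuming the standard tf.keras.applications API; the 224x224 input, alpha=0.35, and 101-class head are illustrative, not the exact training setup used here:

```python
import tensorflow as tf

# Pre-trained MobileNetV2 backbone without its 1000-class ImageNet head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), alpha=0.35,
    include_top=False, weights="imagenet", pooling="avg",
)
base.trainable = False   # "tl": freeze the backbone, train only the new head
# base.trainable = True  # "fft": unfreeze all layers for full fine-tuning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(101, activation="softmax"),  # e.g. a Food-101 head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```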

Reference NPU memory footprint on the food101 and imagenet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STEdgeAI Core version |
|-------|---------|--------|------------|--------|--------------------|--------------------|---------------------|-----------------------|
| MobileNet v2 0.35 fft | food101 | Int8 | 128x128x3 | STM32N6 | 240 | 0.0 | 530.59 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8/Int4 | 128x128x3 | STM32N6 | 240 | 0.0 | 396.44 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8 | 224x224x3 | STM32N6 | 931 | 0.0 | 557.44 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8/Int4 | 224x224x3 | STM32N6 | 1127 | 0.0 | 423.28 | 3.0.0 |
| MobileNet v2 1.0 fft | food101 | Int8 | 224x224x3 | STM32N6 | 2058 | 0.0 | 2686.42 | 3.0.0 |
| MobileNet v2 1.0 fft | food101 | Int8/Int4 | 224x224x3 | STM32N6 | 2058 | 0.0 | 2336.39 | 3.0.0 |
| MobileNet v2 0.35 fft | Person | Int8 | 128x128x3 | STM32N6 | 240 | 0.0 | 404.55 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 128x128x3 | STM32N6 | 240 | 0.0 | 1656.28 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8/Int4 | 128x128x3 | STM32N6 | 240 | 0.0 | 962.22 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 224x224x3 | STM32N6 | 931 | 0.0 | 1683.13 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8/Int4 | 224x224x3 | STM32N6 | 1127 | 0.0 | 989.06 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | STM32N6 | 2058 | 0.0 | 3812.11 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8/Int4 | 224x224x3 | STM32N6 | 2058 | 0.0 | 2988.05 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8 | 224x224x3 | STM32N6 | 2361 | 0.0 | 6746.7 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8/Int4 | 224x224x3 | STM32N6 | 2361 | 0.0 | 5480.25 | 3.0.0 |

Reference NPU inference time on the food101 and imagenet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STEdgeAI Core version |
|-------|---------|--------|------------|-------|------------------|---------------------|-----------|-----------------------|
| MobileNet v2 0.35 fft | food101 | Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 2.82 | 354.6 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8/Int4 | 128x128x3 | STM32N6570-DK | NPU/MCU | 2.65 | 377.36 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 5.67 | 176.36 | 3.0.0 |
| MobileNet v2 0.35 fft | food101 | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 5.43 | 184.16 | 3.0.0 |
| MobileNet v2 1.0 fft | food101 | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 17.44 | 57.34 | 3.0.0 |
| MobileNet v2 1.0 fft | food101 | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 16.43 | 60.86 | 3.0.0 |
| MobileNet v2 0.35 fft | Person | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 2.47 | 404.86 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 5.83 | 171.53 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8/Int4 | 128x128x3 | STM32N6570-DK | NPU/MCU | 4.05 | 246.91 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 8.68 | 115.2 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 6.83 | 146.4 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 20.45 | 48.9 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 18.21 | 54.91 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 34.74 | 28.79 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 31.94 | 31.3 | 3.0.0 |

Reference MCU memory footprint based on the Flowers and imagenet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Series | Activation RAM (KiB) | Runtime RAM (KiB) | Weights Flash (KiB) | Code Flash (KiB) | Total RAM (KiB) | Total Flash (KiB) | STEdgeAI Core version |
|-------|---------|--------|------------|--------|----------------------|-------------------|---------------------|------------------|-----------------|-------------------|-----------------------|
| MobileNet v2 0.35 fft | Flowers | Int8 | 128x128x3 | STM32H7 | 237.32 | 3.77 | 406.86 | 64.3 | 241.09 | 471.16 | 3.0.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 224x224x3 | STM32H7 | 699.32 | 3.77 | 406.86 | 64.69 | 703.09 | 471.55 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 128x128x3 | STM32H7 | 237.32 | 3.36 | 1654.5 | 65.25 | 240.68 | 1719.75 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 224x224x3 | STM32H7 | 699.32 | 3.36 | 1654.5 | 65.68 | 702.68 | 1720.18 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | STM32H7 | 1433.13 | 3.36 | 3458.97 | 104.92 | 1436.49 | 3563.89 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8 | 224x224x3 | STM32H7 | 2143.27 | 3.36 | 6015.34 | 132.17 | 2146.63 | 6147.51 | 3.0.0 |

Reference MCU inference time based on the Flowers and imagenet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) | STEdgeAI Core version |
|-------|---------|--------|------------|-------|------------------|-----------|---------------------|-----------------------|
| MobileNet v2 0.35 fft | Flowers | Int8 | 128x128x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 100.09 | 3.0.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 308.57 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 128x128x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 113.43 | 3.0.0 |
| MobileNet v2 0.35 | imagenet | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 321.76 | 3.0.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 1118.27 | 3.0.0 |
| MobileNet v2 1.4 | imagenet | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 2035.56 | 3.0.0 |

Reference MPU inference time based on the Flowers and imagenet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
|-------|---------|--------|------------|--------------|-------|------------------|-----------|---------------------|------|------|------|--------------------|-----------|
| MobileNet v2 1.0_per_tensor | imagenet | Int8 | 224x224x3 | per-tensor | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 12.15 | 81.71 | 18.29 | 0 | v6.1.0 | OpenVX |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | per-channel ** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 75.91 | 2.77 | 97.23 | 0 | v6.1.0 | OpenVX |
| MobileNet v2 0.35 fft | Flowers | Int8 | 224x224x3 | per-channel ** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 25.30 | 3.89 | 96.11 | 0 | v6.1.0 | OpenVX |
| MobileNet v2 0.35 fft | Flowers | Int8 | 128x128x3 | per-channel ** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 8.97 | 11.73 | 88.27 | 0 | v6.1.0 | OpenVX |
| MobileNet v2 1.0_per_tensor | imagenet | Int8 | 224x224x3 | per-tensor | STM32MP157F-DK2 | 2 CPU | 800 MHz | 346.87 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 206.64 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 224x224x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 51.33 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 128x128x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 16.27 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 1.0_per_tensor | imagenet | Int8 | 224x224x3 | per-tensor | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 434.12 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 1.0 | imagenet | Int8 | 224x224x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 316.76 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 224x224x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 81.91 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v2 0.35 fft | Flowers | Int8 | 128x128x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 25.75 | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |

** To get the most out of the STM32MP25 NPU hardware acceleration, use per-tensor quantization. Note that on STM32MP2 devices, per-channel quantized models are internally converted to per-tensor quantization by the compiler using an entropy-based method, which may introduce a slight loss in accuracy compared to the original per-channel models.

Accuracy with Flowers dataset

Dataset details: link, License: CC BY 2.0, Quotation [1], Number of classes: 5, Number of images: 3 670

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v2 0.35 fft | Float | 128x128x3 | 91.83 % |
| MobileNet v2 0.35 fft | Int8 | 128x128x3 | 91.01 % |
| MobileNet v2 0.35 fft | Float | 224x224x3 | 93.6 % |
| MobileNet v2 0.35 fft | Int8 | 224x224x3 | 92.78 % |

Accuracy with Plant-village dataset

Dataset details: link, License: CC0 1.0, Quotation [2], Number of classes: 39, Number of images: 61 486

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v2 0.35 fft | Float | 128x128x3 | 99.77 % |
| MobileNet v2 0.35 fft | Int8 | 128x128x3 | 99.48 % |
| MobileNet v2 0.35 fft | Float | 224x224x3 | 99.95 % |
| MobileNet v2 0.35 fft | Int8 | 224x224x3 | 99.68 % |

Accuracy with Food-101 dataset

Dataset details: link, Quotation [3], Number of classes: 101, Number of images: 101 000

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v2 0.35 fft | Float | 128x128x3 | 65.88 % |
| MobileNet v2 0.35 fft | Int8 | 128x128x3 | 65 % |
| MobileNet v2 0.35 fft | Int8/Int4 | 128x128x3 | 64.61 % |
| MobileNet v2 0.35 fft | Float | 224x224x3 | 76.47 % |
| MobileNet v2 0.35 fft | Int8 | 224x224x3 | 75.4 % |
| MobileNet v2 0.35 fft | Int8/Int4 | 224x224x3 | 74.86 % |
| MobileNet v2 1.0 fft | Float | 224x224x3 | 82.13 % |
| MobileNet v2 1.0 fft | Int8 | 224x224x3 | 81.6 % |
| MobileNet v2 1.0 fft | Int8/Int4 | 224x224x3 | 80.06 % |

Accuracy with coco_person dataset

The coco_person dataset is derived from COCO-2014 and created using the script available here (link). The dataset folder has two sub-folders, person and not person, containing images of the respective types. Dataset details: link, License: Creative Commons Attribution 4.0, Quotation [3], Number of classes: 2, Number of images: 84 810

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v2 0.35 fft | Float | 128x128x3 | 95.37 % |
| MobileNet v2 0.35 fft | Int8 | 128x128x3 | 94.95 % |

Accuracy with imagenet dataset

Dataset details: link, Quotation [4], Number of classes: 1000. To perform the quantization, we calibrated the activations with a random subset of the training set. For the sake of simplicity, the accuracy reported here was estimated on the 50 000 labelled images of the validation set. The sketch below shows how such a top-1 estimate can be computed.
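
A minimal sketch of such a top-1 evaluation, assuming hypothetical `val_images` and `val_labels` arrays holding preprocessed validation images and their integer labels (the model file name is also illustrative):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_v2_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

correct = 0
for image, label in zip(val_images, val_labels):  # hypothetical validation data
    interpreter.set_tensor(inp["index"], image[None, ...].astype(np.uint8))
    interpreter.invoke()
    pred = int(np.argmax(interpreter.get_tensor(out["index"])[0]))
    correct += int(pred == label)
print(f"top-1 accuracy: {100.0 * correct / len(val_labels):.2f} %")
```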

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v2 0.35 | Float | 128x128x3 | 46.96 % |
| MobileNet v2 0.35 | Int8 | 128x128x3 | 43.94 % |
| MobileNet v2 0.35 | Int8/Int4 | 128x128x3 | 43.53 % |
| MobileNet v2 0.35 | Float | 224x224x3 | 58.13 % |
| MobileNet v2 0.35 | Int8 | 224x224x3 | 56.77 % |
| MobileNet v2 0.35 | Int8/Int4 | 224x224x3 | 56.25 % |
| MobileNet v2 1.0 | Float | 224x224x3 | 70.37 % |
| MobileNet v2 1.0 | Int8 | 224x224x3 | 69.75 % |
| MobileNet v2 1.0 | Int8/Int4 | 224x224x3 | 69.54 % |
| MobileNet v2 1.0_per_tensor | Int8 | 224x224x3 | 65.84 % |
| MobileNet v2 1.4 | Float | 224x224x3 | 73.74 % |
| MobileNet v2 1.4 | Int8 | 224x224x3 | 73.45 % |
| MobileNet v2 1.4 | Int8/Int4 | 224x224x3 | 73.12 % |

Retraining and integration in a simple example

Please refer to the stm32ai-modelzoo-services GitHub repository here.

References

[1] "Tf_flowers: TensorFlow Datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers

[2] J, Arun Pandian; Gopal, Geetharamani (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network," Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1

[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests," European Conference on Computer Vision, 2014.

[4] Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei (* = equal contribution), "ImageNet Large Scale Visual Recognition Challenge."
