Model Card Vuurwerkverkenner

This model was trained by the Netherlands Forensic Institute. It can be used to link snippets of exploded (heavy) fireworks to the type of firework they originate from. An application that uses this model is available at www.vuurwerkverkenner.nl.

Architecture

The classification procedure consists of two components: a model that determines embeddings and a model that produces a classification based on the distances between embeddings. There are two reasons why directly training a classifier is not appropriate for our setting. Firstly, we do not have photos of snippets available for all classes, but only for a smaller subset. Secondly, the set of classes is dynamic and changes quickly over time (as more fireworks are added), and it is not feasible to train a new model each time. Therefore, we require a one-shot model, which we construct as follows.

Embedding model

First, we train an embedder that produces similar embeddings for snippets and wrappers of the same category, and dissimilar embeddings for different categories. The embedding model is based on the Vision Transformer architecture (see arXiv). It has the following specifications:

Model: ViT-B/32
Input: RGB image of 640x640 pixels
Output/embedding layer size: 256
Training loss: TripletSemiHardLoss (see TensorFlow.org) with batch size 10 (2 anchors, 2 positives, 2 negatives)
Fixed learning rate of 0.000015 with Adam optimizer
Epochs: 100
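
As a rough illustration of the idea behind a semi-hard triplet loss, the sketch below computes the loss for a single (anchor, positive, negative) triplet and checks whether the negative is "semi-hard", meaning it is farther from the anchor than the positive but still within the margin. This is a simplified sketch and not the TensorFlow implementation: the function names, the plain L2 distance, and the margin of 1.0 are assumptions, since the card does not state the margin that was used.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss on the gap between anchor-positive and anchor-negative distance.

    The margin value is an assumption; the model card does not specify it.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def is_semi_hard(anchor, positive, negative, margin=1.0):
    """A negative is semi-hard if it is farther away than the positive,
    but not by more than the margin, so it still yields a non-zero loss."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return d_pos < d_neg < d_pos + margin

anchor = (0.0, 0.0)
positive = (0.0, 1.0)
near_negative = (0.0, 1.5)  # inside the margin: contributes to the loss
far_negative = (0.0, 3.0)   # already far enough away: loss is zero

loss_near = triplet_loss(anchor, positive, near_negative)
loss_far = triplet_loss(anchor, positive, far_negative)
```

Training on semi-hard negatives focuses the updates on triplets that are already ordered correctly but not yet separated by the full margin.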

Classification

To link a photo of snippets to a firework category, we construct reference images based on the wrappers in each category (see "Data" for a description), which we convert into reference embeddings using the trained embedding model. In the same way, we create an embedding for the snippet photo. To produce a classification, we calculate the L2 distance (normalized to the range 0 to 1) between the snippet photo embedding and each reference embedding of every category. For each category, the minimum distance across all of its reference embeddings is taken as the representative score for that category.
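
The scoring step described above can be sketched as follows. This is a minimal illustration and not the application's code: the function name, the data layout (a dictionary of reference embeddings per category) and the toy two-dimensional embeddings are assumptions. The normalization to the 0 to 1 range is left out because the card does not specify how it is done; any monotone rescaling leaves the ranking unchanged.

```python
import math

def rank_categories(query, references):
    """Rank categories by their minimum L2 distance to the query embedding.

    query: sequence of floats (embedding of the snippet photo)
    references: dict mapping category name -> list of reference embeddings
    Returns a list of (category, score) pairs, best match first.
    """
    scores = {
        category: min(math.dist(query, ref) for ref in embeddings)
        for category, embeddings in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Toy 2-D embeddings; the real model produces 256-dimensional embeddings.
query = (0.0, 1.0)
references = {
    "cat_a": [(0.0, 0.9), (1.0, 0.0)],
    "cat_b": [(1.0, 1.0)],
}
ranking = rank_categories(query, references)
```

Taking the minimum rather than the mean means a category ranks high as soon as one of its reference images resembles the snippet photo.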

Text filter

Optionally, a text filter is applied on top of the classification that filters the fireworks labels based on the text that is on the snippet. The text on the snippets must be manually entered. All the text fragments that are on the snippet must be present on the label for a match.
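
The matching rule above, where every entered fragment must occur on the label, can be sketched as below. The function names, the example label texts and the case-insensitive substring comparison are assumptions made for illustration; the card does not describe how text is compared.

```python
def matches_label(fragments, label_text):
    """Return True when every entered fragment occurs in the label text.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    label = label_text.casefold()
    return all(fragment.casefold() in label for fragment in fragments)

def filter_ranking(ranking, label_texts, fragments):
    """Keep only the ranked categories whose label contains all fragments."""
    return [
        (category, score)
        for category, score in ranking
        if matches_label(fragments, label_texts[category])
    ]

# Hypothetical label texts and ranking, for illustration only.
label_texts = {"cat_a": "THUNDER KING 3", "cat_b": "Thunder Queen"}
ranking = [("cat_b", 0.2), ("cat_a", 0.3)]
filtered = filter_ranking(ranking, label_texts, ["thunder", "king"])
```

With no fragments entered, every category passes the filter, which matches the filter being optional.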

Data

The dataset on which the model is trained and evaluated is constructed from fireworks that were investigated in casework at the Netherlands Forensic Institute since 2010, and consists of three parts. The final model (provided here) is trained on all data. For evaluation purposes, we split the data into a train and a test set, as described under "Evaluation".

Lab snippets

For 38 categories of fireworks, we have created snippets of their wrappers by exploding these fireworks. These snippets are photographed with a high-quality DSLR camera on a white background, directly from above, with good lighting conditions (hence 'lab' snippets). The snippets are then segmented, after which samples of N snippets are taken, with N between 1 and 10. We take 35 samples for each N, so 35 times 1 snippet, 35 times 2 snippets, and so on, leading to a total of 350 snippet photos per category. In total, the set of lab snippets consists of 350 * 38 = 13,300 images.
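
The sampling scheme and the resulting counts can be sketched as follows. The function name, the snippet identifiers and the choice to sample without replacement within one photo are assumptions; the sketch only shows how 35 samples for each of 10 group sizes add up to 350 photos per category.

```python
import random

def sample_snippet_groups(snippets, max_size=10, samples_per_size=35, seed=0):
    """Draw `samples_per_size` random groups for every group size 1..max_size.

    Sampling without replacement within a group is an assumption.
    Requires at least `max_size` segmented snippets.
    """
    rng = random.Random(seed)
    groups = []
    for size in range(1, max_size + 1):
        for _ in range(samples_per_size):
            groups.append(rng.sample(snippets, size))
    return groups

# Hypothetical segmented snippets of one category.
snippets = [f"snippet_{i}" for i in range(20)]
groups = sample_snippet_groups(snippets)

photos_per_category = len(groups)
total_photos = photos_per_category * 38
```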

Mock-crime scene snippets

For some of the categories, we have created photos that more closely resemble photos taken at a crime scene. As we expect the model to work better when there is less background noise in the image, we have created photos that we believe are reasonably comparable to what may be done in crime scene circumstances. We have therefore mostly created images where snippets are laid out on so-called 'DNA blankets', which may be green or blue in appearance but at least produce a fairly plain background. In total, we have 2489 such photos available, from 7 different categories of fireworks.

Artificial snippets

As the embedding model must produce embeddings for all fireworks categories, and we do not have snippets available for every category, we also create ‘artificial snippets’ by taking random crops from each fireworks wrapper. These artificial snippet photos consist of between 1 and 10 snippets, and we construct 35 images for each wrapper. These images are used to create reference embeddings which are used during classification (as described above).
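
A minimal sketch of how such random crops could be drawn is given below. It only generates crop boxes and does not touch image data. The function name, the rectangular crop shape, the crop size range and the uniformly random number of snippets per image are all assumptions; the card only states that each image contains between 1 and 10 snippets and that 35 images are made per wrapper.

```python
import random

def random_crop_boxes(width, height, n_images=35, max_snippets=10,
                      min_side=40, max_side=160, seed=0):
    """Return, per artificial image, a list of (left, top, right, bottom) boxes.

    The crop size range (min_side, max_side) is a hypothetical choice.
    """
    rng = random.Random(seed)
    images = []
    for _ in range(n_images):
        n_snippets = rng.randint(1, max_snippets)
        boxes = []
        for _ in range(n_snippets):
            w = rng.randint(min_side, max_side)
            h = rng.randint(min_side, max_side)
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            boxes.append((left, top, left + w, top + h))
        images.append(boxes)
    return images

# Hypothetical wrapper image of 1000x600 pixels.
crops = random_crop_boxes(width=1000, height=600)
```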

Evaluation

In evaluating the performance of the model, we consider two factors important.

The model may encounter photos that are taken in optimal conditions, similar to the lab snippets, or it may encounter photos that are taken in conditions similar to the mock-crime scene snippets.
The model may encounter firework categories for which it has seen (real) snippets during training ('best case'), or snippets for categories that are not present in the train set ('worst case').

To capture the difference in performance across conditions, we construct a separate test set for the lab snippets and the mock-crime scene snippets. For the lab snippets, we split the test set into two parts: one for which the categories are present in the train set (best-case) and one for which they are not (worst-case). As the mock-crime scene dataset consists of only 7 classes, we are unable to construct a worst-case test set, so we only report best-case performance for this dataset. In practice, a drop in performance is to be expected in the worst-case scenario for (mock-)crime scene snippets. Overall, we find that the model performs very well for classes that are present in the train set, and that the text filter gives a substantial boost when this is not the case.
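
The best-case and worst-case split for the lab snippets can be illustrated with the sketch below. The function name, the number of held-out categories and the test fraction are hypothetical, as the card does not report the split sizes. The point is only that real snippets of worst-case categories never reach the train set, while best-case test photos come from categories that do.

```python
import random

def split_lab_snippets(photos_by_category, n_worst_case=2,
                       test_fraction=0.2, seed=0):
    """Split photos into train, best-case test and worst-case test sets.

    n_worst_case and test_fraction are hypothetical values.
    """
    rng = random.Random(seed)
    categories = sorted(photos_by_category)
    held_out = set(rng.sample(categories, n_worst_case))
    train, best_case, worst_case = [], [], []
    for category in categories:
        photos = list(photos_by_category[category])
        if category in held_out:
            # Entire category is kept out of training.
            worst_case += [(category, p) for p in photos]
            continue
        rng.shuffle(photos)
        n_test = max(1, int(len(photos) * test_fraction))
        best_case += [(category, p) for p in photos[:n_test]]
        train += [(category, p) for p in photos[n_test:]]
    return train, best_case, worst_case

# Toy data: 5 categories with 10 photos each.
data = {f"cat_{i}": [f"photo_{i}_{j}" for j in range(10)] for i in range(5)}
train, best_case, worst_case = split_lab_snippets(data)
```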

Lab snippets

Metric	Worst-case (no text filter)	Best-case (no text filter)	Worst-case (with text filter)	Best-case (with text filter)
Accuracy @ 1	0.22	0.99	0.63	1.00
Accuracy @ 3	0.31	1.00	0.80	1.00
Accuracy @ 5	0.63	1.00	0.91	1.00
Accuracy @ 10	0.76	1.00	0.92	1.00
Accuracy @ 25	0.86	1.00	0.96	1.00

Mock-crime scene snippets

Metric	No text filter	With text filter
Accuracy @ 1	0.99	0.99
Accuracy @ 3	0.99	1.00
Accuracy @ 5	0.99	1.00
Accuracy @ 10	1.00	1.00
Accuracy @ 25	1.00	1.00

Note that the final model is trained on all data, so we expect performance to increase somewhat as compared to these metrics.
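
The "Accuracy @ k" rows in the tables above are read here as the fraction of test photos for which the true category appears among the k best-ranked categories. That reading is an assumption, as the card does not define the metric, and the function name and toy data below are illustrative only.

```python
def accuracy_at_k(rankings, true_labels, k):
    """Fraction of photos whose true category is among the top k predictions.

    rankings: list of category lists, each ordered best match first
    true_labels: list with the true category of each photo
    """
    hits = sum(
        1 for ranking, truth in zip(rankings, true_labels)
        if truth in ranking[:k]
    )
    return hits / len(true_labels)

# Toy rankings for three photos that all belong to category "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
true_labels = ["a", "a", "a"]
acc_at_1 = accuracy_at_k(rankings, true_labels, 1)
acc_at_3 = accuracy_at_k(rankings, true_labels, 3)
```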

Limitations

The evaluation results described above may not be representative of the real-world performance of the model for several reasons. Firstly, the model was only trained and evaluated on photos of snippets with relatively plain backgrounds and relatively good lighting conditions, which may not always be achievable in practice. We expect the model to perform better when photos are of better quality, when the photo contains a large number of snippets with distinctive characteristics, and when there is a lot of text on the snippets (that is entered correctly). The fewer of these conditions are met, the lower we expect the performance of the algorithm to be. Moreover, when the type of firework under investigation is very new or rare, it may not be present in the reference database and as such cannot be retrieved by the model.

Using the model

The model is intended to be used with the Vuurwerkverkenner application, which contains code for running the model. The source code for the application may be found on GitHub.
