HPAI-BSC
/

SuSy

Image Classification

Transformers

vision

synthetic image detection

Inference Endpoints

Model card Files Files and versions Community

dariog commited on Oct 4, 2024

Commit

b42396f

verified ·

1 Parent(s): d5eac90

Removes resnet paper link to avoid confusion.

Browse files

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -72,7 +72,7 @@ SuSy is a Spatial-Based Synthetic Image Detection and Recognition Model, designe
 The model is based on a CNN architecture and is trained using a supervised learning approach. It's design is based on [previous work](https://upcommons.upc.edu/handle/2117/395959), originally intended for video superresolution detection, adapted here for the tasks of synthetic image detection and recognition. The architecture consists of two modules: a feature extractor and a multi-layer perceptron (MLP), as it's quite light weight. SuSy has a total of 12.7M parameters, with the feature extractor accounting for 12.5M parameters and the MLP accounting for the remaining 197K.
-The CNN feature extractor consists of five stages following a [ResNet-18](https://arxiv.org/abs/1512.03385) scheme. The output of each of the blocks is used as input for various bottleneck modules that are arranged in a staircase pattern. The bottleneck modules consist of three 2D convolutional layers. Each level of bottlenecks takes input at a later stage than the previous level, and each bottleneck module takes input from the current stage and, except the first bottleneck of each level, from the previous bottleneck module.
 The outputs of each level of bottlenecks and stage 4 are passed to a 2D adaptative average pooling layer and then concatenated to form the feature map feeding the MLP. The MLP consists of three fully connected layers with 512, 256 and 256 units, respectively. Between each layer, a dropout layer (rate of 0.5) prevents overfitting. The output of the MLP has 6 units, corresponding to the number of classes in the dataset (5 synthetic models and 1 real image class).

 The model is based on a CNN architecture and is trained using a supervised learning approach. It's design is based on [previous work](https://upcommons.upc.edu/handle/2117/395959), originally intended for video superresolution detection, adapted here for the tasks of synthetic image detection and recognition. The architecture consists of two modules: a feature extractor and a multi-layer perceptron (MLP), as it's quite light weight. SuSy has a total of 12.7M parameters, with the feature extractor accounting for 12.5M parameters and the MLP accounting for the remaining 197K.
+The CNN feature extractor consists of five stages following a ResNet-18 scheme. The output of each of the blocks is used as input for various bottleneck modules that are arranged in a staircase pattern. The bottleneck modules consist of three 2D convolutional layers. Each level of bottlenecks takes input at a later stage than the previous level, and each bottleneck module takes input from the current stage and, except the first bottleneck of each level, from the previous bottleneck module.
 The outputs of each level of bottlenecks and stage 4 are passed to a 2D adaptative average pooling layer and then concatenated to form the feature map feeding the MLP. The MLP consists of three fully connected layers with 512, 256 and 256 units, respectively. Between each layer, a dropout layer (rate of 0.5) prevents overfitting. The output of the MLP has 6 units, corresponding to the number of classes in the dataset (5 synthetic models and 1 real image class).