---
license: cc-by-nc-4.0
---
|
|
|
### <i>CardioNetV2</i>
|
**The latest multi-modal model in the Cardio Sonix line.**

CardioNetV2 is built on models whose architectures were originally designed for computer vision (a modified ResNet) and for NLP (an LSTM). The model takes two kinds of input: an audio signal and tabular data.
|
The model treats the input audio signal as a sequence of tokens: a mel-cepstrogram is extracted from the audio, where each time frame holds N mel-cepstral coefficients.
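As a rough illustration of this tokenization step, here is a minimal NumPy sketch of a mel-cepstrogram (framing, power spectrum, mel filterbank, log, DCT-II). The sample rate, frame length, hop, and coefficient count are illustrative assumptions, not values from the model.

```python
import numpy as np

def mel_cepstrogram(signal, sr=4000, frame_len=256, hop=128,
                    n_mels=40, n_coeffs=13):
    """Minimal mel-cepstrogram sketch: frames -> power spectrum ->
    mel filterbank -> log -> DCT-II, keeping the first n_coeffs
    coefficients per frame. All parameter values are illustrative."""
    # Split the signal into overlapping Hann-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((frame_len + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II over the mel axis yields the cepstral coefficients;
    # each row of the result is one "token" of n_coeffs values.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n + 0.5)[None, :]
                 * np.arange(n_coeffs)[:, None])
    return log_mel @ dct.T  # shape: (n_frames, n_coeffs)
```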
|
First, an LSTM takes the mel-cepstrogram as input and produces an output tensor that is passed to a ResNet (Residual Neural Network), a model from the residual-network family adapted for audio-signal processing. This implementation uses residual blocks with pre-activation.
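The LSTM-then-ResNet arrangement could look roughly like the following PyTorch sketch. Only the pre-activation ordering (BatchNorm, then ReLU, then convolution inside each residual block) comes from the description above; the layer widths, kernel sizes, block count, and pooling are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    """Pre-activation residual block: BN -> ReLU -> Conv, twice,
    with an identity skip connection. 1-D convolutions are assumed,
    since the input is a sequence of LSTM features."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv1 = nn.Conv1d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # residual connection

class AudioEncoder(nn.Module):
    """LSTM over mel-cepstral tokens, followed by pre-activation
    residual blocks; sizes here are hypothetical."""
    def __init__(self, n_coeffs=13, hidden=64, n_blocks=2):
        super().__init__()
        self.lstm = nn.LSTM(n_coeffs, hidden, batch_first=True)
        self.blocks = nn.Sequential(
            *[PreActBlock(hidden) for _ in range(n_blocks)])

    def forward(self, x):            # x: (batch, time, n_coeffs)
        h, _ = self.lstm(x)          # (batch, time, hidden)
        h = h.transpose(1, 2)        # Conv1d expects (batch, channels, time)
        return self.blocks(h).mean(dim=2)  # pooled audio embedding
```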
|
The resulting features then go to the DenseMixer. The DenseMixer runs inference separately on the audio and tabular features, concatenates the two outputs into a dense feature vector, and runs a final inference pass on that vector, yielding a prediction based on both the audio and the tabular data.
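The fusion stage described above might be sketched as follows. The DenseMixer's internals are not published in this card, so every layer size, the branch structure, and the two-class head here are hypothetical; only the separate-branches-then-concatenate pattern is taken from the description.

```python
import torch
import torch.nn as nn

class DenseMixer(nn.Module):
    """Hypothetical sketch of the fusion stage: separate dense
    branches for audio and tabular features, concatenation into
    one dense vector, and a final prediction head."""
    def __init__(self, audio_dim=64, tab_dim=8, hidden=32, n_classes=2):
        super().__init__()
        self.audio_branch = nn.Sequential(
            nn.Linear(audio_dim, hidden), nn.ReLU())
        self.tab_branch = nn.Sequential(
            nn.Linear(tab_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes))

    def forward(self, audio_feat, tab_feat):
        a = self.audio_branch(audio_feat)  # inference on audio features
        t = self.tab_branch(tab_feat)      # inference on tabular features
        fused = torch.cat([a, t], dim=1)   # dense fused feature vector
        return self.head(fused)            # joint prediction
```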
|
|
|
![](https://i.ibb.co/gW14Dh2/attached.png)