---
license: cc-by-nc-4.0
---

### <i>CardioNetV2</i>
**The latest multi-modal model in the Cardio Sonix line.
It is built from architectures originally intended for computer vision
(a modified ResNet) and for NLP (an LSTM).
The model works with an audio signal and tabular data.
The input audio is treated as a sequence of tokens:
a mel-cepstrogram is extracted from the audio as a sequence of time samples,
each holding N mel-cepstral coefficients.
The LSTM first takes the mel-cepstrogram as input
and produces an output tensor that is passed to the ResNet (Residual Neural Network),
a member of the residual network family modified for audio signal processing.
This implementation uses residual blocks with pre-activation.
The data then goes to the DenseMixer.
The DenseMixer processes the audio and tabular features separately,
concatenates the outputs into a dense feature vector, and runs a final pass on it
to produce a prediction based on both audio and tabular data.**
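
The data flow described above can be illustrated with a minimal PyTorch sketch. This is not the released CardioNetV2 code: every class name (`PreActBlock1d`, `DenseMixerSketch`, `CardioNetSketch`), every layer size, the use of 1-D convolutions, the mean pooling over time, and the two-class output are assumptions made only for illustration.

```python
import torch
import torch.nn as nn


class PreActBlock1d(nn.Module):
    """Pre-activation residual block: (BN -> ReLU -> Conv) twice, plus a skip."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalization and activation come before each convolution.
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity skip connection


class DenseMixerSketch(nn.Module):
    """Hypothetical mixer: separate branches, concatenation, final dense layer."""

    def __init__(self, audio_dim: int, tab_dim: int, hidden: int = 32, n_classes: int = 2):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.tab_branch = nn.Sequential(nn.Linear(tab_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio_feat: torch.Tensor, tab_feat: torch.Tensor) -> torch.Tensor:
        # Process each modality on its own, then fuse into one dense vector.
        fused = torch.cat([self.audio_branch(audio_feat), self.tab_branch(tab_feat)], dim=1)
        return self.head(fused)


class CardioNetSketch(nn.Module):
    """LSTM -> pre-activation residual blocks -> mixer (all sizes are assumed)."""

    def __init__(self, n_mfcc: int = 20, lstm_hidden: int = 64,
                 n_blocks: int = 2, tab_dim: int = 8, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=lstm_hidden, batch_first=True)
        self.resnet = nn.Sequential(*[PreActBlock1d(lstm_hidden) for _ in range(n_blocks)])
        self.mixer = DenseMixerSketch(lstm_hidden, tab_dim, n_classes=n_classes)

    def forward(self, mfcc: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
        # mfcc: (batch, time, n_mfcc), one "token" per time sample.
        seq, _ = self.lstm(mfcc)                  # (batch, time, lstm_hidden)
        feat = self.resnet(seq.transpose(1, 2))   # (batch, lstm_hidden, time)
        audio_feat = feat.mean(dim=2)             # assumed pooling over time
        return self.mixer(audio_feat, tabular)    # (batch, n_classes)


model = CardioNetSketch().eval()
mfcc = torch.randn(4, 100, 20)   # random stand-in for a mel-cepstrogram batch
tabular = torch.randn(4, 8)      # random stand-in for tabular features
with torch.no_grad():
    logits = model(mfcc, tabular)
print(logits.shape)
```

The random tensors only check that the shapes line up. In practice the mel-cepstrogram would be computed from real recordings, and the branch widths and pooling would follow the actual model.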

![](https://i.ibb.co/gW14Dh2/attached.png)