tlemagueresse committed · Commit 275d8d6 · 1 Parent(s): a65ba26

First commit with challenge files
Browse files:

- LICENSE.md (+0 -0)
- README.md (+89 -0)
- fast_model.py (+434 -0)
- model/features.json (+13 -0)
- model/lgbm_params.json (+12 -0)
- model/model.txt (+0 -0)
- notebooks/EDA.ipynb (+0 -0)
- notebooks/Model_Exploration.ipynb (+0 -0)
- requirements.txt (+0 -0)
LICENSE.md
ADDED
File without changes
README.md
CHANGED
---
license: cc-by-nc-4.0
---

# Quefrency Guardian: Chainsaw Noise Detector

An efficient model to detect chainsaw activity in forest soundscapes using spectral and cepstral audio features. The model is designed for environmental conservation and is based on a LightGBM classifier capable of low-energy inference on both CPU and GPU devices. This repository provides the complete code and configuration for feature extraction, model implementation, and deployment.
## Installation

To use the model, clone the repository and install the dependencies:

```bash
git clone https://huggingface.co/your_username/your_model_name
cd your_model_name
pip install -r requirements.txt
```

## Model Overview

### Features

The model uses:

- **Spectrogram Features**: Extracted from frequencies between 70 and 1525 Hz.
- **Cepstral Features**: Calculated as the FFT of the log spectrogram over a filtered quefrency range.
- **Time Averaging**: Both feature sets are averaged over the time axis for robustness in noisy settings. A sketch of the full pipeline follows this list.
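
As a rough sketch, the pipeline below mirrors what `FastModel.get_features` computes, using the values from `model/features.json`; the windows are an assumption (torchaudio's default Hann window stands in for the configured Hamming windows):

```python
import torch
from torchaudio.transforms import Spectrogram

SR = 12000
audio = torch.randn(1, 3 * SR)  # placeholder batch: one 3-second waveform

# Log power spectrogram, shaped (batch, time_frames, freq_bins)
spec = Spectrogram(n_fft=512, hop_length=256, power=2, center=False)
sxx = torch.log10(torch.clamp(spec(audio).permute(0, 2, 1), min=1e-10))

# Keep only the frequency bins between f_min=70 Hz and f_max=1525 Hz
f = torch.fft.rfftfreq(512, d=1.0 / SR)
band = (f > 70) & (f < 1525)

# Cepstrum: a second spectral transform taken across the filtered frequency
# axis of the log spectrogram (one block per time frame)
n_band = int(band.sum())
ceps = Spectrogram(n_fft=n_band, hop_length=n_band, power=2, center=False)
cepstral = ceps(sxx[:, :, band]).squeeze(dim=3)

# Restrict to the quefrency band, then average both feature sets over time
cf = torch.fft.rfftfreq(n_band, d=0.5)
cband = (cf > 0.05) & (cf < 0.8)
features = torch.cat([cepstral[:, :, cband].mean(dim=1), sxx.mean(dim=1)], dim=1)
```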
### LightGBM Model

The model is a **binary classifier** (chainsaw vs. environment) trained on the `rfcx/frugalai` dataset. Key model parameters are included in `model/lgbm_params.json`.
## Usage

### Load the Model and Parameters

```python
import json
from fast_model import FastModel

# Load parameters
with open("model/features.json", "r") as f:
    features = json.load(f)

with open("model/lgbm_params.json", "r") as f:
    lgbm_params = json.load(f)

# Initialize the model
model = FastModel(
    feature_params=features,
    lgbm_params=lgbm_params,
    model_file="model/model.txt",  # Path to the serialized model file
    device="cuda",  # Use "cpu" if a GPU is unavailable
)

# Predict on a dataset
from datasets import load_dataset

dataset = load_dataset("rfcx/frugalai")
predictions = model.predict(dataset["test"])
print(predictions)
```
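
`predict` can also return raw scores instead of hard labels via its `get_proba` flag:

```python
# Probability of class 1 ("environment") for each clip, rather than 0/1 labels
probabilities = model.predict(dataset["test"], get_proba=True)
```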

### Performance

- **Accuracy**: 95% on the test set, with a 4.5% false-positive rate at the default threshold.
- **Low-Energy Mode**: Running inference on only 1 second of audio cuts energy consumption by roughly 50%, at the cost of ~1% accuracy; a configuration sketch follows this list.
- **Environmental Impact**: Inference energy consumption is **0.21 Wh**, tracked using CodeCarbon.
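
The low-energy mode maps onto `FastModel`'s constructor arguments. A minimal sketch, assuming a booster and feature mask trained with matching settings:

```python
# waveform_duration, mask_features, and mask_ratio are all FastModel parameters
low_energy_model = FastModel(
    feature_params=features,
    lgbm_params=lgbm_params,
    model_file="model/model.txt",
    waveform_duration=1,  # analyze only the first second of each clip
    mask_features=True,   # drop low-importance features at inference time
    mask_ratio=0.25,      # keep the top 25% of features ranked by gain
    device="cpu",
)
```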

### License

This project is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/) license. You are free to share and adapt the work for non-commercial purposes, provided attribution is given.

---
## File Structure

```
📂 your_model_name/
├── 📂 model/
│   ├── model.txt         # Pre-trained LightGBM model
│   ├── features.json     # Feature extraction parameters
│   └── lgbm_params.json  # LightGBM parameters
├── 📜 README.md           # Documentation
├── 📜 LICENSE.md          # CC BY-NC 4.0 license
├── 📜 requirements.txt    # Python dependencies
└── 📜 fast_model.py       # Model implementation
```
## Dataset

The model was trained and evaluated on the [Rainforest Connection (RFCx) Frugal AI](https://huggingface.co/datasets/rfcx/frugalai) dataset.

### Labels

- `0`: Chainsaw
- `1`: Environment
## Limitations

- **Audio Length**: The classifier is designed for 1 to 3 seconds of audio sampled at either 12 kHz or 24 kHz; shorter clips are padded and 24 kHz recordings are resampled to 12 kHz (see the sketch after this list).
- **Environmental Noise**: The model may misclassify when recordings are noisy or when machinery that sounds similar to a chainsaw is present.
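
For instance, `apply_padding` from `fast_model.py` brings a short clip up to the expected length:

```python
import torch
from fast_model import apply_padding

SR = 12000
short_clip = torch.randn(2 * SR)                    # a 2-second clip
padded = apply_padding(short_clip, 3 * SR, "zero")  # zero-padded to 3 seconds
print(padded.shape)  # torch.Size([36000])
```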

---

This README serves as the primary documentation for the Hugging Face model page and provides an overview of the model's purpose, data requirements, and usage.
fast_model.py
ADDED

```python
import os
import struct
import pickle

import numpy as np
import torch
import lightgbm as lgb
import torchaudio
from sklearn.exceptions import NotFittedError
from torchaudio.transforms import Spectrogram
import torch.nn.functional as F
from datasets.formatting import query_table
import warnings

warnings.filterwarnings("ignore")

SR = 12000


class FastModel:
    """
    A class designed for training and predicting with LightGBM, incorporating spectral and cepstral features.

    ### Workflow:
    1. Batch Loading and Decoding:
       Load audio data in batches directly from a table and decode byte-encoded information.

    2. Processing Audio:
       - Resampling, Padding, or Truncating:
         Adjust audio durations by padding, cutting, or resampling as needed.
       - Spectral and Cepstral Feature Extraction:
         - Compute the spectrogram for audio signals.
         - Focus on a selected frequency range (~50-1500 Hz) to derive the cepstrum, calculated as
           the FFT of the logarithm of the spectrogram.
         - Average both spectrogram and cepstral features over the time axis and combine them into
           a unified feature vector.

    3. Model Application:
       Use the extracted features as input for the LightGBM model to perform predictions.

    ### Options for Energy Optimization:
    - Feature Selection:
      Mask less significant features to reduce computation.
    - Signal Truncation:
      Process only a limited duration (e.g., a few seconds) of the audio signal.
    - Hardware Acceleration:
      Utilize CUDA to speed up feature computation when supported.

    Attributes
    ----------
    feature_params : dict
        Parameters for configuring the Spectrogram transformations.
    lgbm_params : dict, optional
        Parameters for configuring the LightGBM model.
    model_file : str
        Path for saving or loading the trained LightGBM model.
    padding_method : str
        Padding method to apply when the waveform size is smaller than the desired size.
    waveform_duration : float
        Duration of the audio waveform to process, in seconds.
    mask_features : bool
        Whether to enable feature masking for dimensionality reduction.
    mask_file : str
        Path to save or load the feature mask file.
    mask_ratio : float
        The ratio of features to retain when feature masking is applied.
    batch_size : int
        Number of samples per batch during training and prediction.
    apply_offset_on_fit : bool
        Whether to apply the offset on fit. Useful if waveform_duration is less than 3 seconds.
    device : str
        Device used for computation ("cpu" or "cuda").

    Methods
    -------
    _save_feature_mask(model, n_features, ratio):
        Saves the most important features as a mask.
    _load_feature_mask():
        Loads the feature mask from the saved file.
    fit(dataset):
        Trains the LightGBM model on audio features extracted from the dataset.
    predict(dataset, get_proba=False):
        Predicts labels or probabilities for a dataset using the trained model.
    get_features(audios, spectrogram_transformer, cepstral_transformer):
        Extracts features from raw audio using spectrogram and cepstral transformations.
    """

    def __init__(
        self,
        feature_params,
        lgbm_params=None,
        padding_method="zero",
        waveform_duration=3,
        model_file=None,
        mask_features=False,
        mask_file="feature_mask.pkl",
        mask_ratio=0.25,
        batch_size=5000,
        apply_offset_on_fit=True,
        device="cpu",
    ):
        self.feature_params = feature_params
        self.lgbm_params = lgbm_params
        self.model_file = model_file
        self.padding_method = padding_method
        self.waveform_duration = waveform_duration
        self.mask_features = mask_features
        self.mask_file = mask_file
        self.mask_ratio = mask_ratio
        self.batch_size = batch_size
        self.apply_offset_on_fit = apply_offset_on_fit
        self.device = torch.device(
            "cuda" if device == "cuda" and torch.cuda.is_available() else "cpu"
        )
        self.spectrogram_transformer = Spectrogram(
            n_fft=self.feature_params["n_fft"],
            hop_length=self.feature_params["hop_length"],
            pad=self.feature_params["pad"],
            window_fn=self.feature_params["win_spectrogram"],
            power=self.feature_params["power"],
            pad_mode=self.feature_params["pad_mode"],
            onesided=True,
            center=False,
        ).to(self.device)
        # Frequency bins of the spectrogram; keep only those inside [f_min, f_max]
        self.f = torch.fft.rfftfreq(self.feature_params["n_fft"], d=1.0 / SR)
        self.ind_f_filtered = torch.tensor(
            (self.f > self.feature_params["f_min"]) & (self.f < self.feature_params["f_max"]),
            device=self.device,
        )
        self.n_fft_cepstral = self.ind_f_filtered.sum()
        self.cepstral_transformer = Spectrogram(
            n_fft=self.n_fft_cepstral,
            hop_length=self.n_fft_cepstral,
            pad=0,
            window_fn=self.feature_params["win_cepstral"],
            power=self.feature_params["power"],
            pad_mode=self.feature_params["pad_mode"],
            onesided=True,
            center=False,
        ).to(self.device)
        # Quefrency bins of the cepstrum; keep only those inside [fc_min, fc_max]
        self.cf = torch.fft.rfftfreq(self.n_fft_cepstral, d=0.5)
        self.ind_cf_filtered = torch.tensor(
            (self.cf > self.feature_params["fc_min"]) & (self.cf < self.feature_params["fc_max"]),
            device=self.device,
        )

    def _save_feature_mask(self, model, n_features, ratio):
        # Keep the top share of features ranked by LightGBM gain importance
        feature_importance = model.feature_importance(importance_type="gain")
        sorted_indices = np.argsort(feature_importance)[::-1]
        top_indices = sorted_indices[: max(1, int(n_features * ratio))]
        mask = np.zeros(n_features, dtype=bool)
        mask[top_indices] = True
        with open(self.mask_file, "wb") as f:
            pickle.dump(mask, f)

    def _load_feature_mask(self):
        with open(self.mask_file, "rb") as f:
            return pickle.load(f)

    def fit(self, dataset):
        """
        Trains a LightGBM model on features extracted from the dataset.

        Parameters
        ----------
        dataset : Dataset
            Dataset object containing audio samples and their corresponding labels.

        Raises
        ------
        ValueError
            If the dataset is empty or invalid.
        """
        features, labels = [], []
        offsets = [0, 12000, 24000] if self.apply_offset_on_fit else [0]
        for offset in offsets:
            for audio, label in batch_audio_loader(
                dataset,
                waveform_duration=self.waveform_duration,
                batch_size=self.batch_size,
                padding_method=self.padding_method,
                offset=offset,
            ):
                feature = self.get_features(
                    audio, self.spectrogram_transformer, self.cepstral_transformer
                )
                features.append(feature)
                labels.extend(label)
        x_train = torch.cat(features, dim=0)
        train_data = lgb.Dataset(x_train.cpu(), label=labels)
        model = lgb.train(self.lgbm_params, train_data)

        if self.mask_features:
            # Retrain on the reduced feature set selected by the first model
            self._save_feature_mask(model, x_train.shape[1], self.mask_ratio)
            mask = self._load_feature_mask()
            x_train = x_train[:, mask]
            train_data = lgb.Dataset(x_train.cpu(), label=labels)
            model = lgb.train(self.lgbm_params, train_data)

        model.save_model(self.model_file)

    def predict(self, dataset, get_proba=False):
        """
        Predicts labels or probabilities for a dataset using the trained model.

        Parameters
        ----------
        dataset : Dataset
            The dataset containing audio data for prediction.
        get_proba : bool, optional
            If True, returns class probabilities rather than binary predictions (default is False).

        Returns
        -------
        numpy.ndarray
            If `get_proba` is True, returns a 1D array of class probabilities.
            If `get_proba` is False, returns a 1D array of binary predictions (0 or 1).

        Raises
        ------
        NotFittedError
            If the model is not yet trained.
        FileNotFoundError
            If the model file does not exist.
        """
        if not self.model_file:
            raise NotFittedError("The model is not trained yet. Train using the `fit` method.")
        if not os.path.isfile(self.model_file):
            raise FileNotFoundError(f"Model file {self.model_file} not found.")

        features = []
        for audio, _ in batch_audio_loader(
            dataset,
            waveform_duration=self.waveform_duration,
            batch_size=self.batch_size,
            padding_method=self.padding_method,
        ):
            feature = self.get_features(
                audio, self.spectrogram_transformer, self.cepstral_transformer
            )
            features.append(feature)
        features = torch.cat(features, dim=0)
        torch.cuda.empty_cache()

        if self.mask_features:
            mask = self._load_feature_mask()
            features = features[:, mask]

        model = lgb.Booster(model_file=self.model_file)
        y_score = model.predict(features.cpu())

        return y_score if get_proba else (y_score >= 0.5).astype(int)

    def get_features(self, audios, spectrogram_transformer, cepstral_transformer):
        """
        Extracts features from raw audio using spectrogram and cepstrum transformations.

        Parameters
        ----------
        audios : torch.Tensor
            A batch of audio waveforms as 1D tensors.
        spectrogram_transformer : Spectrogram
            Transformation used to compute spectrogram features.
        cepstral_transformer : Spectrogram
            Transformation used to compute cepstral features.

        Returns
        -------
        torch.Tensor
            Extracted features for the audio batch. Includes both cepstral and log-scaled
            spectrogram features.

        Raises
        ------
        ValueError
            If the input audio tensor is empty or invalid.
        """
        audios = audios.to(self.device)
        sxx = spectrogram_transformer(audios)  # shape: (n_audios, n_f, n_blocks)
        sxx = torch.log10(torch.clamp(sxx.permute(0, 2, 1), min=1e-10))
        cepstral_mat = cepstral_transformer(sxx[:, :, self.ind_f_filtered]).squeeze(dim=3)[
            :, :, self.ind_cf_filtered
        ]

        return torch.cat(
            [
                cepstral_mat.mean(dim=1),
                sxx.mean(dim=1),
            ],
            dim=1,
        )


def batch_audio_loader(
    dataset,
    waveform_duration=3,
    batch_size=1,
    sr=12000,
    device="cpu",
    padding_method=None,
    offset=0,
):
    """
    Loads and preprocesses audio data from a dataset for training or inference in batches.

    Parameters
    ----------
    dataset : Dataset
        The dataset containing audio samples and labels.
    waveform_duration : float, optional
        Desired duration of the audio waveforms in seconds (default is 3).
    batch_size : int, optional
        Number of audio samples per batch (default is 1).
    sr : int, optional
        Target sampling rate for audio processing (default is 12000).
    device : str, optional
        Device for processing ("cpu" or "cuda") (default is "cpu").
    padding_method : str, optional
        Method to pad audio waveforms smaller than the desired size (e.g., "zero", "reflect").
    offset : int, optional
        Number of samples to skip before processing the first audio sample (default is 0).

    Yields
    ------
    tuple
        A tuple (batch_audios, batch_labels), where:
        - batch_audios is a tensor of processed audio waveforms.
        - batch_labels is a tensor of corresponding audio labels.

    Raises
    ------
    ValueError
        If an unsupported sampling rate is encountered in the dataset.
    """

    def process_resampling(resample_buffer, resample_indices, batch_audios, sr, target_sr):
        if resample_buffer:
            resampler = torchaudio.transforms.Resample(
                orig_freq=sr, new_freq=target_sr, lowpass_filter_width=6
            )
            resampled = resampler(torch.stack(resample_buffer))
            for idx, original_idx in enumerate(resample_indices):
                batch_audios[original_idx] = resampled[idx]

    device = torch.device("cuda" if device == "cuda" and torch.cuda.is_available() else "cpu")
    batch_audios, batch_labels = [], []
    resample_24000, resample_24000_indices = [], []

    for i in range(len(dataset)):
        # Read the raw WAV bytes straight from the Arrow table to avoid decoding overhead
        pa_subtable = query_table(dataset._data, i, indices=dataset._indices)
        wav_bytes = pa_subtable[0][0][0].as_py()
        sampling_rate = struct.unpack("<I", wav_bytes[24:28])[0]  # sample-rate field of the WAV header

        if sampling_rate not in [sr, sr * 2]:
            raise ValueError(
                f"Unsupported sampling rate: {sampling_rate}Hz. Only {sr}Hz and {sr * 2}Hz are allowed."
            )

        data_size = struct.unpack("<I", wav_bytes[40:44])[0] // 2  # data chunk size in int16 samples
        if data_size == 0:
            batch_audios.append(torch.zeros(int(waveform_duration * SR)))
        else:
            try:
                waveform = (
                    torch.frombuffer(wav_bytes[44:], dtype=torch.int16, offset=offset)[
                        : int(waveform_duration * sampling_rate)
                    ].float()
                    / 32767
                )
            except Exception:
                continue  # May happen during fit for small audios; offset is set to 0 during predict.
            waveform = apply_padding(
                waveform, int(waveform_duration * sampling_rate), padding_method
            )

            if sampling_rate == sr:
                batch_audios.append(waveform)
            elif sampling_rate == 2 * sr:
                # Defer resampling so a whole batch can be resampled in one call
                resample_24000.append(waveform)
                resample_24000_indices.append(len(batch_audios))
                batch_audios.append(None)

        batch_labels.append(pa_subtable[1][0].as_py())

        if len(batch_audios) == batch_size:
            # Perform resampling once and take advantage of Torch's vectorization capabilities.
            process_resampling(resample_24000, resample_24000_indices, batch_audios, sr * 2, SR)

            batch_audios_on_device = torch.stack(batch_audios).to(device)
            batch_labels_on_device = torch.tensor(batch_labels).to(device)

            yield batch_audios_on_device, batch_labels_on_device

            batch_audios, batch_labels = [], []
            resample_24000, resample_24000_indices = [], []

    if batch_audios:
        process_resampling(resample_24000, resample_24000_indices, batch_audios, sr * 2, SR)
        batch_audios_on_device = torch.stack(batch_audios).to(device)
        batch_labels_on_device = torch.tensor(batch_labels).to(device)

        yield batch_audios_on_device, batch_labels_on_device


def apply_padding(waveform, output_size, padding_method="zero"):
    """
    Applies padding to the waveform when its size is smaller than the desired output size.

    Parameters
    ----------
    waveform : torch.Tensor
        Input 1D waveform tensor.
    output_size : int
        Desired output size after padding or truncation.
    padding_method : str, default="zero"
        Padding method to apply.

    Returns
    -------
    torch.Tensor
        Padded or truncated waveform of size `output_size`.
    """
    if waveform.size(0) >= output_size:
        return waveform[:output_size]

    total_pad = output_size - waveform.size(0)
    if padding_method == "zero":
        return F.pad(waveform, (0, total_pad), mode="constant", value=0)
    if padding_method in ["reflect", "replicate", "circular"]:
        # Padding in one pass is not possible if waveform.size(0) < total_pad,
        # so tile the waveform first.
        if waveform.size(0) < total_pad:
            num_repeats = (total_pad // waveform.size(0)) + 1
            waveform = torch.tile(waveform, (num_repeats,))
            total_pad = output_size - waveform.size(0)

        return F.pad(waveform.unsqueeze(0), (0, total_pad), mode=padding_method).squeeze()
    raise ValueError(f"Invalid padding method: {padding_method}")
```
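
For reference, an end-to-end training run with this module might look like the sketch below; it assumes the `rfcx/frugalai` splits from the README and writes the booster to `model/model.txt`:

```python
import json
from datasets import load_dataset
from fast_model import FastModel

with open("model/features.json") as f:
    features = json.load(f)
with open("model/lgbm_params.json") as f:
    lgbm_params = json.load(f)

dataset = load_dataset("rfcx/frugalai")

model = FastModel(
    feature_params=features,
    lgbm_params=lgbm_params,
    model_file="model/model.txt",
    device="cpu",
)
model.fit(dataset["train"])                   # extract features, train, save the booster
predictions = model.predict(dataset["test"])  # reload the saved booster for inference
```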
model/features.json
ADDED

```json
{
  "n_fft": 512,
  "hop_length": 256,
  "pad": 0,
  "win_spectrogram": "Hamming Window",
  "win_cepstral": "Hamming Window",
  "power": 2,
  "pad_mode": "reflect",
  "f_min": 70,
  "f_max": 1525,
  "fc_min": 0.05,
  "fc_max": 0.8
}
```
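
Note that `win_spectrogram` and `win_cepstral` are stored as names, while `torchaudio.transforms.Spectrogram` expects a `window_fn` callable; a loader could bridge the two with a small lookup (a sketch; the mapping dict is hypothetical):

```python
import torch

# Hypothetical mapping from window names in features.json to torch constructors
WINDOWS = {
    "Hamming Window": torch.hamming_window,
    "Hann Window": torch.hann_window,
}

features["win_spectrogram"] = WINDOWS[features["win_spectrogram"]]
features["win_cepstral"] = WINDOWS[features["win_cepstral"]]
```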
model/lgbm_params.json
ADDED

```json
{
  "objective": "binary",
  "metric": "binary_logloss",
  "boosting_type": "gbdt",
  "learning_rate": 0.1,
  "num_leaves": 75,
  "max_depth": -1,
  "feature_fraction": 0.8,
  "bagging_fraction": 0.8,
  "bagging_freq": 5,
  "verbosity": -1
}
```
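
These parameters are passed straight to `lightgbm.train` inside `FastModel.fit`. A standalone sketch with placeholder data:

```python
import json
import numpy as np
import lightgbm as lgb

with open("model/lgbm_params.json") as f:
    params = json.load(f)

# Placeholder feature matrix and binary labels, just to exercise the params
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, size=200)

booster = lgb.train(params, lgb.Dataset(X, label=y))
print(booster.predict(X[:5]))  # probabilities of class 1
```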
model/model.txt
ADDED
The diff for this file is too large to render.
See raw diff
notebooks/EDA.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
notebooks/Model_Exploration.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
requirements.txt
ADDED
File without changes