WpythonW committed commit 1910bce (verified; parent: 055c1d3)

Update README.md

Files changed (1):
  1. README.md +99 -1

README.md CHANGED
@@ -36,4 +36,102 @@ model-index:
  value: 0.9692
  - type: recall
  value: 0.9728
---

# AST Fine-tuned for Fake Audio Detection

This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) for detecting fake/synthetic audio. The original AST (Audio Spectrogram Transformer) classification head was replaced with a binary classification layer, and the model was then fine-tuned for fake audio detection.

## Model Description

- **Base Model**: MIT/ast-finetuned-audioset-10-10-0.4593 (AST pretrained on AudioSet)
- **Task**: Binary classification (fake/real audio detection)
- **Input**: Audio converted to a Mel spectrogram (128 mel bins, 1024 time frames; see the sketch below)
- **Output**: Binary prediction (0: real audio, 1: fake audio)
- **Training Hardware**: 2x NVIDIA T4 GPUs
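
As a reference for the input format, here is a minimal sketch of how the base checkpoint's feature extractor produces the 128 mel bins x 1024 time frames input described above (this uses the standard `transformers` `ASTFeatureExtractor` and is illustrative, not code taken from this repository):

```python
import numpy as np
from transformers import ASTFeatureExtractor

# Feature extractor of the base AST checkpoint (16 kHz, 128 mel bins, 1024 frames).
extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")

# One second of silence at 16 kHz stands in for a real recording.
waveform = np.zeros(16000, dtype=np.float32)

# The extractor pads/truncates the spectrogram to 1024 frames of 128 mel bins.
features = extractor(waveform, sampling_rate=16000, return_tensors="pt")
print(features["input_values"].shape)  # torch.Size([1, 1024, 128])
```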

## Training Configuration

```python
{
    'learning_rate': 1e-5,
    'weight_decay': 0.01,
    'n_iterations': 10000,
    'batch_size': 8,
    'gradient_accumulation_steps': 8,
    'validate_every': 500,
    'val_samples': 5000
}
```
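
With `batch_size` 8 and `gradient_accumulation_steps` 8, each optimizer step effectively sees 64 samples. Below is a minimal, self-contained sketch of how these hyperparameters combine with AdamW and gradient accumulation; the linear model and random tensors are placeholders for the actual AST model and data, not the original training script:

```python
import torch
from torch import nn

config = {'learning_rate': 1e-5, 'weight_decay': 0.01,
          'batch_size': 8, 'gradient_accumulation_steps': 8}

model = nn.Linear(128, 1)  # placeholder for the AST backbone + binary head
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config['learning_rate'],
                              weight_decay=config['weight_decay'])
loss_fn = nn.BCEWithLogitsLoss()

# Accumulate gradients over 8 micro-batches of 8 samples each,
# then take a single optimizer step (effective batch size 64).
for micro_step in range(config['gradient_accumulation_steps']):
    x = torch.randn(config['batch_size'], 128)                   # dummy features
    y = torch.randint(0, 2, (config['batch_size'], 1)).float()   # dummy labels
    loss = loss_fn(model(x), y) / config['gradient_accumulation_steps']
    loss.backward()

optimizer.step()
optimizer.zero_grad()
```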

## Dataset Distribution

The model was trained on the [012shin/fake-audio-detection-augmented](https://huggingface.co/datasets/012shin/fake-audio-detection-augmented) dataset with the following class distribution:

```
Training Set (80%):
- Real Audio (0): 43,460 samples (63.69%)
- Fake Audio (1): 24,776 samples (36.31%)

Test Set (20%):
- Real Audio (0): 10,776 samples (63.17%)
- Fake Audio (1): 6,284 samples (36.83%)
```
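
For anyone reproducing this distribution, here is a hedged sketch of loading the dataset and checking the class balance with an 80/20 split; the split name, `label` column, and seed are assumptions, so adjust them to the dataset's actual schema:

```python
from collections import Counter
from datasets import load_dataset

# Assumes a single 'train' split with an integer 'label' column (0 = real, 1 = fake).
ds = load_dataset("012shin/fake-audio-detection-augmented", split="train")
split = ds.train_test_split(test_size=0.2, seed=42)  # illustrative seed, not the original one

for name, part in split.items():
    counts = Counter(part["label"])
    total = len(part)
    print(name, {label: f"{n} ({n / total:.2%})" for label, n in sorted(counts.items())})
```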

## Model Performance

Final metrics on the validation set:

- Accuracy: 0.9662 (96.62%)
- F1 Score: 0.9710 (97.10%)
- Precision: 0.9692 (96.92%)
- Recall: 0.9728 (97.28%)
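
As a consistency check, the reported F1 score matches the harmonic mean of the reported precision and recall:

```python
precision, recall = 0.9692, 0.9728
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.971
```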

## Usage

Here's how to use the model:

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("your-username/ast-fakeaudio-detector")
processor = AutoFeatureExtractor.from_pretrained("your-username/ast-fakeaudio-detector")

# Load and preprocess audio (the model expects 16kHz input)
audio_path = "path/to/audio.wav"
waveform, sample_rate = torchaudio.load(audio_path)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
waveform = waveform.mean(dim=0)  # downmix multi-channel audio to mono

# Convert the waveform into the model's Mel-spectrogram input
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    probability = torch.sigmoid(outputs.logits)[0][0].item()

is_fake = probability > 0.5
print(f"Probability of being fake audio: {probability:.4f}")
print(f"Prediction: {'FAKE' if is_fake else 'REAL'} audio")
```

## Limitations

Important considerations when using this model:

1. The model works best with 16kHz audio input
2. Performance may vary with different types of audio manipulation not present in training data
3. Very short audio clips (<1 second) might not provide reliable results
4. The model should not be used as the sole determiner for real/fake audio detection

## Training Details

The training process involved:

1. Loading the base AST model pretrained on AudioSet
2. Replacing the classification head with a binary classifier (see the sketch below)
3. Fine-tuning on the fake audio detection dataset for 10,000 iterations
4. Using gradient accumulation (8 steps) with a batch size of 8
5. Running validation every 500 steps
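
For step 2, here is a hedged sketch of one common way to swap the AudioSet head for a binary classifier with `transformers`; it is illustrative and not necessarily the exact code used to produce this checkpoint:

```python
from transformers import ASTForAudioClassification

# Reload the base checkpoint with a freshly initialized single-logit head.
# ignore_mismatched_sizes is needed because the pretrained head has 527 AudioSet classes.
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=1,
    ignore_mismatched_sizes=True,
)

print(model.classifier)  # newly initialized binary classification head
```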