MEL-SPECTROGRAM-BASED DEEPFAKE AUDIO DETECTION USING CONVOLUTIONAL NEURAL NETWORKS: A NOVEL APPROACH

ICTACT Journal on Data Science and Machine Learning (Volume: 5, Issue: 3)

Abstract

Artificial intelligence has transformed how media, including audio, video, images, and text, can be manipulated. One of its most consequential applications is deepfake content, which uses advanced generative techniques to fabricate convincing simulations of reality. Detecting deepfake audio is therefore important for security in fields such as media forensics and authentication systems, and researchers have been developing methods for this task. One such method combines Mel spectrograms with Convolutional Neural Networks (CNNs). A Mel spectrogram is a visual representation of an audio signal that shows how its frequency content, mapped to the Mel scale, changes over time. A CNN trained on these spectrograms can learn to recognize patterns and irregularities that indicate artificial alteration of the audio. To develop an effective deepfake detection system, researchers have used the Fake-or-Real dataset, which contains both authentic and deepfake audio samples. The dataset is divided into sub-datasets by audio length and bit rate, which provides a diverse set of samples for model training. Once trained, the CNN model distinguishes genuine from deepfake audio with high accuracy by identifying subtle discrepancies introduced by deepfake generation techniques. These inconsistencies act as indicators of manipulation, which simplifies audio authentication and strengthens audio security. The combination of Mel spectrograms and CNNs is a meaningful step toward countering the spread of deepfake technology, and it offers a promising option for organizations and individuals seeking protection against misinformation, deceptive recordings, and other forms of audio tampering. Continued research and refinement of these methods should further strengthen trust in audio content across domains and contribute to a safer digital landscape.
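
The abstract describes Mel spectrograms only in words, so a small illustration of the feature-extraction step may help. The sketch below computes a log-Mel spectrogram with NumPy alone. It is not the authors' implementation: the frame length (`n_fft=512`), hop size (`hop=128`), number of Mel bands (`n_mels=40`), the 16 kHz sample rate, the HTK-style Mel formula, and the synthetic 1 kHz test tone are all assumptions chosen for illustration, since the abstract does not state any of these settings.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Triangle: rises from lo to mid, falls from mid to hi, zero elsewhere
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """Log-Mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T               # small offset avoids log(0)

# Synthetic one-second 1 kHz tone standing in for a real audio clip
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

spec = log_mel_spectrogram(tone, sr)
print(spec.shape)  # (40, 122)
```

A two-dimensional array like `spec` can be treated as a single-channel image, which is the form a CNN classifier would take as input. In practice a library routine such as `librosa.feature.melspectrogram` would likely replace the hand-written functions above.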

Authors

G. Fathima, S. Kiruthika, M. Malar, T. Nivethini
Adhiyamaan College of Engineering, India

Keywords

Artificial Intelligence, Deepfake Audio, Mel Spectrogram, Convolutional Neural Networks, Security, Media Forensics

Published By

ICTACT

Published In

ICTACT Journal on Data Science and Machine Learning
(Volume: 5, Issue: 3)

Date of Publication

June 2024

Pages

630 - 635

DOI

10.21917/ijdsml.2024.0133