MEL-SPECTROGRAM-BASED DEEPFAKE AUDIO DETECTION USING CONVOLUTIONAL NEURAL NETWORKS: A NOVEL APPROACH

ICTACT Journal on Data Science and Machine Learning (Volume: 5, Issue: 3)

Abstract

Artificial intelligence has transformed how media such as audio, video, images, and text are manipulated. One of its most consequential applications is deepfake content, which uses advanced techniques to fabricate convincing simulations of reality. In response, researchers have been developing methods to detect deepfake audio, strengthening security in fields such as media forensics and authentication systems. One such method combines Mel spectrograms with Convolutional Neural Networks (CNNs). A Mel spectrogram is a visual representation of an audio signal that shows how its frequency components, mapped onto the Mel scale, change over time. CNNs trained on these spectrograms can learn to recognize patterns and irregularities that indicate artificial alteration of the audio. To develop an effective deepfake detection system, researchers have used the Fake-or-Real dataset, which mixes authentic and deepfake audio samples. The dataset is divided into sub-datasets by audio length and bit rate, providing a diverse set of samples for model training. Once trained, the CNN model distinguishes genuine from deepfake audio with high accuracy by identifying subtle discrepancies introduced by deepfake generation techniques. These inconsistencies act as indicators of manipulation, which simplifies audio authentication and strengthens audio security. Combining Mel spectrograms with CNNs is a meaningful step toward countering the spread of deepfake technology, and it offers a promising option for organizations and individuals seeking to guard against misinformation, deceptive recordings, and other forms of audio tampering.
Looking ahead, continued research and refinement of these methods should further strengthen trust in audio content across domains and contribute to a more secure digital landscape.
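
To make the Mel spectrogram step described above concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the function names, the frame size, hop size, and number of Mel bands, the Hann window, and the HTK-style Mel formula are all assumptions chosen for illustration. A practical system would more likely compute this feature with an audio library such as librosa or torchaudio and pass the resulting frames-by-bands matrix to a CNN as a single-channel image.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to Mels (HTK-style formula, an assumed choice)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the Mel scale.

    Returns n_mels rows, each holding n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: filter m spans edges m, m + 1, m + 2.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log(0) in silent bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Synthetic 1 kHz tone sampled at 8 kHz (illustrative input, not dataset audio).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]
spec = mel_spectrogram(tone, sample_rate)
print(len(spec), "frames x", len(spec[0]), "Mel bands")
```

For the synthetic tone, the energy in every frame concentrates in the Mel band whose triangular filter covers 1 kHz. Real speech produces richer patterns across bands and time, and it is these patterns that a CNN would be trained to separate into genuine and deepfake classes.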

Authors

G. Fathima, S. Kiruthika, M. Malar, T. Nivethini
Adhiyamaan College of Engineering, India

Keywords

Artificial Intelligence, Deepfake Audio, Mel Spectrogram, Convolutional Neural Networks, Security, Media Forensics

Published By
ICTACT
Published In
ICTACT Journal on Data Science and Machine Learning
(Volume: 5, Issue: 3)
Date of Publication
June 2024
Pages
630-635
