GENERATIVE ADVERSARIAL NETWORKS (GANs) IN MULTIMODAL AI USING BRIDGING TEXT, IMAGE, AND AUDIO DATA FOR ENHANCED MODEL PERFORMANCE

ICTACT Journal on Soft Computing (Volume: 15, Issue: 3)

Abstract

Integrating multimodal data is critical to advancing artificial intelligence models that can interpret diverse and complex inputs. Standalone models excel at processing individual data types such as text, images, or audio, but they often fail to achieve comparable performance when these modalities are combined. Generative Adversarial Networks (GANs) have emerged as a transformative approach in this domain because they can synthesize and learn across disparate data types. This study addresses the challenge of bridging multimodal datasets to improve the generalization and performance of AI models. The proposed framework employs a novel GAN architecture that integrates textual, visual, and auditory data streams. Using a shared latent space, the system generates coherent representations for cross-modal understanding and data fusion. The GAN model is trained on a benchmark dataset of 50,000 multimodal instances, with 25% held out for testing. Results indicate significant improvements in multimodal synthesis and classification accuracy: the model achieves a text-to-image synthesis FID score of 14.7, an audio-to-text BLEU score of 35.2, and a cross-modal classification accuracy of 92.3%. These outcomes surpass existing models by 8-15% across comparable metrics, highlighting the GAN's effectiveness in handling data heterogeneity. The findings suggest potential applications in areas such as virtual assistants, multimedia analytics, and cross-modal content generation.
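
As a rough illustration of the shared-latent-space idea in the abstract, the sketch below projects text, image, and audio feature vectors of different sizes into one common latent dimension and fuses them by averaging. It is not the authors' architecture: the random linear encoders, the feature sizes, the latent size, the averaging fusion, and the names `make_encoder` and `fuse` are all assumptions made for illustration, and no adversarial training is shown. The last lines work out the split sizes implied by the stated 50,000 instances with 25% held out for testing.

```python
import random

LATENT_DIM = 8  # hypothetical size of the shared latent space


def make_encoder(input_dim, latent_dim, rng):
    """Return a random linear projection (a stand-in for a trained encoder)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
               for _ in range(latent_dim)]

    def encode(features):
        if len(features) != input_dim:
            raise ValueError(f"expected {input_dim} features, got {len(features)}")
        # One dot product per latent dimension.
        return [sum(w * x for w, x in zip(row, features)) for row in weights]

    return encode


def fuse(latents):
    """Fuse same-sized latent vectors by element-wise averaging."""
    return [sum(values) / len(values) for values in zip(*latents)]


rng = random.Random(0)

# Hypothetical feature sizes per modality; the abstract does not state them.
input_dims = {"text": 16, "image": 32, "audio": 24}
encoders = {name: make_encoder(dim, LATENT_DIM, rng)
            for name, dim in input_dims.items()}
sample = {name: [rng.random() for _ in range(dim)]
          for name, dim in input_dims.items()}

# Every modality lands in the same latent space, so the vectors can be combined.
latents = [encoders[name](sample[name]) for name in input_dims]
fused = fuse(latents)

# Split sizes implied by the abstract: 50,000 instances, 25% held out for testing.
TOTAL = 50_000
test_size = TOTAL * 25 // 100
train_size = TOTAL - test_size

print(len(fused), train_size, test_size)  # prints: 8 37500 12500
```

Because all three encoders emit vectors of the same length, any downstream generator or classifier can consume the fused vector without knowing which modalities contributed to it.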

Authors

R. Arun Kumar¹, C. Lisa², V.R. Rashmi³, K. Sandhya⁴
¹University of South Wales, United Kingdom; ²,³Nehru College of Engineering and Research Centre, India; ⁴Malabar College of Engineering and Technology, India

Keywords

Multimodal AI, Generative Adversarial Networks, Cross-Modal Synthesis, Text-Image-Audio Fusion, Model Performance Enhancement

Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 15, Issue: 3)
Date of Publication: January 2025
Pages: 3567-3577