GENERATIVE ADVERSARIAL NETWORKS (GANs) IN MULTIMODAL AI: BRIDGING TEXT, IMAGE, AND AUDIO DATA FOR ENHANCED MODEL PERFORMANCE
Abstract
The integration of multimodal data is critical to advancing artificial intelligence models capable of interpreting diverse and complex inputs. While standalone models excel at processing individual data types such as text, image, or audio, they often fail to achieve comparable performance when these modalities are combined. Generative Adversarial Networks (GANs) have emerged as a transformative approach in this domain because of their ability to synthesize and learn effectively across disparate data types. This study addresses the challenge of bridging multimodal datasets to improve the generalization and performance of AI models. The proposed framework employs a novel GAN architecture that integrates textual, visual, and auditory data streams. Using a shared latent space, the system generates coherent representations for cross-modal understanding, ensuring seamless data fusion. The GAN model is trained on a benchmark dataset comprising 50,000 multimodal instances, with 25% allocated for testing. Results indicate significant improvements in multimodal synthesis and classification accuracy: the model achieves a text-to-image synthesis FID score of 14.7, an audio-to-text BLEU score of 35.2, and a cross-modal classification accuracy of 92.3%. These outcomes surpass existing models by 8-15% on comparable metrics, highlighting the GAN's effectiveness in handling data heterogeneity. The findings suggest potential applications in areas such as virtual assistants, multimedia analytics, and cross-modal content generation.
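The shared-latent-space idea described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's architecture: the feature dimensions, the latent size of 64, and the random linear encoders are all assumptions chosen for demonstration. In the actual GAN framework these encoders would be trained jointly with a discriminator; here random projections merely show how three modality-specific feature vectors end up comparable in one space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions (not taken from the paper).
DIMS = {"text": 300, "image": 512, "audio": 128}
LATENT = 64  # illustrative shared latent dimensionality

# One linear encoder per modality, projecting into the shared latent space.
# In a trained GAN these weights would be learned adversarially; here they
# are fixed random projections purely to show the data flow.
encoders = {m: rng.standard_normal((d, LATENT)) / np.sqrt(d)
            for m, d in DIMS.items()}

def encode(modality: str, features: np.ndarray) -> np.ndarray:
    """Project modality-specific features into the shared latent space."""
    z = features @ encoders[modality]
    return z / np.linalg.norm(z)  # unit-normalize for cosine comparison

# Toy inputs standing in for extracted text/image/audio features.
z_text = encode("text", rng.standard_normal(DIMS["text"]))
z_image = encode("image", rng.standard_normal(DIMS["image"]))
z_audio = encode("audio", rng.standard_normal(DIMS["audio"]))

# All three modalities now live in one 64-dimensional space, so a
# cross-modal similarity is just a dot product between unit vectors.
print(z_text.shape, z_image.shape, z_audio.shape)  # (64,) (64,) (64,)
print(float(z_text @ z_image))                     # cosine similarity
```

Aligning modalities in one space is what makes cross-modal tasks (text-to-image synthesis, audio-to-text generation, joint classification) expressible as operations on a single representation rather than on three incompatible ones.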

Authors
R. Arun Kumar¹, C. Lisa², V.R. Rashmi³, K. Sandhya⁴
¹University of South Wales, United Kingdom; ²˒³Nehru College of Engineering and Research Centre, India; ⁴Malabar College of Engineering and Technology, India

Keywords
Multimodal AI, Generative Adversarial Networks, Cross-Modal Synthesis, Text-Image-Audio Fusion, Model Performance Enhancement
Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 15, Issue: 3, Pages: 3567-3577)
Date of Publication: January 2025

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.