MULTI-MODAL MEDICAL IMAGE FUSION LEVERAGING TRANSFORMER-BASED CROSS-ATTENTION NETWORKS IN CLINICAL APPLICATIONS

ICTACT Journal on Image and Video Processing ( Volume: 16 , Issue: 2 )

Abstract

Multi-modal medical imaging provides complementary anatomical and functional information essential for accurate diagnosis and treatment planning. However, conventional fusion techniques often fail to retain fine-grained structural and functional details, which degrades diagnostic quality. Integrating diverse modalities such as MRI, CT, and PET requires an approach that captures complex inter-modal relationships while preserving both spatial structures and functional intensities. Existing convolution-based methods are limited in modeling long-range dependencies, resulting in the loss of critical clinical information. This study proposed a transformer-based cross-attention network for multi-modal medical image fusion. Input images first underwent preprocessing, including normalization, resizing, and noise reduction. Feature representations were then extracted by parallel encoder streams, one per modality. A cross-attention mechanism enabled the network to learn modality-specific relationships, highlighting complementary regions while suppressing redundant information. Finally, a fusion module combined the attended features into a single output, which was reconstructed into a high-fidelity fused image. Performance was evaluated across multiple datasets using standard metrics, including structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and edge preservation (EP). The proposed method consistently outperformed conventional and deep learning-based fusion approaches. Quantitative evaluation on the ADNI dataset yielded an SSIM of 0.94, a PSNR of 39.1 dB, an EP of 0.89, a mutual information (MI) of 1.38, and a standard deviation (SD) of 47.1. Qualitative analysis demonstrated enhanced visualization of anatomical and functional features, supporting the method's potential clinical applicability.
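To make the cross-attention step and the PSNR metric concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes single-head scaled dot-product attention, random projection matrices in place of learned weights, token sequences standing in for encoder features, and a simple residual average as the fusion rule. The names `cross_attention`, `psnr`, `mri_tokens`, and `pet_tokens` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative).

    query_tokens:   (n_q, d) features from one modality
    context_tokens: (n_c, d) features from the other modality
    Returns the attended features (n_q, d) and the weights (n_q, n_c).
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


# Toy stand-ins for encoder outputs: 16 tokens of dimension 8 per modality.
rng = np.random.default_rng(0)
d = 8
mri_tokens = rng.standard_normal((16, d))
pet_tokens = rng.standard_normal((16, d))

# Random projections replace learned weights (assumption for illustration).
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# MRI tokens query PET tokens, so each MRI location gathers PET context.
mri_attended, attn = cross_attention(mri_tokens, pet_tokens, w_q, w_k, w_v)

# Hypothetical fusion rule: average the original and attended features.
fused_tokens = 0.5 * (mri_tokens + mri_attended)
```

In a trained network the projections would be learned, attention would typically be multi-head and applied in both directions (MRI to PET and PET to MRI), and a decoder would map the fused features back to image space. None of those details are specified in the abstract.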

Authors

M. Senthil Vadivu¹, R. Deepa²
¹Sona College of Technology, India; ²R.M.K. Engineering College, India

Keywords

Multi-Modal Fusion, Cross-Attention Transformer, Medical Imaging, MRI, CT, PET

Published By
ICTACT
Date of Publication
November 2025
Pages
3758 - 3764