Abstract
Multi-modal medical imaging provides complementary anatomical and
functional information essential for accurate diagnosis and treatment
planning. However, conventional fusion techniques often fail to retain
fine-grained structural and functional details, leading to suboptimal
diagnostic quality. Integrating diverse modalities such as MRI, CT, and
PET requires an approach capable of capturing complex inter-modal
relationships while preserving both spatial structures and functional
intensities. Existing convolution-based methods are limited in their
ability to model long-range dependencies, which leads to the loss of
critical clinical information. This study proposed a transformer-based
cross-attention
network for multi-modal medical image fusion. Initially, input images
underwent preprocessing including normalization, resizing, and noise
reduction. Feature representations were extracted via parallel encoder
streams for each modality. A cross-attention mechanism then enabled
the network to learn modality-specific relationships, highlighting
complementary regions while suppressing redundant information.
Finally, a fusion module combined the attended features into a single
output, followed by reconstruction to generate a high-fidelity fused
image. Performance was evaluated across multiple datasets using
standard metrics: structural similarity index (SSIM), peak
signal-to-noise ratio (PSNR), edge preservation (EP), mutual
information (MI), and standard deviation (SD). The proposed method
consistently outperformed conventional and deep learning-based fusion
approaches. On the ADNI dataset, it achieved an SSIM of 0.94, a PSNR
of 39.1 dB, an EP of 0.89, an MI of 1.38, and an SD of 47.1.
Qualitative analysis demonstrated enhanced visualization of
anatomical and functional features, supporting the method's potential
clinical applicability.
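
The cross-attention step and the PSNR metric described above can be
illustrated with a minimal NumPy sketch. It is not the authors'
implementation: the abstract does not give the layer sizes, the number
of heads, or the fusion rule, so the single-head scaled dot-product
form, the token counts, the random projection matrices, and the names
cross_attention and psnr are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative).

    query_feats:   (n_q, d_model) tokens from one modality, e.g. MRI
    context_feats: (n_c, d_model) tokens from another modality, e.g. PET
    w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical)
    Returns the attended features (n_q, d_head) and the attention
    weights (n_q, n_c).
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy data: the sizes below are arbitrary assumptions, not paper values.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
mri_tokens = rng.normal(size=(6, d_model))
pet_tokens = rng.normal(size=(4, d_model))
w_q = rng.normal(size=(d_model, d_head))
w_k = rng.normal(size=(d_model, d_head))
w_v = rng.normal(size=(d_model, d_head))

# MRI tokens query the PET tokens for complementary information.
attended, weights = cross_attention(mri_tokens, pet_tokens, w_q, w_k, w_v)
```

Each row of weights is a probability distribution over the PET
tokens, so every MRI token receives a weighted mixture of PET features.
A full network would presumably apply attention in both directions and
learn the projections during training, but the abstract does not
specify those details.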
Authors
M. Senthil Vadivu¹, R. Deepa²
Sona College of Technology, India¹, R.M.K. Engineering College, India²
Keywords
Multi-Modal Fusion, Cross-Attention Transformer, Medical Imaging, MRI, CT, PET