ENSEMBLE CATBOOST-BASED MICROARRAY GENE EXPRESSION RETRIEVAL SYSTEM FOR ENHANCED DISEASE CLASSIFICATION

ICTACT Journal on Soft Computing (Volume: 16, Issue: 1)

Abstract

Microarray gene expression profiling is a crucial tool for identifying genetic patterns associated with complex diseases. However, the high dimensionality and noise of microarray datasets make effective gene retrieval and classification difficult. Traditional classifiers often fail to retrieve relevant gene features accurately or to classify diseases robustly, because they overfit and are sensitive to noise. This paper proposes an Enhanced Gene Retrieval System built on an Ensemble CatBoost Algorithm. CatBoost, a gradient boosting decision tree framework, is known for handling categorical features natively and for mitigating prediction shift. The system integrates feature selection techniques with CatBoost to optimize gene relevance and improve classification accuracy. Pre-processing includes normalization and principal component analysis (PCA) for dimensionality reduction. The ensemble combines multiple CatBoost models through bagging to improve robustness and generalization. The proposed method was evaluated on benchmark microarray datasets (e.g., Leukemia, Colon, Prostate). It significantly outperformed traditional models such as SVM, Random Forest, KNN, and XGBoost, achieving up to 96.2% accuracy, 94.8% precision, 95.1% recall, and an F1-score of 0.97. The ensemble CatBoost model also demonstrated superior stability and interpretability in gene selection and disease classification.
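
The abstract names the stages of the system (normalization, PCA, bagged CatBoost models) but not their settings. As a rough sketch of how such stages could fit together, the Python below standardizes the expression matrix, projects it onto a few principal components, trains several CatBoost classifiers on bootstrap samples, and combines them by majority vote. The component count, ensemble size, CatBoost hyperparameters, the majority-vote rule, the synthetic data, and every function name here are assumptions made for illustration, not the authors' configuration. If the catboost package is not installed, a small nearest-centroid stand-in is used so that the sketch still runs.

```python
import numpy as np

try:
    from catboost import CatBoostClassifier
except ImportError:  # assumption: keep the sketch runnable without catboost
    CatBoostClassifier = None


class NearestCentroid:
    """Tiny stand-in base learner, used only when catboost is missing."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


def make_base_model(seed):
    # Hyperparameters are illustrative guesses, not values from the paper.
    if CatBoostClassifier is not None:
        return CatBoostClassifier(iterations=50, depth=3, verbose=0,
                                  random_seed=seed, allow_writing_files=False)
    return NearestCentroid()


def fit_preprocessor(X, n_components):
    """Learn z-score normalization and a PCA projection from training data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant genes
    Z = (X - mean) / std
    # Z is centered, so the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def transform(X, prep):
    mean, std, components = prep
    return ((X - mean) / std) @ components


def fit_bagged(X, y, n_models=5, seed=0):
    """Train each base model on a bootstrap sample (bagging)."""
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(make_base_model(seed + i).fit(X[idx], y[idx]))
    return models


def predict_bagged(models, X):
    """Majority vote; assumes class labels are non-negative integers."""
    votes = np.stack([np.asarray(m.predict(X)).ravel().astype(int) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])


if __name__ == "__main__":
    # Synthetic stand-in for a microarray matrix: 60 samples x 500 genes,
    # where the first 30 genes are shifted upward in class 1.
    rng = np.random.default_rng(0)
    y = np.arange(60) % 2
    X = rng.normal(size=(60, 500))
    X[y == 1, :30] += 3.0

    prep = fit_preprocessor(X[:40], n_components=5)  # fit on training rows only
    models = fit_bagged(transform(X[:40], prep), y[:40])
    preds = predict_bagged(models, transform(X[40:], prep))
    print("held-out accuracy:", (preds == y[40:]).mean())
```

The normalization statistics and PCA directions are learned from the training rows only and then reused on the held-out rows, which keeps test information out of the pre-processing step. Whether the published system does the same is not stated in the abstract.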

Authors

Soumya Madduru (1), Pitty Nagarjuna (2)
(1) Srinivasa Ramanujan Institute of Technology, India
(2) Indian Institute of Science, Bengaluru, India

Keywords

Microarray Data, CatBoost Algorithm, Gene Expression, Disease Classification, Ensemble Learning

Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 16, Issue: 1)
Date of Publication: April 2025
Pages: 3814-3819