ENSEMBLE CATBOOST-BASED MICROARRAY GENE EXPRESSION RETRIEVAL SYSTEM FOR ENHANCED DISEASE CLASSIFICATION
Abstract
Microarray gene expression profiling is a crucial tool for identifying genetic patterns associated with complex diseases. However, the high dimensionality and noise of microarray datasets make effective gene retrieval and classification difficult. Traditional classifiers often fail to retrieve the relevant gene features accurately and to classify diseases robustly because they overfit and are sensitive to noise. This paper proposes an Enhanced Gene Retrieval System built on an Ensemble CatBoost Algorithm. CatBoost is a gradient boosting decision tree framework known for handling categorical features and for avoiding prediction shift. The system integrates feature selection techniques with CatBoost to optimize gene relevance and improve classification accuracy. Pre-processing consists of normalization followed by principal component analysis (PCA) for dimensionality reduction. The ensemble combines multiple CatBoost models using bagging to improve robustness and generalization. The proposed method was evaluated on benchmark microarray datasets (e.g., Leukemia, Colon, Prostate). It significantly outperformed traditional models such as SVM, Random Forest, KNN, and XGBoost, achieving up to 96.2% accuracy, 94.8% precision, 95.1% recall, and an F1-score of 0.97. The ensemble CatBoost model also demonstrated superior stability and interpretability in gene selection and disease classification.
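
The bagging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the number of models, the sampling scheme, or the voting rule, so bootstrap resampling with majority voting is assumed here. The names `fit_bagged`, `predict_bagged`, and `NearestCentroid` are hypothetical, and `NearestCentroid` is a deliberately simple stand-in for a CatBoost model so that the example runs with the standard library alone. The toy two-feature data is invented for illustration and is not microarray data.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (NOT CatBoost): predicts the class whose mean is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # Per-class mean vector of the training rows.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


def fit_bagged(make_model, X, y, n_models=5, seed=0):
    """Train n_models learners, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(make_model().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, X):
    """Combine the learners by majority vote (assumed rule; use an odd n_models)."""
    votes = [m.predict(X) for m in models]
    return [Counter(per_sample).most_common(1)[0][0] for per_sample in zip(*votes)]


# Toy data: two well-separated clusters standing in for two disease classes.
X_train = [[i * 0.01, i * 0.01] for i in range(20)] + \
          [[5 + i * 0.01, 5 + i * 0.01] for i in range(20)]
y_train = ["normal"] * 20 + ["tumor"] * 20

models = fit_bagged(NearestCentroid, X_train, y_train, n_models=5, seed=0)
predictions = predict_bagged(models, [[0.05, 0.05], [5.05, 5.05]])
print(predictions)  # ['normal', 'tumor']
```

Any object with `fit` and `predict` methods can be passed as `make_model`. `CatBoostClassifier` from the `catboost` package exposes both, although its `predict` returns an array rather than a list, so the output would likely need flattening to a list of labels before voting.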

Authors
Soumya Madduru
Srinivasa Ramanujan Institute of Technology, India

Keywords
Microarray Data, CatBoost Algorithm, Gene Expression, Disease Classification, Ensemble Learning

Published By
ICTACT

Published In
ICTACT Journal on Soft Computing (Volume: 16, Issue: 1, Pages: 3814-3819)

Date of Publication
April 2025

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.