ENHANCED ENSEMBLE CLASSIFICATION TECHNIQUES FOR ACCURATE SPAM DETECTION IN E-MAIL COMMUNICATIONS
Abstract
The exponential rise in email usage has been paralleled by an increase in unsolicited spam messages, which pose significant threats such as phishing, malware dissemination, and personal data breaches. Detecting spam accurately is crucial to protect users and ensure efficient communication. Despite the development of various machine learning approaches, single classifiers often fail to generalize across diverse email datasets because they overfit or lack robustness. Ensemble learning, which combines multiple models, can improve spam detection rates and reduce false positives. This study proposes a hybrid ensemble classification framework that incorporates Bagging, Boosting (AdaBoost), and Voting techniques to classify email messages as spam or ham (non-spam). A preprocessed dataset is vectorized using TF-IDF, and multiple classifiers, including Decision Trees, Naive Bayes, and Support Vector Machines, are employed. Ensemble strategies are then used to enhance predictive performance through majority voting and weighted aggregation. The proposed ensemble model significantly outperforms standalone classifiers in terms of accuracy, precision, recall, and F1-score. Experimental evaluations on the widely used SpamAssassin and Enron datasets demonstrate consistent improvements, with the Voting ensemble achieving up to 96.8% accuracy and lower false positive rates than existing methods.
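The abstract describes two steps: TF-IDF vectorization, followed by majority voting or weighted aggregation over base classifiers. The following is a minimal standard-library sketch of those two steps only. It does not reproduce the authors' implementation. The abstract specifies neither the TF-IDF variant, nor the voting weights, nor the base-model outputs.

The sketch rests on these assumptions:
- The TF-IDF smoothing and L2 normalization mirror scikit-learn's `TfidfVectorizer` defaults.
- The per-classifier predictions and spam probabilities are hypothetical placeholders standing in for Decision Tree, Naive Bayes and SVM outputs.
- The helper names `tokenize`, `tfidf`, `majority_vote` and `weighted_vote` are illustrative.
- Bagging and AdaBoost are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization only; a real pipeline
    # would also strip punctuation, stop words, HTML, etc.
    return text.lower().split()


def tfidf(docs):
    """Return (sorted vocabulary, list of sparse L2-normalized TF-IDF dicts).

    Uses raw term counts and smoothed idf = ln((1 + n) / (1 + df)) + 1,
    mirroring scikit-learn's TfidfVectorizer defaults (assumed, not from the paper).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return sorted(idf), vectors


def majority_vote(predictions):
    """Hard voting. `predictions` holds one label list per classifier.

    Ties go to the label encountered first (Counter.most_common order).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def weighted_vote(probabilities, weights, threshold=0.5):
    """Weighted soft voting over per-classifier P(spam) lists."""
    total = sum(weights)
    combined = []
    for scores in zip(*probabilities):
        p = sum(w * s for w, s in zip(weights, scores)) / total
        combined.append("spam" if p >= threshold else "ham")
    return combined


if __name__ == "__main__":
    emails = ["win a free prize now", "meeting agenda for monday", "free prize claim now"]
    vocab, vecs = tfidf(emails)
    # Hypothetical outputs of three base classifiers on the three emails.
    hard = [["spam", "ham", "spam"], ["spam", "ham", "ham"], ["ham", "ham", "spam"]]
    print(majority_vote(hard))  # ['spam', 'ham', 'spam']
    soft = [[0.9, 0.2, 0.6], [0.7, 0.1, 0.4], [0.4, 0.3, 0.8]]
    print(weighted_vote(soft, weights=[1, 1, 2]))  # ['spam', 'ham', 'spam']
```

Weighted aggregation matters when the classifiers disagree. With probabilities 0.7, 0.7 and 0.2, equal weights give an average of about 0.53, so the email is labeled spam. Weighting the third classifier 4× pulls the average to about 0.37, and the email is labeled ham. How the paper sets its weights is not stated in the abstract.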

Authors
T.S. Umamaheswari, M. Umaselvi
Jayagovind Harigopal Agarwal Agarsen College, India

Keywords
Spam Detection, Ensemble Learning, Email Classification, Voting Classifier, TF-IDF
Published By:
ICTACT
Published In:
ICTACT Journal on Soft Computing
(Volume: 16, Issue: 1, Pages: 3820-3824)
Date of Publication:
April 2025

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.