The exponential rise in email usage has paralleled an increase in
unsolicited spam messages, posing significant threats such as phishing,
malware dissemination, and personal data breaches. Detecting spam
accurately is crucial to protect users and ensure efficient
communication. Despite the development of various machine learning
approaches, single classifiers often fail to generalize across diverse
email datasets due to overfitting or lack of robustness. Ensemble
learning, which combines multiple models, offers potential advantages
in improving spam detection rates and reducing false positives. This
study proposes a hybrid ensemble classification framework
incorporating Bagging, Boosting (AdaBoost), and Voting techniques to
classify email messages as spam or ham (non-spam). A preprocessed
dataset is vectorized using TF-IDF, and multiple classifiers, including
Decision Trees, Naive Bayes, and Support Vector Machines, are
employed. Ensemble strategies are then used to enhance predictive
performance through majority voting and weighted aggregation. The
proposed ensemble model significantly outperforms standalone
classifiers in terms of accuracy, precision, recall, and F1-score.
Experimental evaluations on the widely used SpamAssassin and Enron
datasets demonstrate consistent improvements, with the Voting
ensemble achieving up to 96.8% accuracy and lower false positive rates
than existing methods.
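
The two aggregation rules named in the abstract, majority voting and weighted aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example weights, and the 0.5 decision threshold are assumptions chosen for illustration, and the per-classifier outputs are made-up numbers standing in for the predictions of trained Decision Tree, Naive Bayes, and SVM models.

```python
from collections import Counter

SPAM, HAM = "spam", "ham"


def majority_vote(labels):
    """Hard voting: return the label predicted by the most classifiers.

    Ties go to the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("at least one prediction is required")
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(spam_probs, weights, threshold=0.5):
    """Weighted aggregation: average the spam probabilities using
    per-classifier weights, then compare the result to a threshold.

    The weights and the threshold are illustrative assumptions.
    """
    if len(spam_probs) != len(weights) or not weights:
        raise ValueError("need one weight per classifier")
    total = sum(weights)
    score = sum(p * w for p, w in zip(spam_probs, weights)) / total
    return (SPAM if score >= threshold else HAM), score


# Hypothetical spam probabilities from three base classifiers for one email.
probs = [0.95, 0.45, 0.40]

# Hard voting first turns each probability into a label, then counts votes.
hard_labels = [SPAM if p >= 0.5 else HAM for p in probs]
print(majority_vote(hard_labels))

# Weighted aggregation lets a more trusted classifier (weight 2) dominate.
print(weighted_vote(probs, weights=[2, 1, 1]))
```

In this toy case the two rules disagree: two of the three classifiers vote ham, but the weighted average is pulled above the threshold by the confident, heavily weighted first classifier. How the weights are chosen in the proposed framework is not stated in the abstract.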
T.S. Umamaheswari, M. Umaselvi
Jayagovind Harigopal Agarwal Agarsen College, India
Spam Detection, Ensemble Learning, Email Classification, Voting Classifier, TF-IDF
Published By : ICTACT
Published In : ICTACT Journal on Soft Computing (Volume: 16, Issue: 1, Pages: 3820-3824)
Date of Publication : April 2025