PEARSON CORRELATION COEFFICIENT K-NEAREST NEIGHBOR OUTLIER CLASSIFICATION ON REAL-TIME DATASETS

Abstract
Detection and classification of data that do not conform to expected behavior (outliers) play a major role in a wide variety of applications, such as military surveillance, intrusion detection in cyber security, and fraud detection in online transactions. Accurate detection of outliers in high-dimensional data remains a major issue, and outlier prediction and classification require a trade-off between high accuracy and low computational time. A large number of diverse features calls for a reduction mechanism prior to classification. To achieve this, Distance-based Outlier Classification (DOC) is proposed in this paper. The proposed work uses the Pearson Correlation Coefficient (PCC) to measure the correlation between data instances. Minimum instance learning through PCC estimation reduces the dimensionality. The proposed work is split into two phases, namely training and testing. During training, labeling the most frequent samples isolates them from the infrequent ones, which reduces the data size effectively. The testing phase employs the k-Nearest Neighbor (k-NN) scheme to classify the frequent samples. The dimensionality and the k-value are inversely proportional to each other; in the proposed work, selecting a large value of k offers a significant reduction in dimensionality. The combination of PCC-based instance learning and a high value of k reduces the dimensionality and the noise, respectively. A comparative analysis of the proposed PCC-k-NN against conventional algorithms such as Decision Tree, Naïve Bayes, Instance-Based K-means (IBK), and Triangular Boundary-based Classification (TBC) in terms of sensitivity, specificity, accuracy, precision, and recall shows its effectiveness in outlier classification (OC). In addition, experimental validation of the proposed PCC-k-NN against state-of-the-art methods in terms of execution time indicates that it achieves a trade-off between low time consumption and high accuracy.
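
The abstract does not specify how the PCC values feed into the k-NN step, so the following is only a minimal sketch of one plausible reading, not the paper's implementation. It assumes that the distance between two instances is taken as 1 minus their Pearson correlation, and that a query is labeled by majority vote among its k nearest training instances. The function names `pearson` and `knn_predict`, and the toy data in the usage note, are hypothetical.

```python
from collections import Counter

import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # A constant vector has zero variance; treat it as uncorrelated.
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training instances.

    Assumption (not stated in the abstract): distance = 1 - PCC, so highly
    correlated instances count as near neighbors.
    """
    dists = np.array([1.0 - pearson(row, query) for row in train_X])
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[int(i)] for i in nearest)
    return votes.most_common(1)[0][0]
```

As a usage illustration on made-up data, calling `knn_predict(train_X, train_y, query, k=3)` with three increasing training rows labeled `"normal"` and three decreasing rows labeled `"outlier"` assigns an increasing query to `"normal"`, because its correlation with the increasing rows is close to 1. A larger k averages the vote over more neighbors, which is the usual way k-NN dampens the effect of noisy instances.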

Authors
D Rajakumari
Nandha Arts and Science College, India

Keywords
Data Mining, Distance-based Instance Learning, Outlier Detection, Outlier Classification, Pearson Correlation Coefficient, k-Nearest Neighbor
Published By:
ICTACT
Published In:
ICTACT Journal on Soft Computing
(Volume: 10, Issue: 2)
Date of Publication:
January 2020

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.