IMPROVED FEATURE SET EXTRACTION FROM DOCUMENTS USING MODIFIED BAG OF WORDS

Abstract
The literature describes several methods of feature collection and extraction, which are also used to reduce dimensionality. Traditional methods are designed to remove redundant and outdated information so that new test cases can be defined more effectively. However, the number of specific words in the Bag of Words (BoW) model must be determined manually, which costs time and effort and limits portability. In addition, the number of codebook vectors in BoW rises as the number of cancer types grows, which reduces the efficiency and accuracy of detection. The standard BoW model is therefore not ideal for multi-operative failure diagnosis. In this paper, we propose an improved BoW that selects the number of specific terms required to extract cancer diagnostic features from different documents. Its overall recognition and accuracy rates are higher than those of other existing extraction models. The improved BoW method has been verified to be highly effective under operating conditions with real-time requirements.
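As a rough illustration of the baseline idea only, the sketch below builds a plain BoW representation in which the vocabulary (codebook) size is not fixed by hand but follows from a document-frequency threshold. It is not the authors' improved method, which the abstract does not specify; the function names, the `min_doc_freq` parameter, and the sample documents are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, min_doc_freq=2):
    """Keep terms that occur in at least `min_doc_freq` documents.

    The vocabulary size is a consequence of the threshold rather than
    a manually chosen number of words (an assumed selection rule).
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, df in doc_freq.items() if df >= min_doc_freq)


def bow_vector(text, vocabulary):
    """Map a document to term counts over the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Hypothetical sample documents, for illustration only.
docs = [
    "tumor cells detected in lung tissue",
    "lung tumor biopsy report",
    "skin lesion biopsy report",
]
vocab = build_vocabulary(docs, min_doc_freq=2)
vectors = [bow_vector(d, vocab) for d in docs]
```

Raising `min_doc_freq` shrinks the vocabulary and therefore the dimensionality of each vector, which is the trade-off the abstract refers to when it discusses codebook growth.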

Authors
R Sathish Babu, R Nagarajan
Annamalai University, India

Keywords
Bag of Words, Cancer Document Retrieval, Codebook, Dimensionality Reduction
Published By:
ICTACT
Published In:
ICTACT Journal on Soft Computing
(Volume: 11, Issue: 1)
Date of Publication:
October 2020
DOI:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.