AN UNSUPERVISED HEADER INDEPENDENT APPROACH TOWARDS SUBJECT COLUMN DETECTION IN TABLES

ICTACT Journal on Soft Computing (Volume: 8, Issue: 4)

Abstract

Subject columns are the columns that help infer the correct subject matter of a table. The main challenge is detecting the appropriate subject columns in tables that contain more than one. Existing approaches are restricted to identifying only one subject column, even in tables with more than one, which makes it impossible to infer the table’s correct subject matter. Existing approaches to subject column detection also require table information such as table headers, additional evidence about the table from web pages, and prior training on a labeled set of tables. To solve these issues, this paper proposes a simple, header-independent, semantic-based Concept-Voting Subject Column Detection (CVSCD) algorithm. The proposed algorithm identifies the possible subject columns in a table with more than one subject column, which provides a way to infer the table’s correct subject matter. Moreover, CVSCD is unsupervised and works for tables without any table information such as table captions or table headers. Experimental results show that our approach achieves better accuracy than existing approaches on a corpus of tables extracted from the web.

Authors

K. Karpaga Priyaa, A. Meena Kabilan, C. Saranya
Sri Sai Ram Engineering College, India

Keywords

Concept-Voting Subject Column Detection (CVSCD), Subject Column, Subject Matter, Table Headers

Published By
ICTACT
Date of Publication
July 2018
Pages
1714-1719
