GENERIC APPROACH OF MEASURING TEXT SEMANTIC SIMILARITY

ICTACT Journal on Soft Computing (Volume: 12, Issue: 1)

Abstract

Text semantic similarity remains a challenging task, as is evident from the sustained interest of the NLP research community, which has set achievable milestones through active participation in the SemEval task series over the past decade. Amid these developments, it has become clear that comparing the semantics of texts depends largely on the valid grammatical structure of sentences and on the types of sentence formulation. This paper addresses the computation of text semantic similarity by devising a novel set of generic similarity metrics based on both the word sense of the phrases constituting the text and the grammatical layout and sequencing of those phrases into meaningful text. We apply the combination of word-sense and grammatical similarity metrics to benchmark sentential datasets. The combined metrics obtained the highest Pearson's correlation coefficient (0.89) with mean human similarity scores when compared against the equivalent scores of closely competing structured-approach models. To give these metrics a generic utility perspective, we then revisited the plagiarism-detection classification task on the well-known paragraph-phrased Rewrite corpus compiled by Clough and Stevenson (2011). Here too, a nearly competitive classification performance (accuracy 76.8%) encourages further work on improving the dependency (grammatical relations) component in order to raise the counts of true positives and true negatives.
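The evaluation described in the abstract, correlating model similarity scores with mean human similarity scores, can be illustrated with a minimal sketch. The abstract does not say how the word-sense and grammatical metrics are combined, so the weighted average in `combine` (and its `weight` parameter) is purely an assumption for illustration, and all scores below are made-up placeholder values, not data from the paper.

```python
import math


def combine(word_sense, grammatical, weight=0.5):
    """Hypothetical combination: weighted average of the two metric scores.

    The paper's actual combination rule is not given in the abstract;
    this is an assumed stand-in.
    """
    return weight * word_sense + (1.0 - weight) * grammatical


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for four sentence pairs (illustrative only).
word_sense_scores = [0.90, 0.40, 0.70, 0.10]
grammatical_scores = [0.80, 0.30, 0.50, 0.20]
human_scores = [0.85, 0.30, 0.65, 0.10]

model_scores = [combine(w, g) for w, g in zip(word_sense_scores, grammatical_scores)]
print(pearson(model_scores, human_scores))
```

A value close to 1 indicates that the model ranks and spaces sentence pairs much as human annotators do, which is the sense in which the reported 0.89 should be read.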

Authors

Richa Dhagat, Arpana Rawal, Sunita Soni
Bhilai Institute of Technology, India

Keywords

Structural features, Word-sense similarity, Grammatical similarity, Generic similarity metrics, Wikipedia Rewrite Corpus

Published By
ICTACT
Published In
ICTACT Journal on Soft Computing (Volume: 12, Issue: 1)
Date of Publication
October 2021
Pages
2494-2503
