Measuring semantic similarity between texts remains a challenging task, as the sustained interest of the NLP research community shows, and the SemEval task series of the past decade has produced achievable milestones through active participation. These developments have made clear that comparing the semantics of texts depends heavily on valid grammatical sentence structures and on the types of sentence formulation. In this paper, we compute text semantic similarity with a novel set of generic similarity metrics. These metrics are based on two things: the word senses of the phrases that make up a text, and the grammatical layout and sequencing of those phrases into a meaningful text. We apply the combined word-sense and grammatical similarity metrics to benchmark sentence-level datasets. Our model obtains the highest Pearson's correlation coefficient (0.89) with mean human similarity scores among closely competing structure-based models. To demonstrate the generic utility of the metrics, we then revisit the plagiarism-detection classification task on the paragraph-level Wikipedia Rewrite Corpus developed by Clough and Stevenson (2011). Here too the model performs competitively (accuracy 76.8%). This result encourages further work on the dependency (grammatical relations) component, with the aim of increasing true positives and reducing false negatives.
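The abstract reports results with two standard evaluation measures. The first is Pearson's correlation between model similarity scores and mean human scores. The second is classification accuracy on the plagiarism-detection task. The sketch below shows both computations. It is not the authors' code. The function names `pearson_r` and `accuracy` are hypothetical, and the sample scores are made-up illustrative values, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalised;
    # the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correctly classified text pairs (plagiarised vs. not)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


if __name__ == "__main__":
    # Hypothetical example: model scores in [0, 1], human scores on a 0-4 scale.
    model_scores = [0.10, 0.42, 0.81, 0.93]
    human_scores = [0.5, 1.5, 3.2, 3.8]
    print(f"Pearson r = {pearson_r(model_scores, human_scores):.3f}")
    # Hypothetical confusion-matrix counts for a plagiarism classifier.
    print(f"accuracy = {accuracy(tp=30, tn=40, fp=12, fn=18):.3f}")
```

Pearson's r is invariant to linear rescaling of either variable. Model scores in [0, 1] can therefore be compared with human ratings on a different scale without normalisation. In the accuracy formula, turning a false negative into a true positive raises the numerator directly.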

Richa Dhagat, Arpana Rawal, Sunita Soni
Bhilai Institute of Technology, India

Keywords: Structural features, Word-sense similarity, Grammatical similarity, Generic similarity metrics, Wikipedia Rewrite Corpus
Published In: ICTACT Journal on Soft Computing (Volume 12, Issue 1)
Date of Publication: October 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.