SEMANTIC BASED EXTRACTIVE DOCUMENT SUMMARIZATION USING DEEP LEARNING MODEL

ICTACT Journal on Soft Computing ( Volume: 15 , Issue: 4 )

Abstract

The rapid growth of web documents has created a need for automatic document summarization. Extractive summarization selects principal features from the input document and groups them to generate a summary, which lets readers browse the document quickly and find the information in it. This work proposes a clustering algorithm suited to summarizing both Tamil and English documents. A transformer model trained on 104 languages (including Tamil and English) represents each sentence in the source document as a feature vector in a high-dimensional space. The feature vectors are then clustered so that outliers are ignored and similar features are grouped. The proposed hybrid clustering algorithm forms densely coupled clusters and divides large clusters into sub-clusters to facilitate sentence selection from each cluster. An equal number of sentences is picked from each cluster or sub-cluster and added to the summary until the summary size reaches the threshold. The performance of the proposed clustering algorithm is evaluated on both Tamil and English documents. The algorithm is applied to the CNN/DailyMail dataset and evaluated using ROUGE metrics. In addition, the summaries generated for the Tamil documents are shared with readers for evaluation from the reader's perspective. The ROUGE scores and the Mean Opinion Score indicate that the clusters generated by the proposed model are well organized and that the summaries are precise and informative. The proposed summarization model outperforms existing Tamil text summarization models.
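
The sentence selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster labels have already been computed, that a label of -1 marks an outlier sentence, that sentences within a cluster are taken in document order, and that summary size is measured in words. The function name `select_round_robin` and the parameter `max_words` are hypothetical.

```python
from collections import defaultdict


def select_round_robin(sentences, labels, max_words):
    """Pick one sentence per cluster per pass until the word budget is reached.

    Assumptions (not taken from the paper): labels[i] is the cluster id of
    sentences[i], -1 marks an outlier, and size is counted in words.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        if label == -1:  # assumed outlier marker: outliers are ignored
            continue
        clusters[label].append(idx)

    # One queue of sentence indices per cluster, in document order.
    queues = [list(members) for _, members in sorted(clusters.items())]

    chosen, words = [], 0
    while words < max_words and any(queues):
        for queue in queues:
            if not queue:
                continue
            idx = queue.pop(0)
            chosen.append(idx)
            words += len(sentences[idx].split())
            if words >= max_words:  # stop as soon as the threshold is reached
                break

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Taking one sentence from every cluster on each pass keeps the number of sentences per cluster equal, except when the budget is reached partway through a pass.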

Authors

S. Divya (1), N. Sripriya (2)
(1) Shiv Nadar University, India; (2) Sri Sivasubramaniya Nadar College of Engineering, India

Keywords

Extractive Summarization, Hybrid Clustering, Effective, Summary, Tamil Text Summarization

Published By
ICTACT
Date of Publication
January 2025
Pages
3669 - 3681
