SEMANTIC BASED EXTRACTIVE DOCUMENT SUMMARIZATION USING DEEP LEARNING MODEL
Abstract
The rapid growth of web documents has created a need for automatic document summarization. Extractive summarization identifies the principal features of the input document and groups them together to generate a summary, which lets readers browse the document quickly and grasp the information in it. This work proposes a clustering algorithm suited to summarizing both Tamil and English documents. A transformer model trained on 104 languages (including Tamil and English) represents each sentence of the source document as a feature vector in a high-dimensional space. The feature vectors are then clustered so that outliers are ignored and similar features are grouped together. A hybrid clustering algorithm is proposed that aims to form densely coupled clusters and divides large clusters into sub-clusters to facilitate sentence selection from each cluster. An equal number of sentences is picked from each cluster or sub-cluster and added to the summary until the summary size reaches the threshold. The performance of the proposed clustering algorithm is evaluated on both Tamil and English documents. For English, the algorithm is applied to the CNN/DailyMail dataset and evaluated in terms of ROUGE metrics. In addition, the summaries generated for the Tamil documents were shared with readers, who rated them from the reader's perspective. The ROUGE scores and the Mean Opinion Score indicate that the clusters generated by the proposed model are well organized and that the summaries are precise and informative. The proposed summarization model outperforms existing Tamil text summarization models.
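
The selection step described in the abstract (an equal number of sentences taken from each cluster until a size threshold is met) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_sentences`, the representation of clusters as lists of sentence indices ordered by preference, and the use of a sentence count as the size threshold are all assumptions made for illustration.

```python
def select_sentences(clusters, max_sentences):
    """Pick sentences round-robin across clusters until the budget is met.

    clusters: list of lists of sentence indices (hypothetical input; each
              inner list is assumed to be ordered by preference).
    max_sentences: assumed summary-size threshold, counted in sentences.
    """
    selected = []
    depth = 0  # how many sentences have been taken from each cluster so far
    # Continue while the budget is not met and some cluster has sentences left.
    while len(selected) < max_sentences and any(depth < len(c) for c in clusters):
        for cluster in clusters:
            if depth < len(cluster) and len(selected) < max_sentences:
                selected.append(cluster[depth])
        depth += 1
    # Restore source-document order so the summary reads naturally.
    return sorted(selected)


if __name__ == "__main__":
    example_clusters = [[0, 3, 5], [1, 4], [2]]
    print(select_sentences(example_clusters, 4))
```

Taking one sentence per cluster in each pass keeps the contribution of every cluster balanced, and smaller clusters simply drop out once they are exhausted.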

Authors
S. Divya (Shiv Nadar University, India)
N. Sripriya (Sri Sivasubramaniya Nadar College of Engineering, India)

Keywords
Extractive Summarization, Hybrid Clustering, Effective, Summary, Tamil Text Summarization
Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 15, Issue: 4, Pages: 3669-3681)
Date of Publication: January 2025

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.