The rapid growth of web documents has created a need for automatic
document summarization. Extractive summarization selects the principal
features of the input document and groups them together to generate a
summary. This lets readers browse the document quickly and find the
information in it. This work proposes a clustering algorithm suited to
summarizing both Tamil and English documents. A Transformer model
trained on 104 languages (including Tamil and English) represents each
sentence in the source document as a feature vector in a
high-dimensional space. The feature vectors are then clustered so that
outliers are ignored and similar features are grouped. A hybrid
clustering algorithm is proposed that aims to form densely coupled
clusters and divides large clusters into sub-clusters to facilitate
sentence selection from each cluster. An equal number of sentences is
picked from each cluster or sub-cluster and added to the summary until
the summary size exceeds the threshold. The performance of the
proposed clustering algorithm is evaluated on both Tamil and English
documents. It is applied to the CNN/DailyMail dataset and evaluated in
terms of ROUGE metrics. In addition, the summaries generated for the
Tamil documents are shared with readers for evaluation from the
reader's perspective. The ROUGE scores and the Mean Opinion Score
indicate that the clusters generated by the proposed model are well
organized and that the summaries are precise and informative. The
proposed summarization model outperforms existing Tamil text
summarization models.
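
The sentence-selection step described in the abstract, picking the same number of sentences from every cluster until a size threshold is met, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster labels per sentence are already available from the clustering stage, that outliers carry the label -1 (a convention borrowed from common density-based clustering libraries), that sentences within a cluster are taken in document order, and that the threshold is a word budget. The names `select_round_robin` and `max_words` are hypothetical.

```python
from collections import defaultdict


def select_round_robin(sentences, labels, max_words):
    """Take one sentence per cluster per round until the word budget is met.

    labels[i] is the cluster id of sentences[i]; -1 marks an outlier
    (an assumed convention, not taken from the paper).
    """
    # Group sentence indices by cluster, skipping outliers.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != -1:
            clusters[label].append(idx)

    chosen = []
    words = 0
    round_no = 0
    # The budget is checked only between rounds, so every cluster that
    # still has sentences contributes the same number of them.
    while words < max_words:
        picked_any = False
        for label in sorted(clusters):
            members = clusters[label]
            if round_no < len(members):
                idx = members[round_no]
                chosen.append(idx)
                words += len(sentences[idx].split())
                picked_any = True
        if not picked_any:
            break  # every cluster is exhausted
        round_no += 1

    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(chosen)]
```

Because each round is completed before the budget is checked, the summary can overshoot the budget by up to one round of sentences; stopping mid-round would keep the summary shorter at the cost of unequal contributions per cluster.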
S. Divya (Shiv Nadar University, India), N. Sripriya (Sri Sivasubramaniya Nadar College of Engineering, India)
Extractive Summarization, Hybrid Clustering, Effective Summary, Tamil Text Summarization
Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 15, Issue: 4, Pages: 3669-3681)
Date of Publication: January 2025