Abstract
In this study, two machine learning models, Long Short-Term Memory (LSTM) and BERT, are used to predict intensifiers in Malayalam sentences. Both models were trained to detect intensifiers using part-of-speech (POS) tags, and BERT consistently outperformed simpler models such as Naive Bayes (NB) and Support Vector Machines (SVM) on metrics including accuracy, precision, recall, and F1 score. In contrast to LSTM, which was effective but suffered from overfitting, as shown by the divergence between training and validation losses, BERT's self-attention mechanism allows it to capture intricate associations between words. LIME and SHAP visualisations further clarified the role individual words played in sentiment classification. The results demonstrate BERT's superior performance on the complex intensifier prediction problem. With an emphasis on its attention mechanism as examined through BERTology, this study demonstrates BERT's proficiency in predicting intensifiers in Malayalam sentences. Compared to models such as LSTM, BERT is far better at capturing intricate interactions between words, such as intensifiers and their surrounding context, thanks to its multi-layered design and self-attention mechanism. With early layers focusing on local linkages and later layers capturing broader, more global dependencies, the attention heads in BERT enable the model to concentrate on specific tokens within the sentence. Because of its capacity to attend to different sentence components, BERT can comprehend the nuanced relationships between intensifiers and adjectives, which results in highly accurate predictions at both the sentence and token levels. By visualising the attention weights across layers, we can observe how BERT gradually refines its comprehension of the input, allowing it to build rich contextual representations that are essential for tasks such as sentiment analysis.
This knowledge of BERT’s attention mechanism explains why it performs better than other models in recognising intensifiers and determining sentiment intensity.
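As an illustrative sketch only (not code from the study), the per-head attention weights that such layer-wise visualisations plot are the row-wise softmax of scaled query-key scores; the function name and toy dimensions below are hypothetical:

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights, as computed by a single BERT head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each token gets a distribution over all tokens,
    # which is what attention visualisations render layer by layer.
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Toy example: 4 tokens with 8-dimensional query/key vectors.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
A = attention_weights(Q, K)
print(A.shape)        # one attention distribution per token: (4, 4)
print(A.sum(axis=1))  # each row sums to 1
```

Inspecting a row of `A` for an intensifier token shows which neighbouring tokens (e.g. the modified adjective) receive the most attention mass, which is the quantity BERTology-style analyses track across layers.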
Authors
R. Anitha1, K.S. Anil Kumar2, R.R. Rajeev3
University of Kerala, India1,2, International Centre for Free and Open Source Software, Thiruvananthapuram, India3
Keywords
LSTM, BERT, POS tagging, LIME, SHAP, BERTology