DOMAIN-SPECIFIC TOKEN RECOGNITION USING BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS AND SCIBERT

ICTACT Journal on Microelectronics (Volume 10, Issue 2)

Abstract

Making machines read and comprehend information from natural-language documents is not an easy task. Machine reading comprehension alleviates this issue by extracting the relevant information from a corpus in response to a question posed about the context. The challenge in this form of knowledge retrieval lies in extracting the correct answer from the context with genuine language understanding. Traditional rule-based, keyword-search, and deep learning approaches are inadequate for inferring the right answer from the input context, whereas Transformer-based methodologies can extract the most accurate answer from the context document. This article utilizes one of the most effective transformer models, BERT (Bidirectional Encoder Representations from Transformers), for an empirical analysis of neural machine reading comprehension. It aims to reveal the differences between BERT and domain-specific models, and it further explores why domain-specific models are needed and how they outperform BERT.

Authors

Nisha Varghese1, Shafi Shereef2
Christ University, India1, Jain University, India2

Keywords

BERT, Transformers, Span Extraction, SciBERT, BioBERT

Published By
ICTACT
Published In
ICTACT Journal on Microelectronics
(Volume 10, Issue 2)
Date of Publication
July 2024
Pages
1817–1821

ICT Academy is an initiative of the Government of India in collaboration with state governments and industry. It is a not-for-profit society and the first pioneering venture of its kind under the Public-Private Partnership (PPP) model.

Contact Us

ICT Academy
Module No E6-03, 6th Floor, Block E
IIT Madras Research Park
Kanagam Road, Taramani,
Chennai 600 113,
Tamil Nadu, India

For Journal Subscription: journalsales@ictacademy.in

For further Queries and Assistance, write to us at: ictacademy.journal@ictacademy.in