SCRIPT IDENTIFICATION FROM CAMERA CAPTURED INDIAN DOCUMENT IMAGES WITH CNN MODEL

ICTACT Journal on Soft Computing (Volume: 14, Issue: 2)

Abstract

Compared with typical scanners, handheld cameras offer convenient, flexible, portable, and noncontact image capture, which enables many new applications and breathes new life into existing ones. However, camera-captured documents may suffer from distortions caused by a nonplanar document shape and perspective projection, which cause current optical character recognition (OCR) technologies to fail. This paper presents a new CNN model for script identification from camera-captured Indian multilingual document images. To evaluate the performance of the proposed model, nine regional languages, one national language (Hindi), and one international language in Roman script (English) are considered. Hindi and English are taken as the common languages and are combined with the regional languages for the study. The proposed method is applied to bi-script, tri-script, and multi-script combinations. The average recognition accuracy achieved is 92.92% for three script combinations, 91.33% for bi-script, and 87.33% for tri-script. The novelty of this paper is a unified approach for identifying the script in bi-script, tri-script, and multi-script camera-captured document images. The proposed model is compared with the pretrained AlexNet CNN model and achieves the higher recognition accuracy.
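
As a rough illustration of the evaluation setup described in the abstract, the sketch below enumerates how bi-script, tri-script, and multi-script groupings could be formed from nine regional scripts plus Hindi and English. The abstract does not name the regional scripts or state exactly how the bi-script pairs are built, so the `Regional-N` labels, the function names, and the pairing of each regional script with one common language are assumptions made for illustration, not the authors' protocol.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not name the nine regional scripts.
REGIONAL = [f"Regional-{i}" for i in range(1, 10)]
# The two common languages named in the abstract.
COMMON = ["Hindi", "English"]


def bi_script_sets(regional, common):
    """Pair each regional script with one common language (assumed pairing)."""
    return [(r, c) for r, c in product(regional, common)]


def tri_script_sets(regional, common):
    """Group each regional script with both common languages."""
    return [(r, *common) for r in regional]


def multi_script_set(regional, common):
    """Treat all scripts together as a single classification problem."""
    return tuple(regional) + tuple(common)


if __name__ == "__main__":
    print(len(bi_script_sets(REGIONAL, COMMON)))    # number of assumed bi-script pairs
    print(len(tri_script_sets(REGIONAL, COMMON)))   # number of tri-script groups
    print(len(multi_script_set(REGIONAL, COMMON)))  # number of classes in the multi-script case
```

Under these assumptions, each grouping would define one classification task, and an average recognition accuracy would be taken over the tasks within a grouping type.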

Authors

Satishkumar Mallappa (1), B.V. Dhandra (2), Gururaj Mukarambi (3)
(1) Sri Sathya Sai University for Human Excellence, Kalaburagi Campus, India; (2) Garden City University, India; (3) Central University of Karnataka, India

Keywords

OCR, Deep Neural Network, AlexNet, CNN, Script Identification

Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 14, Issue: 2)
Date of Publication: October 2023
Pages: 3232-3236
