LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS

ICTACT Journal on Soft Computing (Volume: 8, Issue: 2)

Abstract

Data compression for world languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. The use of Tamil for communication and storage has increased with the digitization of government documents and orders. Lossless text compression for Tamil documents involves substituting an ASCII character for each Unicode Tamil character, since an ASCII character occupies one byte whereas a Unicode character occupies between 1 and 4 bytes, depending on the encoding used to store the file. Decompression reverses the process, i.e., it replaces the ASCII characters with the original Unicode characters. This paper describes the architecture of the compression and decompression processes for Tamil text documents.
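
The substitution idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, whose mapping table is not given in this abstract. It assumes the input contains only ASCII characters and code points from the Unicode Tamil block (U+0B80 to U+0BFF), and it maps each Tamil code point to a single byte in the range 0x80 to 0xFF, which lies outside strict 7-bit ASCII. The names compress, decompress, TAMIL_START and TAMIL_END are hypothetical.

```python
# Illustrative sketch only: a one-byte substitution code for Tamil text.
# Assumption: input holds only ASCII and Tamil-block (U+0B80..U+0BFF) characters.

TAMIL_START = 0x0B80  # first code point of the Unicode Tamil block
TAMIL_END = 0x0BFF    # last code point of the Unicode Tamil block


def compress(text: str) -> bytes:
    """Replace every character with exactly one byte."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII characters are stored unchanged.
            out.append(cp)
        elif TAMIL_START <= cp <= TAMIL_END:
            # The 128 Tamil-block code points map onto bytes 0x80..0xFF.
            out.append(0x80 + (cp - TAMIL_START))
        else:
            raise ValueError(f"unsupported character U+{cp:04X}")
    return bytes(out)


def decompress(data: bytes) -> str:
    """Reverse the substitution to recover the original Unicode text."""
    return "".join(
        chr(b) if b < 0x80 else chr(TAMIL_START + (b - 0x80)) for b in data
    )


sample = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd text"  # the word "Tamil" in Tamil script, then " text"
packed = compress(sample)
print(len(sample.encode("utf-8")), len(packed))  # 20 10
assert decompress(packed) == sample
```

In UTF-8 each Tamil-block code point takes three bytes, so this sample shrinks from 20 bytes to 10, and the round trip restores the text exactly. A practical scheme would also need a rule for characters outside these two ranges, which this sketch simply rejects.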

Authors

B Vijayalakshmi, N Sasirekha
Vidyasagar College of Arts and Science, India

Keywords

Compression, Decompression, Unicode, ASCII, Substitution

Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume: 8, Issue: 2)
Date of Publication: January 2018
Pages: 1635-1640
