Efficient image classification is increasingly valuable as the field of computer vision grows rapidly. This paper first applies several models independently to classify images and then combines them into an ensemble that performs better than each of the individual models. The dataset consists of 200 classes with 90,000 training images, 10,000 validation images and 10,000 test images. Data preparation involves resizing the images, shuffling them and wrapping them in a data generator that supplies input to the models. The images were also augmented using two different sets of image transformations to provide more data for training. These data were then used to train five models independently: one trained from scratch and four built on pre-trained weights through transfer learning. Each model was evaluated on two metrics: F1-score and categorical accuracy. Fine-tuning was also attempted to improve the performance of the models, and finally the models were ensembled to achieve better categorical accuracy and F1-score on unseen (validation and test) data.
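The ensembling and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the models were combined or how the F1-score was averaged across classes, so the soft-voting rule (averaging per-class probabilities) and the macro-averaged F1 used here are assumptions, as are all function names and the toy data (3 classes, 4 samples, 2 models).

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-class probabilities across models (assumed soft voting)."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(model[i][c] for model in prob_sets) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def categorical_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Hypothetical toy data: true labels and softmax outputs of two models.
y_true = [0, 1, 2, 1]
model_a = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]
model_b = [[0.4, 0.5, 0.1], [0.2, 0.7, 0.1], [0.2, 0.3, 0.5], [0.1, 0.8, 0.1]]

ensemble_pred = predict_labels(soft_vote([model_a, model_b]))

for name, pred in [
    ("model_a", predict_labels(model_a)),
    ("model_b", predict_labels(model_b)),
    ("ensemble", ensemble_pred),
]:
    print(name, categorical_accuracy(y_true, pred), round(macro_f1(y_true, pred, 3), 4))
```

In this toy case each model misclassifies a different sample, so averaging their probabilities corrects both errors. Real models often share errors, in which case the gain from ensembling is smaller.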

Debabrata Datta (1), Anweshan Mukherjee (2), Soumen Mukherjee (3), Arup Kr. Bhattacharjee (4), Anal Acharya (5)
St. Xavier's College, India (1, 2, 5); RCC Institute of Information Technology, India (3, 4)

Keywords: Image Classification, Convolutional Neural Networks, Image Augmentation, Model Ensembling, F1-Score
Published In: ICTACT Journal on Image and Video Processing (Volume: 12, Issue: 4, Pages: 2679-2692)
Date of Publication: May 2022

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.