Abstract
Understanding public sentiment in crowded spaces has become
essential for urban management, security monitoring, and event
analysis. Traditional approaches often rely on surveys or manual
observation, which are time-consuming and limited in scalability.
Recent advances in computer vision and artificial intelligence
offer the potential for automated, real-time sentiment analysis.
Monitoring emotions and behaviors in densely populated areas poses
challenges such as occlusion, dynamic movement, and varying
environmental conditions. Existing models often fail to achieve
accurate detection in complex scenarios, limiting practical applications
in safety, crowd management, and social analysis. This study employed
a hybrid approach combining Generative AI techniques with the YOLO
(You Only Look Once) object detection framework. YOLO was used to
detect and track individual faces and body postures within the crowd.
Generative AI was applied to enhance low-quality or partially occluded
images and generate realistic feature representations for better emotion
classification. Facial expressions, gestures, and body language were
analyzed using a pre-trained sentiment recognition model. Data
augmentation and feature normalization were applied to improve
robustness and generalization. The proposed framework demonstrated
significant improvements in detection and sentiment classification
under dense and dynamic crowd conditions. Across multiple
experiments, the system achieved an accuracy of 91.0%, precision of
89.1%, recall of 88.6%, F1-score of 89.0%, and a mean squared error (MSE) of 0.023,
outperforming conventional Faster R-CNN, SSD-GAN, and Attention
CNN-LSTM models by 6–12%. YOLO efficiently detected individual
subjects, while generative enhancement minimized misclassification
caused by occlusion and low-resolution inputs.
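The following is a minimal sketch of the detection-then-classification pipeline outlined above, assuming the ultralytics YOLO package for person detection; the enhance_region and classify_sentiment functions are hypothetical placeholders standing in for the paper's generative enhancement stage and pre-trained sentiment recognition model, not the authors' implementation.

```python
# Sketch of the YOLO + generative-enhancement pipeline described in the abstract.
# Assumptions: the ultralytics package supplies detection; enhance_region and
# classify_sentiment are hypothetical stand-ins for the paper's generative
# enhancement and pre-trained sentiment recognition stages.
import cv2
import numpy as np
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")  # pre-trained YOLO weights (assumed checkpoint)

def enhance_region(crop: np.ndarray) -> np.ndarray:
    """Placeholder for generative enhancement of occluded / low-resolution crops."""
    return cv2.resize(crop, (224, 224), interpolation=cv2.INTER_CUBIC)

def classify_sentiment(crop: np.ndarray) -> str:
    """Placeholder for the pre-trained sentiment recognition model."""
    return "neutral"

def analyze_frame(frame: np.ndarray) -> list[dict]:
    """Detect subjects in a frame, enhance each crop, and classify its sentiment."""
    detections = detector(frame)[0]            # run YOLO on the frame
    results = []
    for box in detections.boxes.xyxy.cpu().numpy().astype(int):
        x1, y1, x2, y2 = box[:4]
        crop = frame[y1:y2, x1:x2]
        if crop.size == 0:
            continue
        crop = enhance_region(crop)            # generative enhancement stage
        results.append({"bbox": (x1, y1, x2, y2),
                        "sentiment": classify_sentiment(crop)})
    return results

if __name__ == "__main__":
    frame = cv2.imread("crowd.jpg")            # example input image (assumed path)
    for item in analyze_frame(frame):
        print(item)
```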
Authors
B. Yuvaraj, T. Ganesan, D.C. Jullie Josephine, S. Thumilvannan
Kings Engineering College, India
Keywords
Generative AI, YOLO, Sentiment Analysis, Crowd Monitoring, Real-Time Emotion Detection