Hybrid LSTM-Transformer Architecture for Abnormal Crowd Behavior Detection


Komal Jadhav, Mahesh Chavan

Abstract

This study proposes a deep learning framework for abnormal crowd behavior detection that combines spatial-temporal feature extraction with optimized neural network components. The pipeline begins with video input, from which frames are extracted and processed by a hybrid CNN that integrates Bottleneck and Residual blocks to capture deep spatial features. These features are passed through a BiLSTM network to learn temporal dependencies, followed by a Transformer Encoder that strengthens long-range context understanding. Dense layers and a Softmax function complete the classification of behavior as normal or abnormal. The proposed model is evaluated against state-of-the-art approaches and demonstrates higher accuracy, lower false detection rates, and substantially reduced detection time, particularly under high-density crowd conditions; notably, it achieves a detection time of 0.4 seconds at 10 fps. The model’s design supports scalability and real-time operation, making it suitable for public safety, event monitoring, and crowd management systems. This research underscores the value of combining spatial, temporal, and contextual cues for robust surveillance systems and offers a foundation for further development in intelligent video analysis and behavior detection.
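To make the described pipeline concrete, the following is a minimal sketch in PyTorch of a CNN-with-bottleneck/residual-blocks → BiLSTM → Transformer Encoder → dense/Softmax classifier. Layer sizes, block counts, input resolution, and the two-class head are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of the abstract's pipeline, assuming PyTorch.
# All hyperparameters below are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class ResidualBottleneck(nn.Module):
    """Bottleneck block (1x1 -> 3x3 -> 1x1 convs) with a residual skip connection."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.block(x))

class CrowdBehaviorNet(nn.Module):
    """CNN (bottleneck/residual) -> BiLSTM -> Transformer encoder -> dense Softmax head."""
    def __init__(self, feat_dim=128, lstm_hidden=128, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            ResidualBottleneck(64),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.BatchNorm2d(feat_dim), nn.ReLU(inplace=True),
            ResidualBottleneck(feat_dim),
            nn.AdaptiveAvgPool2d(1),          # one spatial feature vector per frame
        )
        self.bilstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True, bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(d_model=2 * lstm_hidden, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Sequential(
            nn.Linear(2 * lstm_hidden, 64), nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, clip):                  # clip: (batch, frames, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).flatten(1)   # deep spatial features per frame
        feats = feats.view(b, t, -1)
        temporal, _ = self.bilstm(feats)                   # temporal dependencies
        context = self.transformer(temporal)               # long-range context
        logits = self.head(context.mean(dim=1))            # pool over frames, classify
        return torch.softmax(logits, dim=-1)               # normal vs. abnormal probabilities

# Example: classify two 16-frame clips at 112x112 resolution.
model = CrowdBehaviorNet()
scores = model(torch.randn(2, 16, 3, 112, 112))            # shape (2, 2)

In a training setting one would typically drop the final softmax and use CrossEntropyLoss on the raw logits; it is kept here only to mirror the Softmax stage named in the pipeline.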


DOI: https://doi.org/10.52783/pst.1731
