Sparse Spatiotemporal Feature Learning for Video-Based Hand Gesture Recognition
Abstract
Sign language and hand gesture recognition play a crucial role in enabling natural and intuitive communication between humans and machines, especially in assisting individuals with hearing and speech impairments. This paper presents a novel Sparse Motion Sequence Extraction Network (SMSE-Net) for efficient and accurate gesture recognition from video sequences. The proposed framework integrates a sparse image-wise feature extraction layer that identifies salient motion information with a hybrid sequence-wise modeling layer that captures temporal dependencies across consecutive frames. By selectively focusing on informative motion patterns and suppressing redundant data, SMSE-Net significantly improves recognition performance while reducing computational overhead. Extensive experimental evaluations demonstrate that the proposed approach outperforms existing methods such as CNN, R-CNN, YOLOv3, and ResNet across multiple performance metrics, including accuracy, precision, recall, and F1-score. The results confirm the robustness, efficiency, and real-time applicability of the proposed SMSE-Net framework.
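The abstract does not specify how the sparse image-wise layer selects salient frames; one common realization of the idea is frame differencing with a motion-energy threshold. The sketch below is an illustrative assumption, not the paper's implementation: all function names and the threshold value are hypothetical.

```python
# Hypothetical sketch of sparse motion sequence extraction: drop frames
# whose mean absolute difference from the last kept frame is small
# (i.e. redundant), keeping only salient motion for sequence modeling.

def motion_energy(prev, curr):
    """Mean absolute per-pixel difference between two grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def extract_sparse_sequence(frames, threshold=10.0):
    """Keep the first frame plus frames showing significant motion."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        if motion_energy(kept[-1], frame) >= threshold:
            kept.append(frame)
    return kept

# Toy example: three near-identical frames, then a moving hand region.
static = [0] * 16
moved = [0] * 8 + [200] * 8
frames = [static, static, static, moved]
print(len(extract_sparse_sequence(frames)))  # → 2 (redundant frames dropped)
```

The kept subsequence would then be passed to the sequence-wise modeling layer, so temporal modeling cost scales with the number of salient frames rather than the full video length.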
