A Deep Learning–Based Contextual Framework for Inappropriate Content Detection and Classification of YouTube Videos Using Bi-LSTM
Abstract
The rapid growth of user-generated video platforms has intensified the challenge of detecting inappropriate content that evolves dynamically across linguistic, acoustic, and visual dimensions. Conventional deep learning approaches to YouTube content moderation rely primarily on static transcript analysis or naïve multimodal fusion, and often fail to capture the semantic ambiguity, temporal drift, and epistemic uncertainty inherent in real-world videos. This work presents a novel, end-to-end contextual deep learning framework for inappropriate content detection and classification built on a Bi-LSTM backbone augmented with five analytically distinct yet sequentially integrated modules: Contextual Entropy–Weighted Transcript Refinement, Semantic Drift–Aware Bi-LSTM Encoding, Cross-Modal Temporal Coherence Fusion, Epistemic Uncertainty–Calibrated Classification, and Decision Stability & Harm Propagation Analysis. Each module addresses a specific limitation observed in existing moderation systems while preserving strict data-flow continuity between components. Extensive experimental evaluation on large-scale YouTube datasets demonstrates measurable improvements in classification accuracy, early-detection latency, calibration reliability, and decision stability. The results confirm that modeling contextual uncertainty and temporal coherence significantly enhances moderation robustness. The proposed architecture is suitable for deployment in large-scale video governance systems and establishes new validation dimensions for inappropriate content detection research.
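To make the Bi-LSTM backbone concrete, the sketch below implements a minimal bidirectional LSTM forward pass over an embedded token sequence, as it might process a refined transcript. This is an illustrative assumption only: the dimensions, random weights, and function names (`lstm_step`, `bilstm_encode`) are hypothetical and not taken from the paper, which does not publish implementation details in this abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias; gate order is input, forget, output, cell."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bilstm_encode(seq, params_f, params_b):
    """Encode a (T, D) sequence into (T, 2H): at each time step,
    concatenate the forward-pass and backward-pass hidden states."""
    H = params_f[1].shape[1]
    h_f, c_f = np.zeros(H), np.zeros(H)
    h_b, c_b = np.zeros(H), np.zeros(H)
    fwd, bwd = [], []
    for x in seq:                      # left-to-right pass
        h_f, c_f = lstm_step(x, h_f, c_f, *params_f)
        fwd.append(h_f)
    for x in seq[::-1]:                # right-to-left pass
        h_b, c_b = lstm_step(x, h_b, c_b, *params_b)
        bwd.append(h_b)
    bwd = bwd[::-1]                    # realign backward states with time
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)

# Toy usage: T tokens of a transcript, each embedded into D dimensions.
rng = np.random.default_rng(0)
D, H, T = 8, 16, 10
def make_params():
    return (rng.normal(scale=0.1, size=(4 * H, D)),
            rng.normal(scale=0.1, size=(4 * H, H)),
            np.zeros(4 * H))

tokens = rng.normal(size=(T, D))
encoded = bilstm_encode(tokens, make_params(), make_params())
print(encoded.shape)  # (10, 32): one 2H-dim contextual vector per token
```

In a full moderation pipeline these per-token contextual vectors would feed the downstream fusion and classification stages; in practice one would use an optimized library implementation rather than a hand-rolled forward pass like this.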
