Lightweight Federated Deepfake Detection with Adaptive Fusion and Temporal Transformers

Stephy Joy D, R. Thirumalai Selvi

Abstract

Highly realistic video deepfakes pose major threats to information integrity, online trust, and security. While lightweight models such as BNN+ViT hybrids have shown promise for efficient real-time deepfake detection in centralized settings, they remain vulnerable to data-privacy, heterogeneity, and scalability issues in real-world applications where data are distributed and often limited. To address these challenges, we propose a federated lightweight deepfake detection framework that extends BNN+ViT classifiers to a collaborative, privacy-preserving setting. The framework introduces three new modules: Adaptive Feature Fusion (AFF), which adaptively weights spatial, frequency, and semantic features to improve intra-frame robustness; Temporal Transformer Fusion (TTF), which captures time-varying irregularities by modeling temporal correlations across frames; and Federated Knowledge Distillation (FedKD), in which lightweight student models deployed on devices inherit the robustness of a central teacher model without any data transfer. Experiments on DFDC, FaceForensics++, and OpenForensics show that our approach reaches 97.5% accuracy and 98.3% AUC at just 2.4 GFLOPs, outperforming state-of-the-art models including CNN, MesoNet, EfficientNetV2-M, and ViT-B/32 while remaining efficient to deploy. By combining lightweight effectiveness with federated scalability and privacy, the proposed framework is practical and broadly applicable to real-world forensic and security use cases.
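To illustrate the Adaptive Feature Fusion idea described in the abstract, the sketch below shows one plausible realization: three per-frame feature streams (spatial, frequency, semantic) combined by softmax-normalized weights. This is a minimal illustration, not the authors' implementation; the gating logits, feature dimensions, and function names are assumptions for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_feature_fusion(spatial, frequency, semantic, gate_logits):
    """Fuse three equal-dimension feature vectors with adaptive weights.

    gate_logits: raw scores (in the paper presumably produced by a learned
    gating network; here passed in directly) normalized into fusion weights
    that sum to 1, so the fused vector is a convex combination of the streams.
    """
    w = softmax(np.asarray(gate_logits, dtype=float))   # shape (3,)
    feats = np.stack([spatial, frequency, semantic])    # shape (3, d)
    return (w[:, None] * feats).sum(axis=0)             # weighted sum, shape (d,)

# With equal logits the fusion reduces to a plain average of the three streams;
# skewed logits let the model emphasize, e.g., frequency artifacts.
fused = adaptive_feature_fusion(np.ones(4), 2 * np.ones(4), 3 * np.ones(4),
                                [0.0, 0.0, 0.0])
```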
