A Study to Perform PDF Malware Detection through Document Analysis and Logistic Model Tree Techniques

Kiran Pachlasiya, Harsh Mathur

Abstract

PDF malware has become a serious issue in cybersecurity, exploiting the ubiquity of Portable Document Format (PDF) files to carry out malicious operations such as data theft, unauthorized access, and system penetration. Identifying such malware effectively requires algorithms that examine both the structural and the content-based aspects of PDF documents. This research introduces a method for improving PDF malware detection that combines structural analysis with a Logistic Model Tree (LMT). We compare several popular machine learning models using data collected by the Indian Institute for Cybersecurity. The proposed method relies on the Logistic Model Tree, a comprehensive feature selection strategy, and closer attention to PDF file features. The findings show that the Logistic Model Tree outperforms the benchmark models, reaching an accuracy of 97.51%, which indicates its usefulness against an ever-changing threat environment. The comparative investigation shows that LMT consistently outperforms the other models, delivering strong malware detection while reducing false alarms. These findings position LMT as a resilient solution for PDF malware detection.
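As an illustration of what structural analysis of a PDF can look like, the following minimal Python sketch counts occurrences of a few structural keywords in the raw bytes of a file and returns them as a feature dictionary. The keyword list and the function name `structural_features` are assumptions made for this example; the abstract does not state which features the authors extract, and the resulting counts would still need to be passed to a classifier such as an LMT, which is not shown here.

```python
import re

# Hypothetical keyword list chosen for illustration only; the abstract
# does not name the structural features used in the study.
STRUCTURAL_KEYWORDS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def structural_features(pdf_bytes: bytes) -> dict:
    """Count structural keywords and indirect objects in raw PDF bytes."""
    features = {}
    for keyword in STRUCTURAL_KEYWORDS:
        # The negative lookahead stops a short name such as /AA from
        # matching the prefix of a longer name such as /AAB.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        features[keyword.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    # Indirect object headers have the form "<number> <generation> obj".
    features["obj"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", pdf_bytes))
    return features
```

Counting on raw bytes keeps the sketch dependency-free, but it will miss keywords hidden inside compressed object streams or obfuscated with hex-escaped names, so a practical extractor would need to parse and decode the document first.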
