Detecting and Preventing SQL Injection Attacks with Machine Learning Predictive Analytics
Abstract
Back-end databases store much of the enormous volume of data exchanged over the Internet, from cloud-hosted web applications to Internet of Things (IoT) smart devices. Against vulnerable web applications, the Structured Query Language (SQL) Injection Attack (SQLIA) remains the attack of choice for hackers seeking to steal private information from databases, with potentially harmful outcomes. Existing solutions, which mostly rely on signature-based techniques, were developed before the current challenges of big data mining and therefore lack the functionality and capacity to handle new signatures hidden in web requests. As an alternative, predictive analytics using machine learning (ML) offers a scalable and practical way to mine massive data for SQLIA detection and mitigation. Unfortunately, a well-known problem in SQLIA research is the lack of readily available, robust corpora or data sets containing the patterns and historical data items needed to train a classifier. In this project, we investigate the creation of a data set extracted from well-known attack patterns, including SQL tokens and injection point symbols. To demonstrate learning on large volumes of data, we also build a web application as a test case that expects dictionary word lists as vector variables. The data set is pre-processed, labeled, and feature hashed for supervised learning. The trained classifier is deployed as a web service consumed by a custom .NET application that implements a web proxy Application Programming Interface (API). This allows the classifier to intercept web requests, accurately predict SQLIA in them, and prevent malicious requests from reaching the protected back-end database.
With empirical evaluations presented as a Confusion Matrix (CM) and a Receiver Operating Characteristic (ROC) curve, this project provides a complete proof-of-concept implementation of ML predictive analytics and the deployment of the resulting web service, which accurately predicts and prevents SQLIA.
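
As an illustration only, the pre-processing and evaluation steps named in the abstract (tokenizing web request values into SQL tokens and injection point symbols, feature hashing, and tallying a confusion matrix) might be sketched in Python as follows. The token pattern, the bucket count of 64, the CRC32 hash, and the function names `tokenize`, `hash_features`, and `confusion_matrix` are all assumptions made for this sketch; the abstract does not specify them, and the project itself targets a .NET deployment.

```python
import re
import zlib

# Assumed token pattern (not from the source): words/SQL keywords, numbers,
# comment markers, and any other single non-space symbol (quotes, '=', ';').
TOKEN_RE = re.compile(r"[a-z_]+|\d+|--|/\*|\*/|[^\sa-z0-9_]")


def tokenize(request_value):
    """Split a web request value into lowercase SQL-like tokens and symbols."""
    return TOKEN_RE.findall(request_value.lower())


def hash_features(tokens, n_buckets=64):
    """Feature hashing: map each token to one of n_buckets counters.

    CRC32 is used instead of Python's built-in hash() because the built-in
    string hash is salted per process and would not be reproducible.
    """
    vector = [0] * n_buckets
    for token in tokens:
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[index] += 1
    return vector


def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels, where 1 = SQLIA and 0 = benign."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical request values with labels (1 = injection, 0 = benign).
    samples = [("' OR 1=1 --", 1), ("alice", 0)]
    for value, label in samples:
        print(label, tokenize(value), sum(hash_features(tokenize(value))))
    # Hypothetical predictions, compared against hypothetical true labels.
    print(confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

The hashed vectors have a fixed length regardless of vocabulary size, which is what lets a classifier accept previously unseen tokens without retraining its input layout; the cost is that distinct tokens can collide in the same bucket.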
