Evaluating Web App Datasets towards Detection of SQL Injection Attacks with Machine Learning Techniques

Aaby, P. (2016). Evaluating Web App Datasets towards Detection of SQL Injection Attacks with Machine Learning Techniques (BEng (Hons) CSF Dissertation). Edinburgh Napier University (Macfarlane, R.).



With web applications taking up an increasing amount of online traffic, the challenges of protecting web sites against attacks are similarly increasing. The Open Web Application Security Project (OWASP, 2013) lists the top ten attacks, with Structured Query Language (SQL) injection ranking among the most common vulnerabilities. The risk arises from insecure programming, such as failing to validate or sanitise user input on web sites, e.g. during authentication. This allows additional SQL syntax to be injected and may grant intruders unintended access, as seen in incidents such as Talk Talk (Price, 2015). Previous approaches to detecting and protecting against SQL injection attacks have been fragmented across various methods, such as static knowledge-based signatures, statistical detection, and combined static/dynamic detection such as AMNESIA (Halfond & Orso, 2005). Anomaly-based machine learning (ML) detection methods are, however, becoming increasingly popular in the intrusion detection system (IDS) field. Nguyen & Franke (2012) are among the key contributors to the field of applying ML to web application attacks. Their research evaluates a selection of four ML algorithms against eight types of web application attack. However, it is difficult to confirm which algorithm performs best with regard to a particular type of attack, such as SQL injection. The hypothesis raised therefore concerns which algorithm, if any, is best suited to detecting SQL injection attacks. To understand this, the ECML/PKDD 2007 dataset used by Nguyen & Franke (2012) was filtered down to include only benign samples and SQL injection attacks. For comparability, similar features were extracted and the new dataset was used to train anomaly-based models using ML techniques. In a direct comparison with Nguyen & Franke's (2012) results, the true positives produced by Decision Stump showed very high accuracy.
However, as seen in the literature review, accuracy is measured using four metrics: true positives, false positives, true negatives and false negatives. A thorough analysis of the confusion matrix produced by Decision Stump, using similar features, revealed 25% false negatives. In contrast, RBFNetwork was found to have high accuracy across all four metrics; as such, in this experiment RBFNetwork is considered the best performing algorithm against SQL injection attacks. However, more research is needed on feature selection, as expert features were found to be important for achieving high accuracy. As a product of this project, a sub-dataset was produced together with eight initial ML results. Future experiments against the sub-dataset are welcome, as are the improvements mentioned in the last chapter.
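As a rough illustration of why all four confusion-matrix metrics matter, the sketch below derives the four rates from raw counts. The counts are invented for illustration and are not the dissertation's results; the class name ConfusionMatrix and the choice of SQL injection as the positive class are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Raw counts for a binary classifier (hypothetical helper, not from the dissertation)."""
    tp: int  # attacks correctly flagged as attacks
    fp: int  # benign requests wrongly flagged as attacks
    tn: int  # benign requests correctly passed
    fn: int  # attacks wrongly passed as benign

    def rates(self) -> dict:
        """Return each count as a fraction of its actual class."""
        positives = self.tp + self.fn  # all actual attacks
        negatives = self.tn + self.fp  # all actual benign requests
        return {
            "true_positive_rate": self.tp / positives,
            "false_negative_rate": self.fn / positives,
            "true_negative_rate": self.tn / negatives,
            "false_positive_rate": self.fp / negatives,
        }


# Invented counts, chosen only so that a quarter of the attacks are missed.
cm = ConfusionMatrix(tp=75, fp=5, tn=95, fn=25)
rates = cm.rates()
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, a model can look strong on benign traffic while still missing a quarter of the attacks, which is the kind of gap that reading a single metric in isolation can hide.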


Peter Aaby
Research student
+44 131 455

Areas of Expertise

Electronic information now plays a vital role in almost every aspect of our daily lives, so the need for a secure and trustworthy online infrastructure is more important than ever. Without it, not only the growth of the internet but also our personal interactions and the economy itself could be at risk.

Associated Projects