The evolution of the threat landscape over the past few years, including the significant rise of malware, as well as the growth of multi-vector attacks and more coordinated actors, has rendered traditional methods of defence such as Intrusion Detection Systems (IDS) and firewalls less effective. Current research trends attempting to address the problem, such as the use of Artificial Intelligence (AI), particularly Machine Learning (ML) applied to intrusion detection, have shown some good results. However, the use of historical datasets, a possible lack of exploration of the importance of the features available in these datasets, and other issues have increased the number of challenges in the field.
The objective of this project is to investigate whether some of the newer, improved datasets, such as the NSL-KDD and Kyoto 2006+ datasets, may be better candidates than the older and overused DARPA KDD Cup 1999 dataset, and to what extent the features available in a dataset are relevant to detecting particular types of attacks. To this end, three sets of experiments were designed and carried out: comparing the KDD Cup ’99 and NSL-KDD datasets, evaluating the Kyoto 2006+ dataset, and evaluating a selection of features for detecting particular types of attacks. Each set was run against three of the most widely used ML algorithms for intrusion detection in Weka.
On the comparison of the KDD Cup ’99 and NSL-KDD datasets, it was concluded that while NSL-KDD produced better results when used with a clustering algorithm, it did not appear to be a better candidate overall, owing to similar accuracy but fewer attack classes available. On the Kyoto 2006+ dataset, the results showed that this dataset fell far behind the more popular datasets present in the literature. Its weaknesses are highlighted as a contribution to the field, suggesting some of the factors to take into consideration when developing new datasets. Finally, on the importance of the features available in a dataset, it was concluded that features do play an important role in detecting certain types of attacks. This underlines the importance of developing datasets containing a high number of attack classes, which allow more detailed evaluation of ML algorithms for intrusion detection, or even of Intrusion Detection Systems as a whole.