Novel Approaches To The Classification Of High Entropy File Fragments

Penrose, P. (2013). Novel Approaches To The Classification Of High Entropy File Fragments (MSc ASDF Dissertation). Edinburgh Napier University (Macfarlane, R., Buchanan, B.).



Abstract

In this thesis we propose novel approaches to the problem of classifying high entropy file fragments. We achieve 97% correct classification for encrypted fragments and 78% for compressed fragments. Although classification of file fragments is central to the science of digital forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel [1] argue that existing methods will not work on high entropy fragments because they have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. In the first, the NIST statistical test suite is used to detect randomness in 4 KB fragments. The test results were analysed using Support Vector Machines (SVM), k-Nearest-Neighbour (kNN) analysis and Artificial Neural Networks (ANN), and we compare the performance of each of these analysis methods. The best results were obtained using an ANN, giving 94% and 74% correct classification rates for encrypted and compressed fragments respectively. In the second, we use the compressibility of a fragment as a measure of its randomness. Correct classification was 76% and 70% for encrypted and compressed fragments respectively. Although it gave poorer results for encrypted fragments, we believe that this method has more potential for future work.
We have used subsets of the publicly available ‘GovDocs1 Million File Corpus’ so that any future research may make valid comparisons with the results obtained here.
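
The two randomness measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say which compressor or which NIST tests were used, so zlib (DEFLATE) and the frequency (monobit) test from the NIST suite are assumptions made here for illustration, and the function names are hypothetical.

```python
import math
import zlib

FRAGMENT_SIZE = 4096  # 4 KB fragments, as described in the abstract


def compression_ratio(fragment: bytes) -> float:
    """Compressed size divided by original size.

    Assumption: zlib at level 9 stands in for whatever compressor
    the thesis actually used. Values near (or above) 1.0 mean the
    fragment is close to incompressible, i.e. looks random.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    return len(zlib.compress(fragment, 9)) / len(fragment)


def monobit_p_value(fragment: bytes) -> float:
    """Frequency (monobit) test, one of the tests in the NIST suite.

    Returns the p-value for the hypothesis that ones and zeros are
    equally likely in the fragment's bit stream.
    """
    n = len(fragment) * 8
    if n == 0:
        raise ValueError("fragment must not be empty")
    ones = sum(bin(byte).count("1") for byte in fragment)
    # Sum of +1/-1 per bit equals ones - zeros = 2 * ones - n.
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def features(fragment: bytes) -> list[float]:
    """Feature vector that a classifier (SVM, kNN, ANN) could consume."""
    return [compression_ratio(fragment), monobit_p_value(fragment)]
```

In a pipeline of the kind the abstract describes, one such feature vector per 4 KB fragment would be passed to a classifier. Encrypted and compressed fragments both score close to incompressible, which is why separating the two is harder than separating either from low entropy data.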

Authors

Philip Penrose
Research student
P.Penrose@napier.ac.uk
+44 131 455

Areas of Expertise

Cyber-Security
Electronic information now plays a vital role in almost every aspect of our daily lives, so the need for a secure and trustworthy online infrastructure is more important than ever. Without it, not only the growth of the internet but also our personal interactions and the economy itself could be at risk.

Associated Projects