Approaches to the Classification Of High Entropy File Fragments
Buchanan, W. (2013). Approaches to the Classification Of High Entropy File Fragments. Digital Investigator, 10, (4), 372–384.
In this paper we propose novel approaches to the problem of classifying high entropy file fragments. Although classification of file fragments is central to the science of Digital Forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel (2009) argue that existing methods will not work on high entropy fragments because they have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. The NIST statistical test suite is used to detect randomness in 4 KiB fragments. These test results were analysed using an Artificial Neural Network (ANN). Optimum results were 91% and 82% correct classification rates for encrypted and compressed fragments respectively. We also use the compressibility of a fragment as a measure of its randomness. Correct classification was 76% and 70% for encrypted and compressed fragments respectively.We show that newer more efficient compression formats are more difficult to classify.We have used subsets of the publicly available ‘GovDocs1 Million File Corpus’ so that any future research may make valid comparisons with the results obtained here.
Director of CDCS
+44 131 455 2759
+44 131 455
+44 131 455 2335
Areas of Expertise
See all areas of expertise
Electronic information now plays a vital role in almost every aspect of our daily lives. So the need for a secure and trustworthy online infrastructure is more important than ever. without it, not only the growth of the internet but our personal interactions and the economy itself could be at risk.