Approaches to the Classification Of High Entropy File Fragments

Penrose, P., Macfarlane, R., Buchanan, W. (2013). Approaches to the Classification Of High Entropy File Fragments. Digital Investigation, 10(4), 372–384.

ISSN: 1742-2876


In this paper we propose novel approaches to the problem of classifying high entropy file fragments. Although classification of file fragments is central to the science of Digital Forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel (2009) argue that existing methods will not work on high entropy fragments because they have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. In the first, the NIST statistical test suite is used to detect randomness in 4 KiB fragments, and the test results are analysed using an Artificial Neural Network (ANN). Optimum results were 91% and 82% correct classification rates for encrypted and compressed fragments respectively. In the second, we use the compressibility of a fragment as a measure of its randomness. Correct classification was 76% and 70% for encrypted and compressed fragments respectively. We show that newer, more efficient compression formats are more difficult to classify. We have used subsets of the publicly available ‘GovDocs1 Million File Corpus’ so that any future research may make valid comparisons with the results obtained here.
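To make the two randomness measures in the abstract concrete, the sketch below computes both for a single 4 KiB fragment. It is illustrative only and is not the authors' implementation: the abstract does not name the compressor, so `zlib` is an assumption; the frequency (monobit) test stands in for the full NIST suite, of which it is only one test; and the function names `compression_ratio` and `monobit_p_value` are hypothetical. The ANN classification step is not shown.

```python
import math
import os
import zlib

FRAGMENT_SIZE = 4096  # 4 KiB, the fragment size given in the abstract


def compression_ratio(fragment: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size.

    Assumption: zlib (DEFLATE) is used as the compressor here purely
    for illustration. Values near or above 1.0 mean the fragment is
    essentially incompressible, i.e. it looks random to the compressor.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    return len(zlib.compress(fragment, level)) / len(fragment)


def monobit_p_value(fragment: bytes) -> float:
    """Frequency (monobit) test p-value for the bits of a fragment.

    Counts ones and zeros, normalises the difference by sqrt(n), and
    converts it to a p-value with the complementary error function.
    A small p-value suggests the bit stream is not random.
    """
    n_bits = len(fragment) * 8
    if n_bits == 0:
        raise ValueError("fragment must not be empty")
    ones = sum(bin(byte).count("1") for byte in fragment)
    # Sum of the +1/-1 mapped bits is (ones - zeros) = 2 * ones - n_bits.
    s_obs = abs(2 * ones - n_bits) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    samples = {
        "all zero bytes": bytes(FRAGMENT_SIZE),
        "os.urandom bytes": os.urandom(FRAGMENT_SIZE),
    }
    for name, fragment in samples.items():
        print(
            f"{name}: ratio={compression_ratio(fragment):.3f}, "
            f"monobit p={monobit_p_value(fragment):.3f}"
        )
```

A single statistic like either of these cannot separate encrypted from compressed data on its own. For example, the repeating byte `0x55` has perfectly balanced bits and passes the monobit test while being highly compressible, which is one reason to feed several test results into a classifier rather than relying on one.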


William Buchanan
Director of CDCS
+44 131 455 2759
Philip Penrose
Research student
+44 131 455
Richard Macfarlane
+44 131 455 2335

Areas of Expertise

Electronic information now plays a vital role in almost every aspect of our daily lives, so the need for a secure and trustworthy online infrastructure is more important than ever. Without it, not only the growth of the internet but also our personal interactions and the economy itself could be at risk.

Associated Projects