Investigation Into A Digital Forensics Triage Tool Using Sampling, Hashes And Bloom Filters - SHAFT
Clayton, J. (2012). Investigation Into A Digital Forensics Triage Tool Using Sampling, Hashes And Bloom Filters - SHAFT (MSc ASDF Dissertation). Edinburgh Napier University (Macfarlane, R.,
Digital forensics faces a growing capacity problem. Demands on investigators and resources will continue to rise as computers and other electronic devices become more widespread and their storage capacity grows. The digital forensic process requires that evidence be identified and examined, yet the resources to do this are constrained. The result is a backlog of seized media and devices waiting to be analysed, and some investigations or checks in the field may be cut back or abandoned as impractical. One technique that can help to collect and examine data quickly, to see whether it is of interest, combines statistical sampling with hashing. This thesis describes the design, implementation and evaluation of a prototype digital forensics investigation triage tool that uses sampling, disk sector hashing and Bloom filters.
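To illustrate why sampling can find a file without reading the whole disk, the sketch below computes the chance that a uniform random sample of sectors, drawn without replacement, includes at least one sector of the target file. The function name, its parameters and the uniform-sampling model are assumptions made for illustration; the abstract does not state which sampling scheme the tool uses.

```python
def detection_probability(total_sectors: int, target_sectors: int, sample_size: int) -> float:
    """Chance that a random sample contains at least one target sector.

    Assumes sectors are sampled uniformly without replacement
    (an illustrative model, not necessarily the tool's scheme).
    """
    if sample_size > total_sectors - target_sectors:
        # More sectors are sampled than there are non-target sectors,
        # so at least one target sector must be included.
        return 1.0
    # P(miss) = C(N - n, M) / C(N, M), written as a product of M ratios
    # to avoid computing very large binomial coefficients.
    miss = 1.0
    for i in range(target_sectors):
        miss *= (total_sectors - sample_size - i) / (total_sectors - i)
    return 1.0 - miss


if __name__ == "__main__":
    # Hypothetical disk of 1,000,000 sectors holding a 2,000-sector file.
    for sample in (100, 1_000, 10_000):
        print(sample, detection_probability(1_000_000, 2_000, sample))
```

Under this model the detection probability rises with sample size, and a larger target file is found with fewer samples.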
The design of the tool, and of an experiment environment to assess it, is described. The tool can use either a database or a Bloom filter to match the hashes of disk sectors against the stored hashes of a file which is being searched for. The design also covers a program that prepares the database and Bloom filter used for the hash matching. The tools were successfully implemented in two separate but functionally identical experiment environments. Development and initial experiments were carried out on a local home PC.
Further experiments were carried out on a remote virtual PC for better performance. The tools were successfully implemented and executed in both environments. In the evaluation, the preparation program generated both the database and the Bloom filter for the file. Using the tools, the file was located when searching the source disk image file, and when sampling, the detection rate rose as the amount of sampling increased. The Bloom filter false positive rate was as predicted by theory (Roussev, Chen, Bourg, & Richard, 2006), which confirmed that the Bloom filter had been implemented correctly. Baselining was successful in establishing the experiment environments and the effect of environmental factors on performance. Repeating the experiments, both with program loops and by scheduling the programs to run repeatedly using crontab, produced a body of experiment results which were written to file for analysis. These experiments succeeded in analysing the effect of varying Bloom filter parameters and of file fragmentation.
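The matching approach can be sketched as follows: hash each sector of the target file, store the hashes in a Bloom filter, then test sector hashes from a disk image against it. This is a minimal sketch, not the dissertation's code. The 512-byte sector size, the use of SHA-256, the class and function names, and the double-hashing scheme for deriving bit positions are all assumptions; the original tool may instead use several distinct hashing algorithms. The last function gives the standard approximation for the false positive rate, (1 - e^(-kn/m))^k for m bits, k hash positions and n stored items.

```python
import hashlib
import math

SECTOR_SIZE = 512  # assumed sector size in bytes


class BloomFilter:
    """Bit-array Bloom filter; bit positions come from double hashing (an assumption)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k positions from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def sector_hashes(data: bytes, sector_size: int = SECTOR_SIZE):
    """Yield one hash per complete sector of the given data."""
    for offset in range(0, len(data) - sector_size + 1, sector_size):
        yield hashlib.sha256(data[offset:offset + sector_size]).digest()


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k of the Bloom filter false positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


if __name__ == "__main__":
    # Hypothetical target file of 64 distinct sectors.
    target = b"".join(i.to_bytes(2, "big") * 256 for i in range(64))
    bloom = BloomFilter(num_bits=1024, num_hashes=4)
    for h in sector_hashes(target):
        bloom.add(h)
    print(all(h in bloom for h in sector_hashes(target)))
    print(expected_false_positive_rate(1024, 4, 64))
```

A Bloom filter never misses a stored item, so every sector of the target file matches; unrelated sectors match only at roughly the predicted false positive rate, which falls as the bit length grows relative to the number of stored hashes.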
Metrics for experiment evaluation included overall program time, match time, and disk read and hashing time. Hit rate was also recorded: the proportion of experiments with a given set of parameters that found any match with the file. Hit rate is measured over a series of experiments and is a probability with a maximum value of one.
Accuracy was recorded as well: the ratio of true positive matches to all positive matches. Its maximum value of one means that all matches were true positives; any lower value means that there were some false positives. The evaluation established that the choice of Bloom filter bit length had a significant effect on the number of false positives, and that this in turn had a noticeable effect on match time. The number of Bloom filter hashing algorithms had little effect on performance. File fragmentation had no significant effect on performance.
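The two match-quality metrics can be expressed as a short sketch. The function names and the representation of each experiment run as a (true positives, false positives) pair are assumptions for illustration. The sketch also assumes that a run counts as a hit when it reports at least one match of any kind, and it treats Accuracy as undefined when a run has no positive matches.

```python
from typing import List, Optional, Tuple

# One experiment run, recorded as (true_positive_matches, false_positive_matches).
Run = Tuple[int, int]


def hit_rate(runs: List[Run]) -> float:
    """Proportion of runs that reported at least one match (assumed reading of 'any match')."""
    hits = sum(1 for true_pos, false_pos in runs if true_pos + false_pos > 0)
    return hits / len(runs)


def accuracy(true_pos: int, false_pos: int) -> Optional[float]:
    """Ratio of true positive matches to all positive matches; None if there were no matches."""
    total = true_pos + false_pos
    if total == 0:
        return None
    return true_pos / total


if __name__ == "__main__":
    # Hypothetical results from four runs with one set of parameters.
    runs = [(3, 1), (0, 0), (2, 0), (0, 2)]
    print(hit_rate(runs))
    print([accuracy(tp, fp) for tp, fp in runs])
```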
The dissertation confirms a successful proof of concept: a working tool for digital forensics investigation triage using sampling, hashing and Bloom filters, with a false positive rate that can be predicted. The tool was written in Python, which proved to be a simple-to-use programming language. This prototype can provide the basis for further work on a practical tool for use in real-world digital forensics investigations. The dissertation concludes with ideas for possible future work, and a personal reflection.