A Digital Forensics methodology for outsourced cloud-based Hadoop HDFS implementations

Rodriguez Barragan, M. (2017). A Digital Forensics methodology for outsourced cloud-based Hadoop HDFS implementations (BEng (Hons) CSF Dissertation). Edinburgh Napier University (Leimich, P., Ramsay, B.).



Two paradigms are changing the IT industry: Big Data analytics and cloud computing. Both technologies are having an extensive impact, not only on the IT industry itself but on business and society in general. For digital forensics, Big Data analytics means investigations that involve larger data volumes, a greater variety of formats, and faster data generation. Cloud computing investigations, in turn, must deal with the scalability and flexibility of resources, multi-tenancy, lack of physical access, and the dislocation of computing nodes.

Based on current research, this paper develops a digital forensics methodology for investigations that involve two of the major frameworks for Big Data and cloud computing: Hadoop and Amazon Web Services. Amazon Web Services, as a leader in cloud computing, was chosen to host a Hadoop cluster of five servers comprising one master and four slaves. The methodology follows a staged approach (identification of cluster properties, location of artefacts, acquisition, and analysis) adapted to the particularities of the platforms it is designed for. Three forensic scenarios were then used to test the methodology: creation, modification and deletion of data. Acquisition methods suitable for cloud computing environments were employed, followed by a forensic analysis of the evidence collected, including Ubuntu and Hadoop logs and metadata.

The results revealed information of forensic value: the fsImage, the edit logs and a number of logs within the $HADOOP_HOME/logs folder were identified as forensic artefacts. At the host operating system layer, several files were also identified as artefacts, including /var/log/auth and /etc/hosts, among other sources of evidence. Details including the mapping between block IDs and DataNode locations, timestamps and user IDs were extracted from the collected evidence and compared against the auth logs to reconstruct the simulated events.
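As an illustration of how timestamps and user IDs might be pulled from an edit log, the sketch below parses XML of the kind produced by the HDFS Offline Edits Viewer (`hdfs oev`) and builds a simple timeline. It is not the dissertation's own tooling. The sample XML is hand-written and simplified, and the path, user name, transaction IDs and the helper names `to_utc` and `build_timeline` are assumptions made for the example. Real edit logs carry more fields, and element names can vary between Hadoop versions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hand-written, simplified sample resembling Offline Edits Viewer XML output.
# The path, user name and transaction IDs are hypothetical.
SAMPLE_EDITS_XML = """<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_ADD</OPCODE>
    <DATA>
      <TXID>12</TXID>
      <PATH>/evidence/report.txt</PATH>
      <MTIME>1490000000000</MTIME>
      <PERMISSION_STATUS>
        <USERNAME>hduser</USERNAME>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_DELETE</OPCODE>
    <DATA>
      <TXID>20</TXID>
      <PATH>/evidence/report.txt</PATH>
      <TIMESTAMP>1490000600000</TIMESTAMP>
    </DATA>
  </RECORD>
</EDITS>
"""


def to_utc(ms_text):
    """Convert a millisecond epoch string to an ISO 8601 UTC timestamp."""
    seconds = int(ms_text) / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def build_timeline(xml_text):
    """Return one dict per edit-log record, ordered by transaction ID."""
    root = ET.fromstring(xml_text)
    events = []
    for record in root.iter("RECORD"):
        data = record.find("DATA")
        if data is None:
            continue
        # Use MTIME when present, otherwise fall back to TIMESTAMP.
        raw_time = data.findtext("MTIME") or data.findtext("TIMESTAMP")
        events.append({
            "txid": int(data.findtext("TXID", default="0")),
            "opcode": record.findtext("OPCODE"),
            "path": data.findtext("PATH"),
            "user": data.findtext("PERMISSION_STATUS/USERNAME"),
            "time_utc": to_utc(raw_time) if raw_time else None,
        })
    return sorted(events, key=lambda event: event["txid"])


for event in build_timeline(SAMPLE_EDITS_XML):
    print(event["txid"], event["time_utc"], event["opcode"],
          event["path"], event["user"])
```

A timeline in this form can then be compared line by line with login events from the host's auth logs, which is the kind of cross-check described above.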

This investigation aims to contribute to the development of a standard forensic procedure for cases such as the one studied. However, further development is needed to address outstanding issues, such as the high level of trust in the cloud service provider (CSP) that cloud forensic investigations currently require. The need to automate the analysis stage is also raised as a concern.


Manuel Rodriguez Barragan
Student Intern
