Today, data plays an important role in people’s daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information and knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work successfully, and one reason is the quality of the data these applications use. There is a growing awareness that high-quality data is key to business success today, and dirty data within data sources is one of the causes of poor data quality. Data cleaning is therefore a crucial step in preparing data, and a range of methods and tools has been developed to clean dirty data. However, two challenges remain: how to make data cleaning more efficient, and how to increase its degree of automation. The proposed project aims to develop a framework for cleaning dirty data in order to improve data quality, with high efficiency and a high degree of automation as its two main goals.
Related publications
Peng, T., Li, L., & Kennedy, J. (2012). A Comparison of Techniques for Name Matching. GSTF International Journal on Computing, 2(1), 55-61.
Li, L., Peng, T., & Kennedy, J. (2011). A Rule Based Taxonomy of Dirty Data. GSTF International Journal on Computing, 1(2), 140-148.
Peng, T., Li, L., & Kennedy, J. (2011). An Evaluation of Name Matching Techniques. In Proceedings of the 2nd Annual International Conference on Business Intelligence and Data Warehousing.
Li, L., Peng, T., & Kennedy, J. (2010). Improving Data Quality in Data Warehousing Applications. In Proceedings of the 12th International Conference on Enterprise Information Systems. Funchal, Madeira, Portugal.