Comparison between Maximum Entropy and Naïve Bayes classifiers: Case study; Appliance of Machine Learning Algorithms to an Odesk’s Corporation Dataset

Maroulis, G. (2014). Comparison between Maximum Entropy and Naïve Bayes classifiers: Case study; Appliance of Machine Learning Algorithms to an Odesk’s Corporation Dataset (MSc Information Systems Development Dissertation). Edinburgh Napier University (Hart, E., Urquhart, N.).



Abstract

Natural Language Processing, Artificial Intelligence and Machine Learning are rapidly growing technologies whose application opens up opportunities for implementing automated decision-making systems.
The case study of this dissertation compares the Naïve Bayes and Maximum Entropy machine-learning algorithms for text classification by building an application that uses a portion of the oDesk Corporation (oDesk) database as its dataset. The literature review introduces and explores how classification problems can be solved with machine-learning algorithms. It also presents natural language processing techniques that can be applied to data mining and text processing. The literature review ends with a technology review of the programming languages and libraries that could be used to implement a text classification system. The experiments were developed in Python with the NLTK library (NLTK 2.0).
The implementation process and the experimental results showed the significance of data processing and how it can affect classification results. Finally, the low performance of the implemented application does not undermine the conclusion that machine-learning systems could be applied in large corporations and organizations to improve business processes and the customer experience.
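
To make the comparison described in the abstract concrete, the following is a minimal sketch of the two classifier families trained on the same bag-of-words text data. It is not the dissertation's code: the dissertation used NLTK 2.0, whereas this sketch is written in plain Python so that it runs without dependencies. The training sentences, labels, function names and hyperparameters (`epochs`, `lr`) are all invented for illustration and are not taken from the oDesk dataset.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration (NOT from the oDesk dataset).
TRAIN = [
    ("build a python web scraper", "dev"),
    ("fix bug in django python app", "dev"),
    ("python script for data export", "dev"),
    ("design a logo for my brand", "design"),
    ("create logo and brand banner", "design"),
    ("banner design for web shop", "design"),
]

def tokens(text):
    """Lowercase whitespace tokenizer; real preprocessing would do much more."""
    return text.lower().split()

def train_naive_bayes(data):
    """Collect the label and per-label word counts used by multinomial Naive Bayes."""
    label_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    for text, label in data:
        word_counts[label].update(tokens(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def classify_naive_bayes(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)
        for w in tokens(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def maxent_probs(weights, labels, feats):
    """Softmax over per-label sums of feature weights."""
    scores = {l: sum(weights.get((w, l), 0.0) for w in feats) for l in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {l: math.exp(s - top) for l, s in scores.items()}
    z = sum(exps.values())
    return {l: e / z for l, e in exps.items()}

def train_maxent(data, epochs=200, lr=0.5):
    """Fit a maximum entropy (multinomial logistic) model by stochastic gradient ascent."""
    labels = sorted({label for _, label in data})
    weights = {}  # keyed by (word, label)
    for _ in range(epochs):
        for text, gold in data:
            feats = set(tokens(text))  # binary word-presence features
            probs = maxent_probs(weights, labels, feats)
            for label in labels:
                # Gradient of the log-likelihood: observed minus expected.
                err = (1.0 if label == gold else 0.0) - probs[label]
                for w in feats:
                    weights[(w, label)] = weights.get((w, label), 0.0) + lr * err
    return weights, labels

def classify_maxent(model, text):
    weights, labels = model
    probs = maxent_probs(weights, labels, set(tokens(text)))
    return max(probs, key=probs.get)

nb_model = train_naive_bayes(TRAIN)
me_model = train_maxent(TRAIN)
for sample in ("fix python bug", "logo banner design"):
    print(sample, classify_naive_bayes(nb_model, sample), classify_maxent(me_model, sample))
```

The sketch highlights the structural difference between the two approaches: Naïve Bayes estimates its parameters directly from counts under a word-independence assumption, while Maximum Entropy fits feature weights iteratively to maximize the conditional likelihood of the labels. With NLTK, the corresponding classes would be `nltk.NaiveBayesClassifier` and `nltk.MaxentClassifier`, which take feature dictionaries rather than raw strings.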

Authors

Georgios Maroulis
student developer
+44 131 455


    Keywords: data mining