Abstract:
Machine learning algorithms have been widely used for classification in a
number of research domains; however, very few studies have statistically validated
the performance of these algorithms on different data. This paper examined the Naïve Bayes
algorithm’s performance on datasets of different sizes. Furthermore, it investigated the well-known
hypothesis that building multiple models with methods such as Bagging, Boosting and Stacking tends to improve a
classifier’s performance. The analysis was performed using McNemar’s test, a nonparametric statistical test
that is well known in the medical analysis domain. Results showed that not all ensemble
methods work as expected and that they therefore need to be selected carefully. Moreover,
McNemar’s test proved simple to apply while giving statistically valid results.
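
As a rough illustration of how McNemar’s test can compare two classifiers evaluated on the same test set, the following Python sketch counts the instances on which exactly one classifier is correct and computes the chi-square statistic. The abstract does not say which variant of the test was used, so the continuity-corrected form is an assumption here, and the function name `mcnemar_test` and the sample labels are hypothetical.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected) for two classifiers on one test set.

    Returns (b, c, statistic, p_value), where
      b = instances classifier A got right and classifier B got wrong,
      c = instances classifier A got wrong and classifier B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return b, c, 0.0, 1.0
    # Continuity-corrected statistic, clamped so it cannot go below zero.
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return b, c, stat, p_value


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    single = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]    # e.g. a single Naive Bayes model
    ensemble = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]  # e.g. a bagged variant
    b, c, stat, p = mcnemar_test(y_true, single, ensemble)
    print(f"b={b}, c={c}, statistic={stat:.3f}, p-value={p:.3f}")
```

Only the discordant counts `b` and `c` enter the statistic; instances that both classifiers classify correctly, or both misclassify, do not affect the result. When `b + c` is small, an exact binomial version of the test is usually preferred over the chi-square approximation.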