dc.contributor.author |
Ali, Amjad |
|
dc.date.accessioned |
2019-07-16T07:46:13Z |
|
dc.date.accessioned |
2020-04-14T17:36:32Z |
|
dc.date.available |
2020-04-14T17:36:32Z |
|
dc.date.issued |
2019 |
|
dc.identifier.govdoc |
18166 |
|
dc.identifier.uri |
http://142.54.178.187:9060/xmlui/handle/123456789/6082 |
|
dc.description.abstract |
The performance of a classifier can be affected to a great extent by the presence of missing values in a dataset. Several methods have been proposed in the literature to treat missing data, and the one used most frequently is deleting instances that contain at least one missing feature value. In this part of the study, we compare three methods for dealing with missing values (case deletion, simple random imputation, and a modified random imputation procedure) to evaluate their effect on the misclassification error rate of non-parametric classifiers. The classifiers considered were the conventional random forest and the In/Out procedure of the random forest.
The missing data problem is common and often unavoidable, especially when dealing with large data sets from several real-world sources. Many new computational tools have been developed to tackle missing data problems. In some cases, the preferred missing-data procedures involve temporarily removing missing data or replacing it with surrogates. Existing methods have been successfully applied to well-defined parametric models; however, their usefulness has yet to be established for tree-based models. Missing values, out-of-bag error, and misclassification rates in imbalanced data are difficult to deal with in the Random Forest technique. In this study, a new imputation method is proposed for the In/Out procedure of Random Forest. The proposed method does not depend on the missing data mechanism, which is its principal advantage. This rectifies the disadvantages of the other imputation methods. Its performance has been evaluated and compared against data sets without missing values. It is concluded that the proposed method reduces the Out-Of-Bag error in the presence of missing values across different Random Forest procedures. |
en_US |
dc.description.sponsorship |
Higher Education Commission, Pakistan |
en_US |
dc.language.iso |
en_US |
en_US |
dc.publisher |
University of Peshawar, Peshawar. |
en_US |
dc.subject |
Statistics |
en_US |
dc.title |
Imputation of missing values in the In/Out procedure of Random Forest |
en_US |
dc.type |
Thesis |
en_US |