PASTIC Dspace Repository

Imputation of missing values in the In/Out procedure of Random Forest

dc.contributor.author Ali, Amjad
dc.date.accessioned 2019-07-16T07:46:13Z
dc.date.accessioned 2020-04-14T17:36:32Z
dc.date.available 2020-04-14T17:36:32Z
dc.date.issued 2019
dc.identifier.govdoc 18166
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/6082
dc.description.abstract The performance of a classifier can be affected to a great extent by the presence of missing values in a dataset. In the literature, several methods have been proposed to treat missing data, and the one used most frequently is deleting instances that contain at least one missing feature value. In this part of the study, we compare three methods for dealing with missing values (case deletion, simple random imputation, and the modified random imputation procedure) to evaluate their effect on the misclassification error rate of the non-parametric classifier. The classifiers considered were the conventional random forest and the In/Out procedure of the random forest. The missing data problem is common and often unavoidable, especially when dealing with large data sets from several real-world sources. Many new computational tools have been developed to tackle missing data problems. In some cases, the preferred missing-data procedures involve temporarily removing missing data or using surrogates for them. Existing methods have been successfully applied to well-defined parametric models; however, their usefulness has yet to be established for tree-based models. The problems of missing values, out-of-bag error, and misclassification rates in imbalanced data are difficult to deal with in the Random Forest technique. In this study, a new imputation method is proposed for the In/Out procedure of Random Forest. The proposed method does not depend on the missing-data mechanism, which is its principal advantage. This rectifies disadvantages of all other imputation methods. Its performance has been evaluated and compared against data sets without missing values. It is concluded that the newly proposed method reduces the out-of-bag error in the presence of missing values across different Random Forest procedures. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en_US en_US
dc.publisher University of Peshawar, Peshawar. en_US
dc.subject Statistics en_US
dc.title Imputation of missing values in the In/Out procedure of Random Forest en_US
dc.type Thesis en_US
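
Illustrative note: the abstract names case deletion and simple random imputation as two of the compared treatments. The sketch below shows one common reading of each, assuming missing entries are encoded as None and that simple random imputation draws a replacement at random from the observed values of the same feature. The function names and the toy data are hypothetical and are not taken from the thesis; the modified random imputation procedure and the In/Out procedure are not specified in this record, so they are not sketched here.

```python
import random


def case_deletion(rows):
    """Keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def simple_random_imputation(rows, seed=0):
    """Fill each missing value with a random draw from the observed
    values of the same column (assumed definition, see note above).

    Raises IndexError if a column has no observed values at all.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # Observed (non-missing) values, collected per column.
    observed = [
        [row[j] for row in rows if row[j] is not None]
        for j in range(n_cols)
    ]
    # Build new rows; the input is left unmodified.
    return [
        [v if v is not None else rng.choice(observed[j])
         for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: two numeric features and a class label.
data = [
    [5.1, 3.5, "a"],
    [4.9, None, "a"],
    [None, 3.2, "b"],
    [6.3, 2.9, "b"],
]

deleted = case_deletion(data)
imputed = simple_random_imputation(data)
print(len(deleted), len(imputed))  # prints: 2 4
```

Case deletion shrinks the sample (here from four rows to two), whereas imputation keeps every row at the cost of introducing drawn values; either result could then be passed to a random forest to compare error rates.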