PASTIC Dspace Repository

Development of Feature Selection Algorithms for High-Dimensional Binary Data

Show simple item record

dc.contributor.author Javed, Kashif
dc.date.accessioned 2017-11-28T09:03:22Z
dc.date.accessioned 2020-04-09T16:40:38Z
dc.date.available 2020-04-09T16:40:38Z
dc.date.issued 2012
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/2911
dc.description.abstract There has been a growing interest in representing real-life applications with data sets having binary-valued features. These data sets due to the advancements in computer and data management systems consist of tens or hundreds of thousands of features. In this dissertation, we investigate two problems in machine learning which have been relatively less studied for high-dimensional binary data. The first problem is to select a subset of features useful for supervised learning applications from the entire feature set and is known as the feature selection (FS) problem. The second problem is to compare two orderings of features induced by feature ranking (FR) algorithms and to determine which one is better. For the feature selection problem, we have proposed a new feature ranking measure termed as the diff-criterion. Its distinct attribute is that it estimates the usefulness of binary features by using their probability distributions. The diff-criterion has been evaluated against two well-known FS algorithms with four widely used clas- sifiers on six binary data sets on which it has achieved up to about 99% reduction in the feature set size. To further improve the performance, we have suggested a two-stage FS algorithm. The novelty of our two-stage algorithm is that the first stage provides the second stage with a reduced subset without losing valuable in- formation about the class. Two-stage feature selection used with the diff-criterion not only significantly improves the classification accuracy but also exhibits up to about 99% reduction in the feature set size. We have also compared our proposed FS algorithms against the winning entries of the “Agnostic Learning versus Prior Knowledge” challenge. The algorithms have shown results better or comparable to the winners of the challenge. For the problem of ranking features using FR algorithms, different FR algorithms estimate the importance of features with respect to the class variable differently thus generating different orderings. To determine which ordering is better, we propose a new evaluation method termed as feature ranking evaluation strategy (FRES). It uses the individual predictive power of features for estimating howAbstract correct is an ordering of features. We found that compared to Relief and mu- tual information algorithms our proposed diff-criterion generates the most correct orderings of binary features. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en en_US
dc.publisher University of Engineering and Technology Lahore en_US
dc.subject Applied Sciences en_US
dc.title Development of Feature Selection Algorithms for High-Dimensional Binary Data en_US
dc.type Thesis en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Advanced Search

Browse

My Account