Abstract:
The amount of data to be processed is increasing significantly day by day. The growth in data size is due not only to a larger number of records but also to the substantial number of attributes added to the feature space. This phenomenon leads to the dilemma known as the curse of dimensionality, i.e., datasets with an exponentially growing number of attributes. The ideal approach is to reduce the number of dimensions such that the resulting reduced set contains the same information as the entire set of attributes. Various approaches exist to perform this task of dimensionality reduction.
Recently, rough set-based approaches, which use attribute dependency to carry out feature selection, have gained prominence. However, this dependency measure requires calculation of the positive region, which is a computationally expensive task. In this research, we propose a new concept called “Dependency Classes”, which calculates attribute dependency without using the positive region. Dependency classes capture the change in attribute dependency as we move from one record to the next. By avoiding the positive-region computation, they are an ideal replacement for the conventional dependency measure in feature selection algorithms, especially for large datasets. A comparison framework was devised to measure the efficiency and effectiveness of the proposed measure. Experiments on various publicly available datasets show that the proposed approaches provide significant computational gains while achieving the same accuracy as the conventional approach. We also present seven feature selection algorithms based on this measure. The experimental results show that algorithms using dependency classes were more effective than their counterparts using the positive region-based approach in terms of accuracy, execution time, and required runtime memory.
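For context, the conventional rough-set dependency measure referred to above is typically defined via the positive region; the following is a sketch of the standard textbook formulation (not the new measure introduced in this paper):

\[
\gamma_C(D) \;=\; \frac{|POS_C(D)|}{|U|},
\qquad
POS_C(D) \;=\; \bigcup_{X \in U/IND(D)} \underline{C}X,
\]

where \(U\) is the set of all records, \(U/IND(D)\) is the partition of \(U\) induced by the decision attribute \(D\), and \(\underline{C}X\) is the \(C\)-lower approximation of \(X\). Computing \(POS_C(D)\) requires constructing the equivalence classes of \(U\) with respect to the condition attributes \(C\), which becomes expensive as the number of records and attributes grows; the proposed dependency classes instead accumulate the dependency value incrementally, record by record, without building the positive region explicitly.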