Abstract:
The telecommunication industry has grown rapidly during the last decade. The number of cellular subscribers is approaching 96% of the world's population. Amid such fierce competition, telecom service providers face saturated markets with little room for further penetration. Telecom companies are therefore focusing more on customer retention, which is considered more cost-effective than acquiring new customers because it does not involve any additional marketing expense. Long-term customers are also considered easier to serve, contribute more to stable profitability, and bring in new referrals. New customers, on the other hand, are hard to attract in competitive markets and take longer to establish loyalty to a new service provider. The telecom industry therefore requires a reliable churn prediction system that accurately identifies customers who are about to switch to another service provider.
Customer churn prediction systems have become pivotal in retaining customers who are expected to churn, by offering them improved service packages. Such advance knowledge of customer churn enables service providers to avoid sizeable revenue losses. Consequently, researchers have investigated many data mining techniques for churn prediction that can meet the specific demands of the telecom industry. However, telecom churn prediction remains a challenging task because of the large size, imbalanced class distribution, and high dimensionality of telecom datasets.
The main focus of this thesis is to identify discriminative feature extraction techniques and effective sampling methods that can cope with the large size of telecom datasets. Additionally, investigations are made to develop a churn prediction system with better classification and interpretation capabilities. This thesis makes the following contributions in the area of telecom churn prediction: 1) analysis of the minimum redundancy and maximum relevance (mRMR) method for extracting relevant and meaningful features; 2) use of a Genetic Algorithm based wrapper method to remove redundant features from the selected features; 3) analysis of a PSO-based intelligent sampling technique and its comparison with conventional undersampling techniques; 4) construction of efficient churn prediction systems using computational intelligence based ensemble classification approaches (CP-MRF, Chr-mRF, FEW-ChrP); 5) use of a novel GP-AdaBoost based ensemble classifier to develop an efficient churn prediction system with the additional capability of identifying the factors responsible for churning; 6) attainment of the highest churn prediction performance of 0.862 AUC and 0.910 AUC on the Orange and Cell2Cell telecom datasets, respectively; and 7) extraction of 47 useful features from the 260 original features of the Orange dataset and 35 features from the 76 original features of the Cell2Cell dataset.
In short, extensive simulations are performed in this research to examine how well the proposed churn prediction systems distinguish churners from non-churners.
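To make two of the terms used above concrete, the following is a minimal sketch, not the thesis implementation. It shows conventional random undersampling (the baseline against which the PSO-based sampling is compared, not the PSO method itself) and the AUC measure used to report performance. The function names `auc` and `random_undersample`, the label convention (1 = churner, 0 = non-churner), and the toy data are assumptions made only for this illustration.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs in which the
    churner receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_undersample(rows, labels, seed=0):
    """Keep every minority-class row and an equally sized random subset
    of the majority class, preserving the original row order."""
    rng = random.Random(seed)
    churn_idx = [i for i, y in enumerate(labels) if y == 1]
    stay_idx = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([churn_idx, stay_idx], key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy, made-up data: 8 non-churners followed by 2 churners.
    rows = list(range(10))
    labels = [0] * 8 + [1] * 2
    balanced_rows, balanced_labels = random_undersample(rows, labels)
    print(balanced_labels.count(0), balanced_labels.count(1))  # 2 2

    # Toy scores from a hypothetical classifier.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise form of AUC is quadratic in the number of examples, which is acceptable for illustration only; for datasets of the size discussed here a rank-based computation would be preferable.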