Abstract:
Sentiment analysis is currently one of the most actively studied research fields. Its aim is to analyze people's sentiments, opinions, and attitudes towards entities such as topics, products, individuals, organizations, and services. Sentiment analysis can be performed with machine learning methods, lexicon-based methods, or a combination of both. Recent research shows that domain-specific lexicons perform better than domain-independent lexicons. To improve the performance of domain-independent lexicons, this research combines machine learning with a lexicon-based approach and introduces a new approach, called SWIMS, that determines feature weights based on SentiWordNet, a well-known general-purpose sentiment lexicon. A support vector machine (SVM) is used to learn the feature weights, and SWIMS employs an intelligent model selection approach to enhance classification performance. Features are selected based on their subjectivity, and the effects of feature selection with respect to part-of-speech information are studied extensively. Seven benchmark datasets are used in this research, including the large movie review dataset, the multi-domain sentiment dataset, and the Cornell movie review dataset, all of which are freely available online for research purposes. An in-depth performance comparison is conducted against state-of-the-art machine learning approaches, lexicon-based methods, and other sentiment detection tools. Averaged over the seven datasets, the proposed SWIMS approach attains accuracy, precision, recall, and F-measure values of 83.06%, 83.30%, 82.83%, and 83.03%, respectively. The evaluation results show that the proposed approach outperforms the other sentiment analysis techniques it was compared against.
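The general idea of pairing a general-purpose lexicon with a learned classifier can be illustrated with a minimal sketch. It is not the authors' SWIMS implementation. It assumes a tiny hypothetical lexicon in the style of SentiWordNet (a positive and a negative score per word, with subjectivity taken as their sum, i.e. one minus the objectivity score), a hypothetical subjectivity threshold of 0.5, and binary presence features. In place of a library SVM, it uses a plain linear SVM trained by Pegasos-style subgradient descent. Part-of-speech filtering and the intelligent model selection step are omitted.

```python
import random

# HYPOTHETICAL toy lexicon in the style of SentiWordNet: word -> (pos, neg).
# Real SentiWordNet assigns scores per synset, not per surface word.
TOY_LEXICON = {
    "excellent": (0.875, 0.0),
    "good": (0.625, 0.0),
    "boring": (0.0, 0.625),
    "awful": (0.0, 0.875),
    "movie": (0.0, 0.0),
    "plot": (0.0, 0.0),
}


def subjectivity(word):
    """Subjectivity = positive + negative score (0.0 for unknown words)."""
    pos, neg = TOY_LEXICON.get(word, (0.0, 0.0))
    return pos + neg


def select_features(vocab, threshold=0.5):
    """Keep only words whose subjectivity meets the (assumed) threshold."""
    return sorted(w for w in vocab if subjectivity(w) >= threshold)


def vectorize(tokens, features):
    """Binary presence vector over the selected features."""
    present = set(tokens)
    return [1.0 if f in present else 0.0 for f in features]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    No bias term; labels in y must be +1 or -1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink all weights.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only when the example violates the margin.
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (positive) or -1 (negative) from the sign of w . x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy labelled documents (hypothetical data for illustration only).
docs = [
    (["excellent", "movie"], 1),
    (["good", "plot"], 1),
    (["boring", "movie"], -1),
    (["awful", "plot"], -1),
]
vocab = {tok for tokens, _ in docs for tok in tokens}
features = select_features(vocab)  # objective words like "movie" are dropped
X = [vectorize(tokens, features) for tokens, _ in docs]
y = [label for _, label in docs]
w = train_linear_svm(X, y)
print(dict(zip(features, (round(v, 3) for v in w))))
```

Here the lexicon decides which words become features, and the SVM decides how much each surviving word counts. Swapping in real SentiWordNet scores and an off-the-shelf linear SVM would keep the same two-stage structure.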