dc.description.abstract |
Web 2.0 has dramatically changed the way people communicate. It marks a major move toward a more community-oriented, highly collaborative, interactive, and responsive Web. Today we are not only using the Internet; we are part of this global network. Social media sites have become the world's largest virtual community, where people express their views about products, events, and services at any time and from anywhere. These views have a great impact on community thinking and decisions. The most prominent feature of this era is the rise of blogging, which offers a resourceful and open means of expression to anyone, anywhere. These data sources provide a rich basis for sentiment analysis. Statistics show that 80% of consumers have changed a purchase decision because of negative reviews found online, and one study found that blogs are 63% more likely than magazines to influence purchase decisions.
The evolution of social media has fueled interest in sentiment analysis. There are two main approaches to extracting sentiment automatically: the lexicon-based approach and the statistical, or machine learning, approach. The latter demands a large amount of training data to learn the lexical items that express sentiment, and its performance drops when the same classifier is used in a different domain.
The main focus of this work is to develop a lexicon-based framework for automatically classifying blogs and reviews with respect to their semantic orientation. The method consists of three major components: sentiment analysis, slang detection and scoring, and a context-aware spelling corrector. Lexicon-based methods for sentiment analysis are robust, perform well across domains, and can easily be boosted with additional sources of knowledge. They perform well on blog posts and reviews and are well suited to handling contextual valence shifters. Despite these merits, no single lexicon performs optimally all the time. The method therefore uses a dynamic, updateable, and comprehensive lexicon, built from existing opinion lexicons, dictionaries, and other machine-readable resources, to classify user-generated content as positive, negative, or neutral.
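As a rough illustration of lexicon-based polarity classification with contextual valence shifters, the following minimal Python sketch may help. It is not the framework developed in this work: the tiny word lists, the two-token look-back window, and the function names score_text and classify are all assumptions made for the example.

```python
# Minimal sketch of lexicon-based polarity scoring.
# Assumption: these small hand-made word lists stand in for a real,
# much larger opinion lexicon; they are not the lexicon used in this work.
LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 3.0,
    "bad": -1.0, "poor": -2.0, "terrible": -3.0,
}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
PUNCTUATION = ".,!?;:"


def score_text(text):
    """Sum the lexicon scores of all opinion words, adjusted by shifters."""
    tokens = [t.strip(PUNCTUATION) for t in text.lower().split()]
    total = 0.0
    for i, word in enumerate(tokens):
        if word not in LEXICON:
            continue
        value = LEXICON[word]
        # Assumption: a shifter affects an opinion word only if it occurs
        # within the two tokens immediately before it.
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                value = -value
        total += value
    return total


def classify(text, threshold=0.0):
    """Map a score to positive, negative, or neutral polarity."""
    score = score_text(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ("The camera is very good",
                   "The battery is not good",
                   "It arrived on Monday"):
        print(sample, "->", classify(sample))
```

In this sketch, negation simply flips the sign of the opinion word and intensifiers scale it. A real system would draw its scores from a full opinion lexicon and would need a more careful treatment of shifter scope.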
Slang handling and spelling correction are two vital elements of sentiment analysis, because slang and misspelled words can affect the sentiment score. Both issues were handled using Web resources and a statistical language model.
The proposed method was implemented and evaluated on several datasets of reviews and blogs. The empirical results show that it outperforms existing related methods and achieves 90.3% accuracy on average. The method showed high accuracy in binary classification, and all three of its components performed well across different domains. |
en_US |