dc.description.abstract |
In computational linguistics, sentiment analysis classifies an opinion as belonging to a positive or a negative class. Over the last decade, sentiment analysis of the English language has been explored extensively with different techniques that have improved overall performance. Urdu is the language of sixty-six million people and is widely spoken in the South Asian subcontinent. It is also the national language of Pakistan, which is the world's sixth most populous country according to the United Nations Population Division. Sentiment analysis of the Urdu language is an important tool for understanding the behavioural aspects, cultural values and social habits of the people living in this part of the world. Opinion mining is also crucial for governments, policy makers, business owners and brand ambassadors, who need to make decisions in accordance with public sentiment. However, sentiment analysis of Urdu is not as well explored as that of English. Existing Urdu sentiment analysis has been performed with a simple Bag-of-Words (BoW) method and with machine learning (ML) techniques that use a limited set of features. The BoW method is not sufficient to handle complex opinions. Moreover, the accuracy of ML techniques with legacy features is not comparable to that achieved in sentiment classification for other languages. For English, discourse information (sub-sentence level information) has boosted the performance of both the BoW method and ML techniques. This research proposes a theory for Urdu sentiment analysis that extracts and uses discourse information at the sub-sentence level, and suggests a computational model that achieves more accurate results than the simple BoW approach. The proposed solution segments each sentiment into two sub-opinions, extracts discourse information (discourse relation and polarity relation), proposes an extended BoW method (a rule-based method) and suggests a new, small subset of features for ML techniques.
The results show a significant improvement (p < 0.001) in recall, precision and accuracy of 37.25%, 8.46% and 24.75%, respectively. The current research targets sentiments with two sub-opinions, an approach that works well as long as the opinions are short messages such as those on Twitter, in forum comments or in Facebook status posts. The proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews and TV talk shows. |
en_US |