A Framework to Improve Classification of Positive and Negative Opinions in Roman Urdu-English Code Switching Environment

Hassan, Muhammad Awais

dc.contributor.author	Hassan, Muhammad Awais
dc.date.accessioned	2019-11-13T06:54:10Z
dc.date.accessioned	2020-04-11T15:40:51Z
dc.date.available	2020-04-11T15:40:51Z
dc.date.issued	2016
dc.identifier.govdoc	18589
dc.identifier.uri	http://142.54.178.187:9060/xmlui/handle/123456789/5289
dc.description.abstract	In computational linguistics, sentiment analysis classifies an opinion as positive or negative. Over the last decade, sentiment analysis of the English language has been explored extensively with a variety of techniques that have improved overall performance. Urdu is the language of sixty-six million people and is widely spoken in the South Asian subcontinent. It is also the national language of Pakistan, which is the world's sixth most populous country according to the United Nations Population Division. Sentiment analysis of Urdu is an important tool for understanding the behavioural aspects, cultural values and social habits of the people living in this part of the world. Opinion mining is also crucial for governments, policy makers, business owners and brand ambassadors who want to make decisions in accordance with public sentiment. However, sentiment analysis of Urdu is not as well explored as that of English. Urdu sentiment analysis has been performed with the simple Bag-of-Words (BoW) method and with machine learning (ML) techniques using a limited set of features. The BoW method is not sufficient to handle complex opinions, and the accuracy of ML techniques with legacy features is not comparable to that achieved for sentiment classification in other languages. For English, discourse information (sub-sentence level information) has boosted the performance of both the BoW method and ML techniques. This thesis proposes a theory for Urdu sentiment analysis that extracts and uses discourse information at the sub-sentence level, and suggests a computational model that achieves more accurate results than the simple bag-of-words approach. The proposed solution segments the sentiment into two sub-opinions, extracts discourse information (discourse relation and polarity relation), proposes an extended BoW method (a rule-based method) and suggests a new, small subset of features for ML techniques.
The proposed approach significantly improves (p < 0.001) recall, precision and accuracy by 37.25%, 8.46%, and 24.75% respectively. The current research targets sentiments with two sub-opinions, which works well as long as the opinions are short messages such as those on Twitter, in forum comments or in Facebook status posts. The proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews, and TV talk shows.	en_US
dc.description.sponsorship	Higher Education Commission Pakistan	en_US
dc.language.iso	en_US	en_US
dc.publisher	University of Engineering & Technology, Lahore.	en_US
dc.subject	Computer Science	en_US
dc.title	A Framework to Improve Classification of Positive and Negative Opinions in Roman Urdu-English Code Switching Environment	en_US
dc.type	Thesis	en_US