REDEFINING URDU MORPHOLOGY AND GRAMMAR FOR THE DEVELOPMENT OF AN INTEGRATED SENTIMENT ANALYSIS FRAMEWORK

SYED, AFRAZ ZAHRA

dc.contributor.author	SYED, AFRAZ ZAHRA
dc.date.accessioned	2017-12-12T06:57:30Z
dc.date.accessioned	2020-04-11T15:42:13Z
dc.date.available	2020-04-11T15:42:13Z
dc.date.issued	2013
dc.identifier.uri	http://142.54.178.187:9060/xmlui/handle/123456789/5335
dc.description.abstract	The rise of social networking sites and blogs has stimulated a bull market in personal opinion: consumer recommendations, product reviews, ratings, and other types of online expression. For computational linguistics researchers, this fast-growing heap of information has opened an exciting research frontier, referred to as Sentiment Analysis (SA). For English, this area has been under consideration for the last decade, but other major languages, like Urdu, have been totally overlooked by the research community. Urdu is a morphologically rich and resource-poor language. Its distinctive features, such as complex morphology, flexible grammar rules, context-sensitive orthography and free word order, make Urdu language processing a challenging problem domain. For the same reasons, sentiment analysis approaches and techniques developed for other well-explored languages are not workable for Urdu text. This dissertation presents a grammatically motivated sentiment classification framework to handle these distinctive features of the Urdu language. The main research contributions are: to highlight the linguistic (orthography, grammar, morphology, etc.) as well as the technical (parsing algorithm, lexicon, corpus, etc.) aspects of this multidimensional research problem; to explore Urdu morphological operations, grammar and orthographic rules; and to redefine these operations and rules with respect to the requirements of a sentiment analysis framework. The orthographical, morphological, grammatical and, finally, conceptual details of the language are our target concerns. Additionally, our approach can help in the sentiment analysis of other languages, like Arabic, Persian, Hindi and Punjabi. The proposed framework emphasizes the identification of SentiUnits rather than the subjective words in the given text. SentiUnits are the sentiment-carrier expressions, which reveal the inherent sentiments of the sentence for a specific target.
The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and the target expressions through shallow-parsing-based chunking. A dependency parsing algorithm creates associations between these extracted expressions. The framework uses a sentiment-annotated, lexicon-based approach. Each entry of the lexicon is marked with its orientation (positive or negative) and its intensity (force of orientation) score. The experimental evaluation of the system, with a sentiment-annotated lexicon of Urdu words and two corpora of reviews as test beds, shows encouraging results in terms of accuracy, precision, recall and F-measure.	en_US
dc.description.sponsorship	Higher Education Commission, Pakistan	en_US
dc.language.iso	en	en_US
dc.publisher	UNIVERSITY OF ENGINEERING AND TECHNOLOGY LAHORE – PAKISTAN	en_US
dc.subject	Computer science, information & general works	en_US
dc.title	REDEFINING URDU MORPHOLOGY AND GRAMMAR FOR THE DEVELOPMENT OF AN INTEGRATED SENTIMENT ANALYSIS FRAMEWORK	en_US
dc.type	Thesis	en_US
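
The abstract describes a lexicon in which each entry carries an orientation and an intensity score, and an evaluation reported as precision, recall and F-measure. The following is only a minimal sketch of that general idea, not the thesis's implementation. The transliterated entries, the intensity values, the additive scoring rule, and the names `LexiconEntry`, `score_sentiunit`, `classify` and `precision_recall_f1` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of orientation; the (0, 1] range is an assumption


# Hypothetical, transliterated entries; the scores are invented for this sketch.
LEXICON = {
    "acha": LexiconEntry(+1, 0.5),       # "good"
    "behtareen": LexiconEntry(+1, 1.0),  # "excellent"
    "kharab": LexiconEntry(-1, 0.5),     # "bad"
}


def score_sentiunit(tokens, lexicon=LEXICON):
    """Sum orientation * intensity over the tokens found in the lexicon.

    Simple addition is an assumed combination rule, chosen for brevity.
    """
    return sum(
        lexicon[t].orientation * lexicon[t].intensity
        for t in tokens
        if t in lexicon
    )


def classify(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def precision_recall_f1(gold, predicted, label="positive"):
    """Standard precision, recall and F-measure for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

In this sketch a SentiUnit is passed in as a list of already-chunked tokens; the shallow parsing and dependency parsing steps that the abstract mentions are outside its scope.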