Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach

Chiragh, Neelam.

dc.contributor.author	Chiragh, Neelam.
dc.date.accessioned	2018-12-03T09:43:42Z
dc.date.accessioned	2020-04-11T14:33:57Z
dc.date.available	2020-04-11T14:33:57Z
dc.date.issued	2018
dc.identifier.govdoc	17015
dc.identifier.uri	http://142.54.178.187:9060/xmlui/handle/123456789/4137
dc.description.abstract	In this research, the accuracy of Urdu Sentiment Analysis in multiple domains is enhanced by using a Lexicon-based approach. Unlike the traditional approach, which considers adjectives only, the lexicon also includes nouns and verbs. An efficient Urdu Sentiment Analyzer is developed that applies rules and uses this new lexicon to classify sentences as positive, negative or neutral. To enhance its accuracy, the Urdu Sentiment Analyzer incorporates specific rules for handling negations, intensifiers and context-dependent words. To test the Lexicon-based approach, a corpus of 6025 sentences from 151 blogs belonging to 14 different genres is collected, and each sentence is annotated by three human annotators as positive, negative or neutral. Evaluated on sentences from this corpus, the Urdu Sentiment Analyzer achieves 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 f-measure, the most promising results so far for the Urdu language to the best of the author's knowledge. A comparison with previous work in Urdu Sentiment Analysis shows that the combination of this Urdu Sentiment Lexicon and Urdu Sentiment Analyzer is much more effective than previous such combinations. The main reasons for this improvement are the wide-coverage lexicon and the effective handling of negations, intensifiers and context-dependent words by the Urdu Sentiment Analyzer. Although the Lexicon-based approach achieves high accuracy for Urdu Sentiment Analysis in multiple domains, which is the main objective of this research, a Supervised Machine Learning approach is also used for comparison.
Three well-known classifiers, Support Vector Machine, Decision Tree and K Nearest Neighbor, are tested; their outputs are compared and their results are improved over several iterations. K Nearest Neighbor is found to perform better than Support Vector Machine and Decision Tree. To verify this result, three evaluation measures are used: McNemar’s Test, the Kappa Statistic and Root Mean Squared Error. All three measures confirm that K Nearest Neighbor performs much better than the other two classifiers, achieving 67.02% accuracy, 0.68 precision, 0.67 recall and 0.67 f-measure. The results of the two approaches are then compared. On the basis of the experiments performed in this research, it is concluded that, for Urdu Sentiment Analysis in multiple domains, the Lexicon-based approach outperforms the Supervised Machine Learning approach in terms of accuracy, precision, recall and f-measure, as well as economy of time and effort.	en_US
dc.description.sponsorship	Higher Education Commission, Pakistan	en_US
dc.language.iso	en_US	en_US
dc.publisher	University of Peshawar, Peshawar	en_US
dc.relation.ispartofseries	17015;
dc.subject	Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach	en_US
dc.title	Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach	en_US
dc.type	Thesis	en_US
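The abstract describes rule-based scoring over a sentiment lexicon that covers adjectives, nouns and verbs, with special handling of negations, intensifiers and context-dependent words. The following is a minimal Python sketch of that general idea, not the thesis's actual analyzer. Everything in it is a hypothetical illustration: the five-word lexicon, the ×2 weight for the intensifier بہت ("very"), the rule that a negation such as نہیں flips the preceding sentiment word (chosen because Urdu negation usually follows the predicate, as in اچھا نہیں, "not good"), the two-entry context table for کم ("low"), and whitespace tokenization.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicon: word -> prior polarity. It mixes an adjective,
# nouns and a noun used in verb phrases, echoing the idea that the lexicon
# is not limited to adjectives.
LEXICON: Dict[str, float] = {
    "اچھا": 1.0,   # good (adjective)
    "برا": -1.0,   # bad (adjective)
    "خوشی": 1.0,   # happiness (noun)
    "نفرت": -1.0,  # hatred (noun)
    "پسند": 1.0,   # liking, as in "pasand karna" (verb phrase)
}

NEGATIONS = {"نہیں", "نہ"}                      # not (hypothetical set)
INTENSIFIERS: Dict[str, float] = {"بہت": 2.0}   # very (hypothetical weight)

# HYPOTHETICAL context rules: (context-dependent word, following word) -> polarity.
CONTEXT_RULES: Dict[Tuple[str, str], float] = {
    ("کم", "قیمت"): 1.0,    # low price -> positive
    ("کم", "معیار"): -1.0,  # low quality -> negative
}


def score_sentence(sentence: str) -> float:
    """Sum the signed scores of sentiment-bearing tokens in one sentence."""
    tokens: List[str] = sentence.split()  # naive whitespace tokenization
    scores: List[float] = []              # one signed score per sentiment hit
    boost = 1.0                           # pending intensifier multiplier
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]    # applies to the next sentiment hit
        elif tok in NEGATIONS:
            # Urdu negation usually follows the predicate, so flip the
            # most recent sentiment-bearing token (if any).
            if scores:
                scores[-1] = -scores[-1]
        elif nxt is not None and (tok, nxt) in CONTEXT_RULES:
            scores.append(boost * CONTEXT_RULES[(tok, nxt)])
            boost = 1.0
            i += 1                        # the context word is consumed too
        elif tok in LEXICON:
            scores.append(boost * LEXICON[tok])
            boost = 1.0
        i += 1
    return sum(scores)


def classify(sentence: str) -> str:
    """Map the summed score to positive, negative or neutral."""
    total = score_sentence(sentence)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Under these toy rules, `classify("یہ کھانا اچھا نہیں ہے")` ("this food is not good") returns `"negative"`, and `classify("یہ کم قیمت فون ہے")` ("this is a low-priced phone") returns `"positive"` because of the context entry for کم قیمت. Flipping the preceding word rather than the following one is the main design choice here; a real analyzer would also need Urdu-aware tokenization and a far larger lexicon.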
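The abstract names McNemar’s Test, the Kappa Statistic and Root Mean Squared Error as the measures used to confirm the classifier comparison. As a hedged illustration, the Python sketch below implements common textbook forms of these measures: McNemar's statistic with continuity correction, Cohen's kappa between predictions and gold labels, and RMSE between predicted class-probability vectors and one-hot targets. The thesis may have used a toolkit's variants of these definitions, and the label lists in the demo are made-up toy data, not results from the corpus. The sketch also includes the harmonic-mean F-measure; applied to the reported precision and recall, it reproduces the reported f-measures (0.88 for the Lexicon-based analyzer, 0.67 for K Nearest Neighbor) after rounding.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def f_measure(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mcnemar(gold: Sequence[str], pred_a: Sequence[str], pred_b: Sequence[str]) -> float:
    """McNemar chi-square statistic (1 df) with continuity correction.

    b counts items classifier A gets right and B gets wrong; c counts the
    reverse. A value above 3.841 rejects equal error rates at the 0.05 level.
    """
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    if b + c == 0:
        return 0.0  # the classifiers never disagree on correctness
    return (abs(b - c) - 1) ** 2 / (b + c)


def cohen_kappa(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Agreement between predictions and gold labels, corrected for chance."""
    n = len(gold)
    p_o = sum(g == p for g, p in zip(gold, pred)) / n  # observed agreement
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Expected chance agreement from the two label distributions.
    p_e = sum(gold_counts[l] * pred_counts[l] for l in gold_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sides use one identical label everywhere
    return (p_o - p_e) / (1 - p_e)


def rmse(gold: Sequence[str], probs: Sequence[Dict[str, float]], labels: Sequence[str]) -> float:
    """RMSE between predicted class-probability vectors and one-hot gold
    vectors, averaged over all instances and all classes."""
    total = 0.0
    for g, dist in zip(gold, probs):
        for label in labels:
            target = 1.0 if label == g else 0.0
            total += (dist.get(label, 0.0) - target) ** 2
    return math.sqrt(total / (len(gold) * len(labels)))


if __name__ == "__main__":
    # Made-up toy labels for illustration only (not data from the thesis).
    gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
    pred_a = ["pos", "neg", "neu", "pos", "pos", "pos"]
    pred_b = ["pos", "pos", "neg", "pos", "neg", "neg"]
    print(round(f_measure(0.86, 0.90), 2))      # 0.88
    print(round(f_measure(0.68, 0.67), 2))      # 0.67
    print(mcnemar(gold, pred_a, pred_b))        # 0.25
    print(round(cohen_kappa(gold, pred_a), 3))  # 0.714
```

On the toy data, classifier A is right on three items where B is wrong and B is right on one item where A is wrong, so the statistic is (|3 − 1| − 1)² / 4 = 0.25, far below 3.841; with so few disagreements the test cannot distinguish the two classifiers, which is why such comparisons need a reasonably large test set.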