PASTIC Dspace Repository

Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach

dc.contributor.author Chiragh, Neelam.
dc.date.accessioned 2018-12-03T09:43:42Z
dc.date.accessioned 2020-04-11T14:33:57Z
dc.date.available 2020-04-11T14:33:57Z
dc.date.issued 2018
dc.identifier.govdoc 17015
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/4137
dc.description.abstract This research enhances the accuracy of Urdu Sentiment Analysis across multiple domains by using a Lexicon-based approach. Unlike the traditional approach, which considers adjectives only, the lexicon also includes nouns and verbs. An efficient Urdu Sentiment Analyzer is developed that applies rules and uses this new lexicon to classify sentences as positive, negative or neutral. To improve accuracy, the analyzer incorporates specific rules for handling negations, intensifiers and context-dependent words. To test the Lexicon-based approach, a corpus of 6025 sentences from 151 blogs belonging to 14 different genres is collected, and three human annotators classify each sentence as positive, negative or neutral. Evaluating the Urdu Sentiment Analyzer on sentences from this corpus yields the most promising results so far for the Urdu language (to the best of the author’s knowledge), with 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 f-measure. Comparison with previous work in Urdu Sentiment Analysis shows that the combination of this Urdu Sentiment Lexicon and Urdu Sentiment Analyzer is much more effective than previous such combinations. The main reasons for the improvement are the wide-coverage lexicon and the effective handling of negations, intensifiers and context-dependent words by the Urdu Sentiment Analyzer. Although the Lexicon-based approach achieves high accuracy in multiple domains for Urdu Sentiment Analysis, which is the main objective of this research, a Supervised Machine Learning approach is also used for comparison.
Three well-known classifiers, Support Vector Machine, Decision Tree and K Nearest Neighbor, are tested; their outputs are compared and their results are improved over several iterations. K Nearest Neighbor is found to perform better than Support Vector Machine and Decision Tree. To verify this result, three evaluation measures are used: McNemar’s Test, the Kappa Statistic and Root Mean Squared Error. All three measures confirm that K Nearest Neighbor performs much better than the other two classifiers, achieving 67.02% accuracy, 0.68 precision, 0.67 recall and 0.67 f-measure. The results from both approaches are then compared. On the basis of the experiments performed in this research, it is concluded that, when Urdu Sentiment Analysis is performed in multiple domains, the Lexicon-based approach outperforms the Supervised Machine Learning approach in terms of accuracy, precision, recall, f-measure, and economy of time and effort. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en_US en_US
dc.publisher University of Peshawar, Peshawar en_US
dc.relation.ispartofseries 17015;
dc.subject Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach en_US
dc.title Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach en_US
dc.type Thesis en_US
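
The abstract describes a rule-based analyzer that looks words up in a sentiment lexicon and adjusts their polarity for negations and intensifiers. The general idea can be sketched as below. This is not the thesis's analyzer: the romanized tokens, the scores, the function name classify, and both rules (an intensifier scales the next sentiment word, a negation flips the sentiment word before it) are assumptions chosen for illustration, and context-dependent words are not handled at all.

```python
# Toy lexicon of romanized Urdu tokens with hand-assigned polarity scores.
# Every entry, weight, and rule in this sketch is an illustrative assumption,
# not data or logic taken from the thesis.
LEXICON = {
    "achhi": 1.0,    # "good" (adjective)
    "buri": -1.0,    # "bad" (adjective)
    "pasand": 1.0,   # "liking" (noun)
    "nafrat": -1.0,  # "hatred" (noun)
}
INTENSIFIERS = {"bohat": 2.0}  # "very": scales the next sentiment word
NEGATIONS = {"nahin"}          # "not": flips the preceding sentiment word


def classify(sentence: str) -> str:
    """Label a whitespace-tokenized sentence as positive, negative or neutral."""
    scores = []   # one signed score per sentiment word found so far
    boost = 1.0   # multiplier waiting for the next sentiment word
    for token in sentence.lower().split():
        if token in INTENSIFIERS:
            boost = INTENSIFIERS[token]
        elif token in LEXICON:
            scores.append(LEXICON[token] * boost)
            boost = 1.0
        elif token in NEGATIONS and scores:
            # Assumed word order: the negation follows the word it negates.
            scores[-1] = -scores[-1]
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("yeh film bohat achhi hai"))  # intensified positive word
print(classify("yeh film achhi nahin hai"))  # negated positive word
print(classify("yeh ek film hai"))           # no sentiment words
```

Summing signed scores keeps the sketch short; a fuller rule set would also need to decide how far a negation or intensifier reaches and how to treat words whose polarity depends on context.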

