PASTIC Dspace Repository

Disambiguating Authors in Bibliographic Databases

dc.contributor.author Shoaib, Muhammad
dc.date.accessioned 2019-07-02T11:15:57Z
dc.date.accessioned 2020-04-11T15:35:45Z
dc.date.available 2020-04-11T15:35:45Z
dc.date.issued 2016
dc.identifier.govdoc 17875
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/5064
dc.description.abstract Author name disambiguation in bibliographic databases such as DBLP, Citeseer, and Scopus is a specialized problem of entity resolution. In the literature, different approaches have been proposed, and most of them are based on machine learning techniques: supervised learning, unsupervised learning, or a combination of the two. Supervised learning approaches require labeling effort to produce training data. Unsupervised learning approaches use the available attributes to group an author's citations by exploiting different similarity measures and clustering algorithms. The performance of unsupervised methods is affected by the clustering algorithm, the attributes, and the similarity measures. Previous research focused on devising clustering algorithms and identifying attributes, while similarity measures have not received due attention. In this research work, we propose improved similarity measures for each type of attribute and a clustering algorithm. To estimate author name similarity, we divide name tokens into five different categories and devise a similarity measure that accommodates them by assigning a different weight to each type of token. Our proposed similarity measure for the co-authors attribute assigns a higher similarity value to citations that share more common co-authors, irrespective of the total number of co-authors. For textual attributes, we propose a conditional absolute measure (for attributes having short texts) and the SDK index (for attributes having long texts). Experiments on DBDComp datasets show that our similarity measures outperform baseline measures by 16.2% in k-measure and 14.20% in f-measure. We propose to use references of publications as additional sources of information. Using the titles of references improves k-measure by 0.6% and f-measure by 8% on DBLP-Ref datasets. We also propose a clustering algorithm obtained by modifying heuristic-based hierarchical clustering. Experiments on three different types of author name disambiguation collections show that our proposed methodology (similarity measures, clustering algorithm and use of references) helps improve both k-measure and f-measure. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en_US en_US
dc.publisher International Islamic University, Islamabad. en_US
dc.subject Computer Sciences en_US
dc.title Disambiguating Authors in Bibliographic Databases en_US
dc.type Thesis en_US
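
The abstract describes a co-author similarity that grows with the number of shared co-authors, irrespective of how many co-authors each citation lists in total. The record does not give the thesis's formula, so the sketch below is only an illustration of that contrast: it compares a plain shared-co-author count (an assumed stand-in, not the author's measure) with the standard Jaccard coefficient, which does normalize by the total. The function names and the sample co-author lists are hypothetical.

```python
def jaccard_coauthor_similarity(coauthors_a, coauthors_b):
    """Standard Jaccard coefficient: shared co-authors divided by all distinct co-authors."""
    a, b = set(coauthors_a), set(coauthors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_coauthor_count(coauthors_a, coauthors_b):
    """Assumed stand-in for a size-independent measure: the raw number of shared co-authors."""
    return len(set(coauthors_a) & set(coauthors_b))


# Hypothetical citation pairs (co-author lists only).
small_pair = (["A", "B"], ["A", "B"])                      # 2 shared, 2 distinct in total
large_pair = (["A", "B", "C", "D", "E", "F"],
              ["A", "B", "C", "G", "H", "I"])              # 3 shared, 9 distinct in total

small_jaccard = jaccard_coauthor_similarity(*small_pair)
large_jaccard = jaccard_coauthor_similarity(*large_pair)
small_count = shared_coauthor_count(*small_pair)
large_count = shared_coauthor_count(*large_pair)

# Jaccard prefers the small pair; the count prefers the pair with more shared co-authors.
print(small_jaccard, large_jaccard)
print(small_count, large_count)
```

In this toy case the normalized measure ranks the two-author pair higher, while the count-based measure ranks the pair with three shared co-authors higher, which is the behaviour the abstract asks for.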

