PASTIC Dspace Repository

Hadoop-Based Algorithm for Inference of Phylogenetic Trees Using Maximum Likelihood Method

Show simple item record

dc.contributor.author Sardaraz, Muhammad
dc.date.accessioned 2018-02-16T06:47:26Z
dc.date.accessioned 2020-04-11T15:33:32Z
dc.date.available 2020-04-11T15:33:32Z
dc.date.issued 2016
dc.identifier.uri http://142.54.178.187:9060/xmlui/handle/123456789/4883
dc.description.abstract Biological sequences consist of A C G and T in a DNA structure and contain vital information of living organisms. This information is used in many applications such as drug design, microarray analysis and phylogenetic trees. Advances in computing technologies, specifically Next Generation Sequencing technologies have increased genomic data at a rapid rate. The increase in genomic data presents significant research challenges in bioinformatics, such as sequence alignment, short read error correction, phylogenetic inference etc. Various tools and algorithms have been proposed for phylogenetic inference. Early algorithms used sequential programs to solve the problem of phylogenetic inference. Improvements were gained in terms of tree accuracy and execution time, however; the programs were still slow, and improvements were needed to infer correct phylogeny in short times. This challenge introduced parallel and distributed processing to the field of bioinformatics. Many tools and programs have been developed based on parallel and distributed computing. This thesis presents algorithmic solutions for phylogenetic inference. Solutions include ‘PhyloDoop’ and ‘SeqCompress’ algorithms. PhyloDoop algorithm is used for inference of phylogenetic trees. The algorithm is based on Maximum Likelihood method, implemented on Hadoop Map/Reduce framework. PhyloDoop is based on clusters i.e. divides the input alignment to clusters, builds trees for each cluster, merges and optimizes all sub-trees and the final tree is also optimized. PhyloDoop is compared to well-known algorithms both on real and simulated datasets. Experiments on real datasets were performed to test likelihood values, execution time, and speedup in distributed environment. The results show better accuracy as compared to other algorithms on most of the datasets. Execution time is also short on most datasets. The proposed algorithm yields better speed up on large datasets. Simulated datasets were used to measure topological accuracy. PhyloDoop is topologically accurate on most datasets with short execution time in comparison to other algorithms. SeqCompress is used to compress DNA sequences in order to reduce memory requirements and execution time. Impressive results are shown in comparison to other algorithms. These results show a gap for efficient usage of compression techniques to infer correct phylogeny with low memory requirements as well as execution time. en_US
dc.description.sponsorship Higher Education Commission, Pakistan en_US
dc.language.iso en en_US
dc.publisher Iqra University Islamabad Campus en_US
dc.subject Computer science, information & general works en_US
dc.title Hadoop-Based Algorithm for Inference of Phylogenetic Trees Using Maximum Likelihood Method en_US
dc.type Thesis en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Advanced Search

Browse

My Account