Robust Algorithm for Genome Sequence Short Read Error Correction using Hadoop-MapReduce

Tahir, Muhammad

URI: http://142.54.178.187:9060/xmlui/handle/123456789/4884

Date: 2016

Abstract:

Biological sequences in DNA consist of the bases A, C, G, and T and carry vital information about living organisms. Advances in computing and, in particular, next-generation sequencing (NGS) technologies have increased the volume of genomic data at a rapid rate. This growth presents significant research challenges in bioinformatics, such as sequence alignment, short-read error correction, and phylogenetic inference. At the same time, high-throughput NGS technologies have opened new and thought-provoking research opportunities: next-generation sequencers produce a massive amount of short-read data in a single run. However, these short reads are highly susceptible to errors compared with data from shotgun sequencing. There is therefore a pressing need for faster and more accurate statistical and computational tools to analyze these data. This research presents a novel and robust algorithm, HaShRECA, for error correction of genome sequence short reads. The algorithm is based on a probabilistic model that analyzes potential errors in reads, and it uses the Hadoop-MapReduce framework to speed up computation. Experimental results show that HaShRECA is more accurate, and more time- and space-efficient, than previous algorithms.
