A Mathematical Model Quantifying Sequence Alignment for Constructing Phylogenetic Trees and Ant-Minor Protein Structure Classification.

Khan, Muhammad Asif

URI: http://142.54.178.187:9060/xmlui/handle/123456789/5134

Date: 2019

Abstract:

Biological sequence comparison is fundamental to extracting information that is valuable in applications such as protein structure prediction, structural similarity prediction, phylogenetic analysis, homology detection, function prediction and the discovery of evolutionary relationships. Besides biologists, many researchers, including mathematicians, statisticians and computer scientists, have been drawn to sequence analysis because of its role in these applications. Protein classification has been one of the major areas of research in recent years. Despite technological advances, classifying proteins accurately remains a major challenge. In this work, we first introduce an ant-inspired data mining approach to the protein classification problem in order to investigate the effectiveness of a rule-based approach. A supervised classification mechanism combined with data mining concepts establishes compact and efficient rules that classify proteins into their correct families. For biological sequence analysis, we propose ASIF, a novel algorithm that consists of an alignment algorithm, ASIFALIGN, and a mathematical model (dASIF) quantifying the sequence alignment. The proposed approach is based on intra-residue-distance and a plausible (unbiased) penalty factor. A standard dataset of DNA sequences is tested, and the approach produces reliable and robust sequence dissimilarities/similarities. Moreover, the proposed approach is used to construct a phylogenetic tree. Phylogenetic trees constructed by our approach outperform other methods. In addition, the proposed approach is applied to the protein secondary structure classification problem. A dataset of twelve secondary structures is used to validate, for classification purposes, the distance matrix generated by the new alignment algorithm and mathematical model. The results produced by the new scoring model are encouraging and indicate the reliability of our approach.
Our approach not only provides solid ground for its applications but also performs the fundamental task of calculating dissimilarities/similarities at a reasonable computational complexity. The results reveal the significance of our approach and provide a basis for adopting the proposed model in other biological applications such as protein function prediction, homology detection and protein fold recognition.

I would like to dedicate this thesis to My Father (a strong and gentle soul who taught me to trust in ALLAH, believe in hard work and rest assured of the best of results), My Mother (late) (for being my first mentor and a true guide in the shape of her beautiful memories and love), My Brothers, Sisters and Family (for supporting and encouraging me throughout my studies and research).