Investigating Machine Learning Based Prediction of Protein Interactions

Abbasi, Wajid Arshad

dc.contributor.author	Abbasi, Wajid Arshad
dc.date.accessioned	2019-06-28T11:45:44Z
dc.date.accessioned	2020-04-11T15:35:15Z
dc.date.available	2020-04-11T15:35:15Z
dc.date.issued	2019
dc.identifier.govdoc	18160
dc.identifier.uri	http://142.54.178.187:9060/xmlui/handle/123456789/5032
dc.description.abstract	Protein interactions are crucial for cellular function, and their study is an important research domain in bioinformatics. Biologists are usually interested in three core problems: determining pairwise protein interactions, determining binding affinity, and identifying the interface. Because experimental procedures are tedious, costly, and time-consuming, computational methods for these problems have emerged as an active research area. Our aim in this work is to develop novel machine learning based methods for protein interaction, binding affinity, and interface prediction with improved generalization performance. In this dissertation, we have developed host-pathogen protein interaction predictors using machine learning. One of our findings is that existing protein interaction prediction methods that use K-fold cross-validation for performance assessment report over-estimated accuracy, because K-fold cross-validation does not take pairwise protein similarity between training and test examples into account. To control this data redundancy at the pathogen protein level, we propose and advocate an alternative evaluation scheme called Leave One Pathogen Protein Out (LOPO) cross-validation, along with some biologist-centric metrics, for designing protein-protein interaction prediction methods. We have also designed a novel machine learning model called CaMELS (CalModulin intEraction Learning System) for interaction and interaction site prediction of Calmodulin (CaM), a very important protein that is highly conserved across all eukaryotes. CaMELS relies on a novel implementation of a multiple instance learning solver for protein binding site prediction that leads to significant improvement in predictive performance. 
One of our collaborators has also confirmed the effectiveness of CaMELS through wet-lab experiments. We have also focused on the more generic problem of predicting binding affinity in protein interactions and presented various sequence-based machine learning models. For this purpose, we have developed a novel machine learning method based on the framework of Learning Using Privileged Information (LUPI). Our state-of-the-art method uses protein 3D structure as privileged information at training time while requiring only protein sequence information during testing, which makes it flexible. We have also developed a webserver for an existing state-of-the-art protein-protein interface prediction method called PAIRPred. The accuracy of this webserver has also been validated by our collaborators through wet-lab experiments.	en_US
dc.description.sponsorship	Higher Education Commission, Pakistan	en_US
dc.language.iso	en_US	en_US
dc.publisher	Pakistan Institute of Engineering & Applied Sciences, Islamabad.	en_US
dc.subject	Computer Science	en_US
dc.title	Investigating Machine Learning Based Prediction of Protein Interactions	en_US
dc.type	Thesis	en_US
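
The Leave One Pathogen Protein Out (LOPO) scheme named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `lopo_splits`, the `(host, pathogen, label)` tuple layout, and the toy identifiers are all assumptions made for illustration. The sketch groups examples by exact pathogen protein ID only and does not model sequence similarity between different pathogen proteins.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_idx, test_idx), one fold per pathogen protein.

    `pairs` is a list of (host_id, pathogen_id, label) tuples (hypothetical layout).
    Every example involving the held-out pathogen protein goes to the test fold,
    so that pathogen protein never appears in the matching training fold.
    """
    by_pathogen = defaultdict(list)
    for idx, (_host, pathogen, _label) in enumerate(pairs):
        by_pathogen[pathogen].append(idx)

    for pathogen, test_idx in by_pathogen.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen, train_idx, test_idx


# Hypothetical toy data: (host protein, pathogen protein, interacts?)
pairs = [
    ("H1", "P1", 1),
    ("H2", "P1", 0),
    ("H1", "P2", 1),
    ("H3", "P2", 0),
    ("H2", "P3", 1),
]

folds = list(lopo_splits(pairs))
for pathogen, train_idx, test_idx in folds:
    print(pathogen, "train:", train_idx, "test:", test_idx)
```

Unlike K-fold cross-validation, which assigns examples to folds without regard to which proteins they involve, each fold here tests on a pathogen protein absent from training.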