Abstract:
G protein-coupled receptors (GPCRs) are located at the cell boundary and are used for
inter-cellular communication. They are mostly found in eukaryotic cells, but also occur
in some prokaryotic cells. GPCRs modulate synaptic transmission in the spinal cord and brain, and
can trigger signaling pathways for the regulation of cell proliferation and gene expression. They
are physiologically very important; by one estimate, more than 50% of marketed
drugs target GPCRs. Computational prediction of unknown GPCRs is of great importance in
pharmacology because malfunction of GPCRs can cause many diseases. The goal of this thesis
is to propose new methods for the classification of GPCRs using Machine Learning approaches.
The work in this thesis is divided into two parts. The first part is based on the
classification of GPCRs using Machine Learning methods. We analyze biological, statistical, and
transform-domain based feature extraction strategies and exploit various physicochemical
properties to generate discriminative features of GPCR sequences. We have developed various
GPCR classification methods. In the first method, GPCRs are predicted using a hybrid
of pseudo amino acid composition and a multi-scale energy representation of physicochemical
properties. In this method, our focus is on the introduction of various physicochemical properties
(hydrophobicity, electronic property, and bulk property). In the second method, GPCRs are predicted
using the grey incidence degree measure and principal component analysis, whereby the relation
between various components of GPCR sequences is exploited. In the third method, we perform
weighted ensemble classification of GPCRs using evolutionary information and multi-scale
energy based features. The weight of each classifier is optimized using a genetic
algorithm, which improves classification performance.
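As an illustration of the third method's idea, the following is a minimal sketch of a weighted ensemble whose classifier weights are tuned by a small genetic algorithm. It is not the implementation used in the thesis: the function names, the toy probability outputs, the population size, and the crossover and mutation operators are all assumptions chosen for brevity.

```python
import random

def ensemble_predict(prob_sets, weights):
    """Combine per-classifier class probabilities by a weighted sum."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        scores = [sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds

def accuracy(prob_sets, weights, labels):
    """Fraction of samples the weighted ensemble labels correctly."""
    preds = ensemble_predict(prob_sets, weights)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def optimize_weights(prob_sets, labels, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm (hypothetical settings) over classifier weights."""
    rng = random.Random(seed)
    k = len(prob_sets)
    # Start from equal weights plus random candidates.
    population = [[1.0 / k] * k]
    population += [[rng.random() for _ in range(k)] for _ in range(pop_size - 1)]

    def fitness(w):
        return accuracy(prob_sets, w, labels)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = rng.randrange(k)
            child[j] = max(0.0, child[j] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy class-probability outputs of two hypothetical base classifiers.
clf_a = [[0.9, 0.1], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
clf_b = [[0.3, 0.7], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 1, 1, 1]

prob_sets = [clf_a, clf_b]
best_weights = optimize_weights(prob_sets, labels)
print(best_weights, accuracy(prob_sets, best_weights, labels))
```

Because the best half of each generation is carried over unchanged, the best accuracy found never falls below that of the equal-weight starting point. In practice the fitness would be measured on held-out data rather than on the training labels.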
The second part of the thesis is based on multiple sequence alignment of GPCRs, whereby
we utilize the structural information of GPCRs. The three-dimensional structures of several
Rhodopsin-like GPCRs have been resolved at atomic resolution and validate the prediction,
made from sequence information alone, that the GPCR fold is a bundle of seven transmembrane helices
(TMs). The dataset is first aligned using multiple sequence alignment methods, and the TMs are
extracted. The dataset is composed of 19 subfamilies of Rhodopsin receptors, belonging to 62
species. Weights are assigned to avoid bias toward any particular species. Position specific scoring
matrices (PSSMs) are computed for the seven TMs, and pseudocounts are added using the
conventional BLOSUM62 scoring matrix. The unknown receptors are
classified using PSSMs of the known receptors and by the TM similarity methods.
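The PSSM-based classification described above can be sketched roughly as follows. This is a simplified illustration, not the procedure used in the thesis: uniform background pseudocounts stand in for the BLOSUM62-derived pseudocounts, the optional seq_weights argument is a hypothetical placeholder for the species weights, and the two short "families" are invented toy segments rather than real transmembrane helices.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies (a real setup would use observed ones).
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def build_pssm(aligned_seqs, seq_weights=None, pseudo_weight=1.0):
    """Log-odds PSSM from equal-length aligned segments; gap characters are skipped."""
    if seq_weights is None:
        seq_weights = [1.0] * len(aligned_seqs)
    pssm = []
    for pos in range(len(aligned_seqs[0])):
        counts = {aa: 0.0 for aa in AMINO_ACIDS}
        total = 0.0
        for seq, w in zip(aligned_seqs, seq_weights):
            aa = seq[pos]
            if aa in counts:
                counts[aa] += w
                total += w
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from getting a log-odds of -infinity.
            freq = (counts[aa] + pseudo_weight * BACKGROUND[aa]) / (total + pseudo_weight)
            column[aa] = math.log2(freq / BACKGROUND[aa])
        pssm.append(column)
    return pssm

def score(pssm, segment):
    """Sum of per-position log-odds scores; unknown symbols contribute zero."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, segment))

def classify(segment, family_pssms):
    """Assign the segment to the family whose PSSM scores it highest."""
    return max(family_pssms, key=lambda fam: score(family_pssms[fam], segment))

# Invented toy segments for two hypothetical subfamilies.
family_pssms = {
    "A": build_pssm(["LLVA", "LIVA", "LLVG"]),
    "B": build_pssm(["WFYP", "WFYP", "WYYP"]),
}
print(classify("LLVA", family_pssms))
print(classify("WFYP", family_pssms))
```

With seven TMs per receptor, one such matrix would be built per helix and per subfamily, and the per-helix scores combined before choosing the best-scoring subfamily.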
Our research may contribute to the fields of Bioinformatics, Pattern
Classification, and Computational Biology, and has yielded results comparable to those of existing
approaches. We conclude that our research may help researchers in further exploring
membrane protein classification or any other subcellular localization classification task.