Abstract:
Genes provide the instructions for synthesizing functional products such as proteins. Gene expression is the process that produces these functional products from the instructions encoded in the genes. Gene regulation controls this process: it can increase or decrease the expression of a gene, and it determines when a particular gene is or is not expressed to produce a particular protein. A collection of regulatory elements, such as genes, together with their interconnections, which reflect gene expression levels, is visualized as a Gene Regulatory Network (GRN). GRNs serve as a tool for understanding the causal relationships between the genes and proteins that underlie complex cellular functions. Computational biology now focuses heavily on the reverse engineering, or reconstruction, of GRNs from gene expression data in order to decode the complex mechanisms of cellular function. These efforts have resulted in improved and more precise diagnostics and therapeutics. Microarray technology measures the expression of thousands of genes simultaneously under different conditions, such as control and disease conditions. It helps in identifying over-expressed genes that are likely to be associated with the disease.
Multiple approaches to reconstructing GRNs from gene expression data apply techniques such as distance measures, correlations, mutual information algorithms, and dynamic and quantitative probabilities. These approaches identify symmetric and diagonal gene-pair interactions. Symmetric gene-pair interactions cannot be modeled as direct activation and inhibition interactions. Moreover, the diagonal nature implies that a gene cannot regulate itself, which contradicts the true nature of gene-pair regulatory interactions. Compromising the true asymmetric and non-diagonal nature of actual gene-pair regulatory interactions can lead to incomplete and inferior predictions. To our knowledge, no complete model exists that generates a GRN representing all possible network motifs between gene pairs, such as activation, inhibition, and self-regulation. The proposed approach, named Multivariate Covariance Network (MCNet), reconstructs GRNs by applying multivariate covariance analysis and Principal Component Analysis (PCA) to identify asymmetric and non-diagonal gene interactions. The GRN developed using the MCNet approach contains all possible network motifs, representing all kinds of gene-pair regulatory interactions (i.e., positive and negative feedback loops as well as self-loops). The asymmetry is achieved by computing a distance measure of the genes with respect to the eigenvalues of related genes that show variable behavior under different conditions. PCA in the MCNet approach selects the gene-pair interactions showing maximum variance in gene regulatory expression. Asymmetric gene regulatory interactions help in identifying the controlling regulatory agents, thus lowering the false positive rate of interacting genes by minimizing the connections between previously unlinked network components.
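The general idea of combining a covariance matrix with PCA to obtain asymmetric scores that keep a diagonal can be illustrated with a minimal sketch. This is not the MCNet formulation, which the abstract does not specify. The function name `asymmetric_scores`, the parameter `n_components`, and the normalization (each row of a rank-reduced covariance matrix divided by that gene's own retained variance) are assumptions chosen only for illustration.

```python
import numpy as np

def asymmetric_scores(expr, n_components=2):
    """Illustrative (hypothetical) gene-pair scores from covariance + PCA.

    expr: array of shape (genes, samples) holding expression values.
    Returns a (genes, genes) matrix that is generally asymmetric and has a
    non-zero diagonal, so self-interactions are not excluded by construction.
    """
    expr = np.asarray(expr, dtype=float)

    # Gene-by-gene covariance matrix (rows are variables). It is symmetric.
    cov = np.cov(expr)

    # PCA via eigendecomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the components with the largest eigenvalues (maximum variance).
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, vec = eigvals[order], eigvecs[:, order]

    # Rank-reduced covariance rebuilt from the retained components.
    low_rank = (vec * lam) @ vec.T

    # Assumed normalization: divide row i by gene i's own retained variance.
    # Row i and row j use different divisors, so score[i, j] != score[j, i].
    own = np.diag(low_rank).copy()
    own = np.where(np.abs(own) < 1e-12, 1e-12, own)
    return low_rank / own[:, None]
```

Under this assumed normalization, the entry in row i and column j reads as the covariance between genes i and j relative to the variance of gene i, which is why the two directions of a gene pair receive different values while the raw covariance matrix does not distinguish them.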
The performance of the proposed approach, MCNet, has been evaluated using a real data set as well as three synthetic and gold standard data sets. The MCNet approach predicts the regulatory interactions with higher precision and accuracy than some current state-of-the-art approaches. On the real time-series RTX therapy data set, the MCNet approach identified self-regulatory interactions of the differentially expressed (DE) genes with 80.6% accuracy. On the time-series synthetic Arabidopsis thaliana circadian clock data set, it predicted the gene regulatory interactions with 90.3% accuracy. The self-regulatory interactions identified in the RTX therapy and synthetic Arabidopsis thaliana data sets are further verified from the literature because gold standards are not available for these data sets. The gold standard DREAM-3 and DREAM-8 in silico data sets are also used to evaluate the performance of the proposed approach in comparison with some existing approaches. The DREAM-3 in silico E. coli gold standard data set does not contain any self-regulations, while the DREAM-8 in silico phosphoproteins gold standard data set does. Accordingly, the results demonstrate the enhanced performance of the MCNet approach in predicting self-regulations only for the DREAM-8 in silico phosphoproteins data set, with 75.8% accuracy. The MCNet approach for reconstructing GRNs identifies direct activation and inhibition interactions as well as self-regulatory interactions from microarray gene expression data sets. The generated GRN can contain positive and negative feedback loops as well as self-loops, reflecting the true nature of gene-pair regulatory interactions. Future work aims to enhance the functionality of the MCNet approach by modeling the dynamics of GRNs, such as oscillations and bifurcations towards steady state.
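The abstract reports accuracy and precision without stating how they are computed. A common convention, assumed here, compares a predicted binary adjacency matrix with a gold standard entry by entry. The function name `edge_metrics` and the `self_loops_only` flag are hypothetical and serve only to show how self-regulations (the diagonal) can be scored separately from the full network.

```python
import numpy as np

def edge_metrics(predicted, gold, self_loops_only=False):
    """Accuracy and precision of predicted edges against a gold standard.

    predicted, gold: square 0/1 adjacency matrices where entry [i, j] == 1
    means gene i regulates gene j. With self_loops_only=True, only the
    diagonal (self-regulation) entries are compared.
    """
    predicted = np.asarray(predicted, dtype=bool)
    gold = np.asarray(gold, dtype=bool)
    if self_loops_only:
        predicted, gold = np.diag(predicted), np.diag(gold)

    tp = int(np.sum(predicted & gold))    # edges predicted and present
    fp = int(np.sum(predicted & ~gold))   # edges predicted but absent
    tn = int(np.sum(~predicted & ~gold))  # non-edges correctly left out
    fn = int(np.sum(~predicted & gold))   # edges present but missed

    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "precision": precision}
```

Because the matrices are not required to be symmetric, this comparison treats the interaction from gene i to gene j and the reverse interaction as separate predictions, which matches the directed networks described above.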