Network‐based computational approach to identify genetic links between cardiomyopathy and its risk factors

Cardiomyopathy (CMP) is a group of myocardial diseases that progressively impair cardiac function. The mechanisms underlying CMP development are poorly understood, but lifestyle factors are clearly implicated as risk factors. This study aimed to identify molecular biomarkers involved in inflammatory CMP development and progression using a systems biology approach. The authors analysed microarray gene expression datasets from CMP and tissues affected by risk factors including smoking, ageing factors, high body fat, clinical depression status, insulin resistance, high dietary red meat intake, chronic alcohol consumption, obesity, high‐calorie diet and high‐fat diet. The authors identified differentially expressed genes (DEGs) from each dataset and compared those from CMP and risk factor datasets to identify common DEGs. Gene set enrichment analyses identified metabolic and signalling pathways, including MAPK, RAS signalling and cardiomyopathy pathways. Protein–protein interaction (PPI) network analysis identified protein subnetworks and ten hub proteins (CDK2, ATM, CDT1, NCOR2, HIST1H4A, HIST1H4B, HIST1H4C, HIST1H4D, HIST1H4E and HIST1H4L). Five transcription factors (FOXC1, GATA2, FOXL1, YY1, CREB1) and five miRNAs were also identified in CMP. Thus the authors’ approach reveals candidate biomarkers that may enhance understanding of mechanisms underlying CMP and their link to risk factors. Such biomarkers may also be useful to develop new therapeutics for CMP.


Introduction
Cardiomyopathy (CMP) is a group of diseases affecting the structure and functioning of the heart, and includes conditions where the heart is affected by ventricular hypertrophy, dilation or fibrotic dysplasia that cause mechanical and electrical dysfunction. CMP may be either cardiac-specific or a part of generalised systemic disorders, but many of these conditions result in cardiovascular damage or progressive heart failure [1]. CMP is the third most prevalent cause of heart failure in the USA [1]. In 2015, about 2.6 million people worldwide were affected by cardiomyopathy and myocarditis [2]. Currently, the most commonly occurring form of CMP is dilated CMP which affects five in 100,000 adults and 0.57 in 100,000 children [3].
The etiology of cardiomyopathy involves genetic, infectious, metabolic and environmental factors [1]. Lifestyle risk factors include severe obesity, alcohol consumption (AC), long-term high blood pressure, coronary heart disease, and sarcoidosis, but the molecular mechanisms behind the development of CMP, and how these risk factors contribute to its progression, are not well understood. However, we can use our knowledge of CMP risk factors to identify key factors in CMP development by determining which of the altered gene expression patterns that a risk factor induces are also seen in CMP-affected heart tissues. Using an integrative gene-network-based approach, we can then identify candidate causative pathways that can be examined further [4,5].
Integrative network-based gene or multi-omics analyses are an increasingly common approach for identifying disease-associated biomarkers and therapeutic targets [6]. Such approaches are now commonly used to elucidate molecular mechanisms in diseases such as Alzheimer's disease [7][8][9][10][11][12], Parkinson's disease [13][14][15], multiple sclerosis [16], respiratory system diseases [17], colorectal cancer [18] and thyroid cancers [19][20][21]. Therefore, in this study, a systems biology-based approach was used to identify molecular biomarker transcripts (i.e. mRNAs), proteins (hub proteins) and pathways in CMP, using CMP-associated risk factors to clarify the genes that may be causative factors for the progression of CMP (Fig. 1). For this purpose, we first identified DEGs, that is, genes whose expression is altered in CMP-affected tissues and in risk-factor-exposed tissues; we then identified the DEGs that were common between CMP and particular CMP-associated risk factors. These common DEGs were then studied for their involvement in human biomolecular networks such as protein-protein interaction (PPI) networks to identify central signalling molecules (hub proteins) and molecular pathways. This resulted in the identification of candidate genes that could mediate the influence of the CMP risk factors, and these were then cross-validated against the gold-standard benchmark gene-disease association databases OMIM and dbGaP to identify those candidates with known pathological involvement.
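The step of finding DEGs common to CMP and each risk factor amounts to a set intersection. The following is a minimal sketch under stated assumptions: the gene symbols, the function name `common_degs` and the toy inputs are hypothetical placeholders, not data or code from this study.

```python
# Hypothetical DEG sets; the gene symbols are placeholders, not study results.
cmp_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
risk_factor_degs = {
    "SM": {"GENE_A", "GENE_X"},
    "AG": {"GENE_B", "GENE_C", "GENE_Y"},
    "HFD": {"GENE_Z"},
}


def common_degs(disease_degs, factor_degs):
    """Return, for each risk factor, the DEGs it shares with the disease."""
    return {name: disease_degs & genes for name, genes in factor_degs.items()}


shared = common_degs(cmp_degs, risk_factor_degs)

# Union of all shared genes: a candidate input for downstream network analyses.
distinct_shared = set().union(*shared.values())
```

Keeping the per-factor intersections separate, as well as their union, preserves which risk factor each shared gene came from.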

High-throughput microarray gene expression datasets
We analysed gene expression microarray datasets to identify the association of different risk factors with CMP at the molecular level. All the datasets used in this study were collected from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus [22], and employed Affymetrix Human DNA arrays unless otherwise stated. The datasets analysed have the accession numbers GSE4172, GSE1144, GSE4806, GSE12654, GSE20950, GSE25220, GSE44456, GSE48964, GSE56960 and GSE68231. The CMP dataset (GSE4172) was obtained by gene expression profiling of human inflammatory CMP [23]. The ageing (AG) dataset (GSE1144) was obtained by analysing gene expression in skeletal muscle tissue characterised by loss of metabolic and contractile competence [24]. The smoking (SM) dataset (GSE4806) was obtained from gene expression profiles of T-lymphocytes from smokers and non-smokers [25]. The depression (DEP) dataset (GSE12654) was obtained from gene expression profiling of the human prefrontal cortex (BA10) [26]. The insulin resistance (IR) dataset (GSE20950) was obtained from gene expression data from human adipose tissue in an IR patient cohort [27]. The red meat (RM) dietary intervention dataset (GSE25220) was generated using an Agilent-014850 whole human genome microarray on human colon biopsies taken before and after participation in a high-RM dietary intervention [28]. The AC dataset (GSE44456) was obtained by examining gene expression in post-mortem hippocampus tissues from 20 alcoholics and 19 controls [29]. The obesity (OB) dataset (GSE48964) was obtained from expression data from adipose stem cells (ASCs) of morbidly obese and non-obese individuals [30].
The high-calorie diet (HCD) dataset (GSE56960) was obtained by array-based expression profiling of the blood cell transcriptome of two different population groups after the ingestion of different caloric doses [31]. The high-fat diet (HFD) dataset (GSE68231) comprises Affymetrix Human Genome array data obtained from human skeletal muscle of five subjects in each group, sampled before and after three days of an HFD [32].
IET Syst. Biol., 2020, Vol. 14, Iss. 2, pp. 75-84. This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)

Identification of differentially expressed genes
We performed differential gene expression analysis of the transcriptomic datasets for CMP and the nine risk factors. Firstly, we transformed the gene expression data for each condition using Z-score (zero-mean) normalisation, for both disease and control states. This helps to resolve the problems that arise when comparing mRNA data obtained with different platforms and experimental setups [33]. Each sample of the gene expression matrix was normalised using the mean and standard deviation. The expression value of gene i in sample j, represented by $g_{ij}$, was transformed into $Z_{ij}$ by computing

$$Z_{ij} = \frac{g_{ij} - \mathrm{mean}(g_i)}{\mathrm{SD}(g_i)}$$

where SD is the standard deviation. This transformation makes it possible to compare gene expression values across samples and diseases. The gene expression datasets were normalised by log2 transformation, and an unpaired Student's t-test was used to compare disease and control samples. Finally, genes were filtered using thresholds of adjusted p-value <0.05 and absolute log fold change (log FC) >1.0 to designate statistically significant DEGs.
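The normalisation and thresholding steps can be illustrated with a short sketch. It assumes that Z-scores are computed per gene across samples using the sample standard deviation, and that log FC and adjusted p-values have already been produced by an upstream t-test and multiple-testing correction. The function names and toy values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Z-score each gene (row) across its samples (columns)."""
    normalised = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)  # stdev is the sample SD (an assumption)
        # A gene with zero variance carries no signal; map it to zeros.
        normalised.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return normalised


def select_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes passing adjusted p < p_cutoff and |log FC| > lfc_cutoff.

    `stats` maps gene -> (log_fc, adjusted_p); both values are assumed to
    come from an upstream statistical test.
    """
    up = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, p) in stats.items() if p < p_cutoff and lfc < -lfc_cutoff}
    return up, down


# Toy inputs (placeholders only).
z = zscore_rows([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
toy_stats = {
    "GENE_A": (1.8, 0.01),   # passes both thresholds, up-regulated
    "GENE_B": (-1.2, 0.03),  # passes both thresholds, down-regulated
    "GENE_C": (2.5, 0.20),   # fails the p-value threshold
    "GENE_D": (0.4, 0.001),  # fails the fold-change threshold
}
up_genes, down_genes = select_degs(toy_stats)
```

Returning up- and down-regulated genes separately mirrors how DEG counts are usually reported.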

Gene set enrichment analysis to identify gene ontology and pathways
To clarify the biological significance of the identified DEGs, gene set enrichment and pathway analyses were performed via EnrichR to identify the gene ontology terms and KEGG pathways significantly enriched among the DEGs [34,35]. Enrichment results with an adjusted p-value < 0.05 were considered statistically significant.
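To make the idea of over-representation concrete, the sketch below computes a hypergeometric tail probability for the overlap between a DEG list and a term's gene set, then applies a Benjamini-Hochberg adjustment. This is an illustration of the general technique only; EnrichR's own scoring may differ, and the background size, list size and term counts here are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: background size, DEG list size, and per-term
# (overlap with the DEG list, term size). None of these come from the study.
BACKGROUND, N_DEGS = 20000, 200
toy_terms = {"term_1": (10, 100), "term_2": (2, 150)}

raw_p = {t: hypergeom_tail(k, N_DEGS, K, BACKGROUND) for t, (k, K) in toy_terms.items()}
adj_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
significant = [t for t, p in adj_p.items() if p < 0.05]
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for very small tail probabilities.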

Identification of transcriptional and/or post-transcription regulators of the DEGs
To identify the transcription factors (TFs) that regulate the DEGs at the transcriptional level, TF-target gene interactions were obtained from the JASPAR database, and TFs were selected on the basis of topological parameters [36]. The miRNAs that regulate the DEGs at the post-transcriptional level were identified from miRNA-target gene interactions obtained from TarBase and miRTarBase, again on the basis of topological parameters [37][38][39].
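One simple topological parameter for ranking regulators is their degree in the regulator-DEG bipartite network, i.e. the number of DEGs each TF or miRNA targets. The sketch below assumes degree is the parameter of interest, which the text does not specify; the interaction pairs and names are hypothetical.

```python
from collections import Counter


def rank_regulators(interactions, degs, top_n=5):
    """Rank regulators by the number of distinct DEGs they target."""
    # Keep only interactions whose target is a DEG, dropping duplicate pairs.
    unique_pairs = {(reg, target) for reg, target in interactions if target in degs}
    degree = Counter(reg for reg, _ in unique_pairs)
    # Sort by descending degree, then by name for a deterministic order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical (regulator, target gene) pairs; not taken from any database.
toy_interactions = [
    ("TF_1", "GENE_A"),
    ("TF_1", "GENE_B"),
    ("TF_2", "GENE_A"),
    ("TF_1", "GENE_A"),  # duplicate pair, counted once
    ("TF_3", "GENE_Q"),  # target is not a DEG, ignored
]
top_regulators = rank_regulators(toy_interactions, {"GENE_A", "GENE_B"})
```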

PPI analysis to identify hub proteins
We reconstructed a PPI network around the proteins encoded by the DEGs using the STRING protein interactome database [40]. The PPI network was analysed in Cytoscape (v3.5.1) [41,42]. The network was represented as an undirected graph in which nodes indicate proteins and edges indicate interactions between the proteins. We performed a topological analysis using the cytoHubba plugin [43,44] in Cytoscape, employing the degree metric to identify highly connected proteins (i.e. hub proteins) in the network [45,46].
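Degree-based hub detection in an undirected PPI graph can be sketched in a few lines. This is an illustrative stand-in for the plugin-based analysis, not a reproduction of it; the edge list and protein names are hypothetical.

```python
from collections import defaultdict


def hub_proteins(edges, top_n=10):
    """Return the top_n proteins ranked by degree in an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)  # undirected: record both directions
    # Sets de-duplicate repeated edges, so degree = number of distinct partners.
    ranked = sorted(neighbours, key=lambda p: (-len(neighbours[p]), p))
    return [(p, len(neighbours[p])) for p in ranked[:top_n]]


# Hypothetical interaction pairs; not taken from STRING.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P1")]
hubs = hub_proteins(toy_edges, top_n=2)
```

Ties are broken alphabetically here so that the ranking is reproducible; other tie-breaking rules are equally defensible.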

Protein-drug interactions analysis
The protein-drug interactions were analysed using the DrugBank database (Version 5.0) to identify potential small molecules that can affect pathways we identified as important in CMP and which may point to therapeutic approaches for CMP [47].

Identification of differentially expressed genes from microarray gene expression datasets
The CMP gene expression dataset was analysed and a total of 1764 DEGs were identified in CMP patients compared to control samples, of which 919 genes were up-regulated and 845 genes were down-regulated.

Molecular pathway and functional analysis
To clarify the biological roles of the common DEGs identified between CMP and the risk factors, we performed gene ontology analysis to identify the biological processes, cellular components and molecular functions enriched among the DEGs (Table 2). Significantly altered molecular pathways were also identified in CMP and the risk factors. A total of 61 pathways were found to be over-represented among several groups, and some of the significant pathways are shown in Table 2. Prominent among them were amino acid metabolism pathways such as alanine metabolism, signalling pathways such as the MAPK and RAS signalling pathways, ECM pathways, and the alcoholism pathway.
FOXC1 is a TF that plays a critical role in early cardiomyogenesis [48]. It is also required for morphogenesis of the cardiac outflow tract [58]. Expression of the TF GATA2 is high in the thoracic aorta, and GATA2 variants are associated with early-onset familial coronary artery disease [49]. FOXL1 is a TF whose elevated expression is associated with good outcomes in human pancreatic ductal adenocarcinoma [50], but it has no known association with cardiac diseases. The activity of the TF YY1 is increased in human heart failure [51]. CREB over-expression is associated with cardiac failure, suggesting that it plays a significant role in cardiac pathologies [52]. MicroRNAs (miRNAs) are short single-stranded RNA molecules (∼22 nucleotides long) that regulate gene expression at the post-transcriptional stage. miRNAs are being considered as potential sources of biomarkers for complex diseases, including neurodegenerative diseases and cancers. Therefore, we identified the miRNAs controlling the DEGs to provide insights into the regulatory biomolecules. Among these miRNAs, mir-335-5p was found to be upregulated in heart failure in experimental animals [53]. Sun et al. [59] also predicted by microarray analysis that mir-335-5p is implicated in the hypertrophic CMP pathway. Jia et al. [54] showed that mir-26b-5p is associated with suppression of proliferation and enhancement of apoptosis in multiple myeloma cells. It has been proposed that mir-34a-5p could prevent autophagic cell death in ischaemic hearts and in this way ameliorate myocardial injury [55,60]. Inhibition of mir-92a-3p leads to increased blood vessel growth and recovery of damaged tissue in mouse models of myocardial infarction, which suggests that it may be an important therapeutic target in ischaemic heart disease [56]. mir-17-5p has been suggested as an important prognostic biomarker in cancers, including hepatocellular carcinoma [57].

PPI network analysis
The PPI network was constructed using all 236 distinct differentially expressed genes that were common between CMP and the risk factors (Figs. 6 and 7). Topological analysis using the degree metric was used to identify highly connected protein clusters. Each node in the network represents a protein and each edge indicates an interaction between two proteins. We detected ten hub proteins (CDK2, ATM, CDT1, NCOR2, HIST1H4A, HIST1H4B, HIST1H4C, HIST1H4D, HIST1H4E and HIST1H4L) in the PPI analysis. These hub proteins may be potential drug targets.

Discussion
In this study, the molecular mechanisms that may link CMP and its associated risk factors were investigated. We analysed gene expression data from CMP tissue and from tissues affected by the risk factors in order to identify the DEGs shared by CMP and the risk factors. This showed that CMP-affected tissues shared 81 DEGs with tissues and cells affected by HFD exposure; DEGs were similarly shared with AG (48 genes) and SM (32 genes). These were the conditions that shared the most DEGs with CMP. To clarify the biological relevance of the identified DEGs, GO and molecular pathway analyses were performed, which revealed pathways with significantly altered activity. Among such pathways, MAPK signalling cascades have been reported to be prominent in the pathogenesis of cardiac and vascular disease [61][62][63]. Another pathway, RAS signalling, plays a critical role in cardiac hypertrophy, which suggests that developing meaningful therapies for individuals with RASopathies will be complex [64]. Clinical and genetic studies have also revealed close relationships between cell adhesion proteins and the occurrence of various CMPs [65], thus indicating the important role of focal adhesion pathways in CMP. Related to this, extracellular matrix alterations may be a significant factor in the pathogenesis of dilative CMP [66]. Moreover, the hypertrophic CMP, arrhythmogenic right ventricular CMP and dilated CMP pathways were notably and consistently enriched in CMP.
Analysis of PPIs can provide detailed insights into the central mechanisms behind diseases [9, 11, 12]. Therefore, we reconstructed the PPI networks around the proteins encoded by the DEGs. Based on our topological analysis, we detected ten hub proteins (CDK2, ATM, CDT1, NCOR2, HIST1H4A, HIST1H4B, HIST1H4C, HIST1H4D, HIST1H4E and HIST1H4L) involved in CMP. A brief description of the hub proteins, their gene ontology and their features is presented in Table 4. Among the hubs, CDK2 is involved in the regulation of myocardial ischaemia and reperfusion injury [67,74]. The hub protein apical transverse motion (ATM) is associated with electromechanical dyssynchrony in adult dilated CMP [68]. The ATM protein is also involved in CMP associated with obesity and IR [75]. The hub protein CDT1 is associated with genotoxic stress, which results in aberrant cell proliferation leading to cancer formation [69], but its association with CMP is not known. Another hub protein, NCOR2, has been reported to be associated with non-alcoholic fatty liver disease [70], which is one of the prominent risk factors for cardiovascular disease. Yin [72] reported the dysregulation of HIST1H4B in rat cardiomyocytes. The other hub proteins, HIST1H4A, HIST1H4D, HIST1H4E and HIST1H4L, have not yet been reported in connection with CMP. These hub proteins might be considered as candidate biomarkers or, if their biological role is confirmed, as potential drug targets.
Using this network-based approach, our analyses revealed novel relationships between CMP and other susceptibility/causative factors. This study identified potential biomarkers, which may be candidates for the development of prognostic strategies and treatments. Since the common pathways may indicate ways by which the risk factors influence CMP, such pathways and their hub genes identified in this study may have important pathogenic roles in CMP. To examine this, and so to validate the results of this systems biology approach, we also compared the DEGs associated with CMP and each of the risk factors with the OMIM and dbGaP databases, using their disease-gene associations as a gold-standard benchmark (Table 5). The DEGs of the nine risk factors showed suggestive links that may promote CMP development and progression. This analysis furnishes new hypotheses that may point the way to establishing mechanistic links between CMP and the various risk factors that we examined.

Conclusion
In this study, the genetic association of CMP with the diseasome was identified from comprehensive transcriptomic analyses integrated with human biomolecular networks, revealing candidate biomarkers at the RNA level (transcripts and miRNAs) and the protein level (hub proteins) that were identified as potential key signalling and regulatory biomolecules in CMP; we also identified possible molecular pathways with CMP involvement. Protein-drug interaction studies revealed eight gene products that had detectable in silico interactions with four compounds, namely Amoxapine, L-Glutamic Acid, Amitriptyline and Acamprosate, all of which are already available for therapeutic use apart from glutamate, which is a nutrient and neurotransmitter. Thus, new gene-based recommendations for disease diagnosis and possible