Identification and experimental validation of biomarkers related to mitochondrial and programmed cell death in obsessive-compulsive disorder - Scientific Reports


This study employed transcriptomic data analysis, weighted gene co-expression network analysis (WGCNA), machine learning, and other methodologies to identify two key mitochondrial and programmed cell death (PCD) (MTPCD)-related genes, NDUFA1 and COX7C, as potential biomarkers for obsessive-compulsive disorder (OCD). Functional analysis, molecular regulatory network exploration, and disease and drug analysis were performed to investigate the pathogenesis of OCD, offering new insights for clinical diagnosis and therapeutic development.

To obtain datasets meeting the research criteria, we searched the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) using "obsessive-compulsive disorder" as the keyword. Datasets containing samples from patients with confirmed OCD and corresponding control samples were selected to ensure the phenotypic consistency required for case-control analysis. In addition, the chosen datasets were required to have sufficient sample sizes and consistent sequencing platforms. Finally, datasets GSE78104 and GSE60190 were included. The GSE78104 dataset (GPL19612) served as the training set, consisting of peripheral blood samples from 30 patients with OCD and 30 healthy controls. The GSE60190 dataset (GPL6947) was used as the validation set, including tissue samples from the dorsolateral prefrontal cortex of 16 patients with OCD and 102 healthy controls. A total of 1,136 mitochondrial-related genes (MRGs) were retrieved from the MitoCarta 3.0 database (https://www.broadinstitute.org/mitocarta/mitocarta30-inventory-mammalian-mitochondrial-proteins-and-pathways), and 1,548 PCD-related genes (PCD-RGs) were extracted from the literature.

Differentially expressed genes (DEGs) between OCD and control samples were identified using the GSE78104 dataset with the 'limma' package (version 3.56.2), applying thresholds of |log fold-change (FC)| ≥ 0.5 and p ≤ 0.05. Volcano plots and heatmaps were generated to visualize the DEGs using the 'ggVolcano' (version 0.0.2) and 'ComplexHeatmap' (version 2.16.0) packages, respectively. Differentially expressed MRGs (DE-MRGs) and PCD-RGs (DE-PCD-RGs) were determined by intersecting the DEGs with the 1,136 MRGs and 1,548 PCD-RGs. Spearman's correlation analysis was used to assess the relationship between DE-MRGs and DE-PCD-RGs, with genes meeting the thresholds (p < 0.001, |cor| > 0.6) categorized as DE-MPCD-RGs. Next, WGCNA was applied to the GSE78104 dataset using the 'WGCNA' package (version 1.72.5) to identify the key modules most related to OCD. Outlier samples were removed via cluster analysis, and the optimal soft threshold (β) was chosen as the value at which the scale-free fit index (R²) exceeded 0.9 and the average connectivity approached 0. Using the selected β-value, genes with similar expression profiles were grouped into modules via a dynamic tree-cutting method (minModuleSize = 30, mergeCutHeight = 0.25). Key modules significantly correlated with OCD were selected based on correlation coefficients (p < 0.05, |cor| > 0.3), and genes within these modules were considered key module genes. Candidate genes for OCD were then identified by intersecting the DE-MPCD-RGs with the key module genes using the 'ggvenn' package.
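
The thresholding and intersection steps described above can be illustrated with a minimal Python sketch. It is not the authors' R/limma workflow: the gene names, statistics, and gene lists are hypothetical placeholders, and only the two cutoffs (|log FC| ≥ 0.5, p ≤ 0.05) come from the text.

```python
# Illustrative sketch only: gene names, statistics, and gene sets are made up.
LOGFC_CUTOFF = 0.5   # |log FC| >= 0.5, as stated in the text
P_CUTOFF = 0.05      # p <= 0.05, as stated in the text


def select_degs(results):
    """Return the set of genes that pass both thresholds.

    `results` maps gene symbol -> (log fold-change, p-value).
    """
    return {
        gene
        for gene, (logfc, p) in results.items()
        if abs(logfc) >= LOGFC_CUTOFF and p <= P_CUTOFF
    }


# Hypothetical differential-expression table (not real data).
toy_results = {
    "GENE_A": (0.82, 0.001),
    "GENE_B": (-0.61, 0.030),
    "GENE_C": (0.20, 0.004),   # fails the fold-change cutoff
    "GENE_D": (-1.10, 0.200),  # fails the p-value cutoff
}
toy_mrgs = {"GENE_A", "GENE_C"}      # stand-in for the MitoCarta gene list
toy_pcd_rgs = {"GENE_B", "GENE_D"}   # stand-in for the PCD gene list

degs = select_degs(toy_results)
de_mrgs = degs & toy_mrgs        # differentially expressed MRGs
de_pcd_rgs = degs & toy_pcd_rgs  # differentially expressed PCD-RGs
```

Plain set intersection is used because each step only asks which gene symbols appear in both lists; the same pattern applies to the later intersection with the key module genes.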

To identify the biological functions and pathways associated with the candidate genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses (p.adjust < 0.05) were conducted using the 'clusterProfiler' package (version 4.8.2), with 'org.Hs.eg.db' (version 3.17.0) as the background gene set. Further analysis of the candidate genes was performed using ToppCluster on the 'ToppGene' platform (version 2.3.1) (p < 0.05), focusing on key biological processes in GO and related diseases. The GO-disease network was visualized using 'Cytoscape' (version 3.7.1). Additionally, protein-protein interactions (PPI) among the candidate genes were explored by constructing a network using the Search Tool for Recurring Instances of Neighboring Genes (STRING database, https://string-db.org/) with an interaction score threshold set to 0.4.

In the GSE78104 dataset, the candidate genes were analyzed using the Support Vector Machine Recursive Feature Elimination (SVM-RFE) method via the 'e1071' package (version 1.7.13). This method identified signature genes by eliminating feature vectors generated by the SVM and selecting the combination associated with the lowest error rate. Additionally, univariate logistic regression was performed using the 'rms' package (version 6.7-1) to identify core genes significantly associated with OCD (p < 0.05). Signature and core genes were then intersected to identify potential biomarkers for OCD. Expression levels of these potential biomarkers were compared between OCD and control groups in both the GSE78104 and GSE60190 datasets. Genes showing significant differential expression and consistent trends across both datasets were selected as OCD biomarkers. Correlations between biomarkers were analyzed using Spearman's correlation.
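
The final selection rule (significant in both datasets, with the same direction of change) can be sketched as follows. This is an illustration under stated assumptions: the function name, the input layout, the toy values, and the use of 0.05 as the significance level for this step are not taken from the source.

```python
def consistent_biomarkers(training, validation, alpha=0.05):
    """Keep genes significant in both datasets with the same direction.

    Each argument maps gene symbol -> (difference in mean expression
    between OCD and control, p-value). `alpha` is an assumed cutoff.
    """
    kept = []
    for gene in sorted(training.keys() & validation.keys()):
        diff_t, p_t = training[gene]
        diff_v, p_v = validation[gene]
        significant = p_t < alpha and p_v < alpha
        same_direction = diff_t * diff_v > 0  # both up or both down
        if significant and same_direction:
            kept.append(gene)
    return kept


# Hypothetical values, not results from the study.
toy_training = {
    "GENE_A": (0.7, 0.01),
    "GENE_B": (-0.5, 0.02),
    "GENE_C": (0.6, 0.03),
}
toy_validation = {
    "GENE_A": (0.4, 0.04),   # significant, same direction: kept
    "GENE_B": (0.3, 0.01),   # significant, opposite direction: dropped
    "GENE_C": (0.5, 0.20),   # not significant in validation: dropped
}

biomarkers = consistent_biomarkers(toy_training, toy_validation)
```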

To further investigate the biological pathways associated with the identified biomarkers in OCD, Gene Set Enrichment Analysis (GSEA) was performed. Using the GSE78104 dataset, correlation coefficients between biomarkers and other genes were calculated via the 'psych' package (version 2.3.12). Genes were ranked according to these correlation coefficients, and GSEA (adj. p-value < 0.05, |NES| > 1) was conducted using the 'clusterProfiler' package (version 4.8.2). The pathway-enriched gene set file was downloaded from the GSEA website (http://www.gsea-msigdb.org/gsea/msigdb), with the KEGG pathway gene set 'c2.cp.kegg_legacy.v2023.2.Hs.entrez.gmt' used as the reference. The top five pathways with the highest NES were displayed using the 'enrichplot' package (version 1.23.1). Additionally, interactions between biomarkers and other functionally similar genes were explored using GeneMANIA (http://genemania.org/), and co-expression networks were constructed to provide further insights into their functional relationships.
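
To make the ranking step concrete, the sketch below builds a correlation-ranked gene list of the kind that GSEA takes as input. The text does not state which correlation coefficient was computed with 'psych', so Spearman's coefficient is an assumption here, and all expression values and gene names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes(biomarker_expr, other_expr):
    """Return (gene, rho) pairs sorted from most positive to most negative."""
    scores = {g: spearman(biomarker_expr, v) for g, v in other_expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression of one biomarker and three other genes in 5 samples.
toy_biomarker = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_others = {
    "GENE_X": [2.1, 2.9, 4.2, 5.0, 6.3],
    "GENE_Y": [9.0, 7.5, 6.1, 4.0, 2.2],
    "GENE_Z": [3.0, 1.0, 4.0, 2.0, 5.0],
}

ranked = rank_genes(toy_biomarker, toy_others)
```

The resulting descending list places positively correlated genes at the top and negatively correlated genes at the bottom, which is the ordering an enrichment score is computed over.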

To explore immune cell infiltration in OCD, the proportions of 28 immune cell types per sample in the GSE78104 dataset were calculated using the 'GSEABase' package (version 1.62.0) and the 'mmc3.gmt gene set'. A heatmap was generated to visualize immune cell distribution across different samples using the 'pheatmap' package (version 1.0.12). Differences in immune cell proportions between OCD and control samples were assessed using Wilcoxon's test (p < 0.05). Furthermore, correlations between differentially expressed immune cells and biomarkers were examined through Spearman's correlation analysis.
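
The group comparison can be illustrated with a small Wilcoxon rank-sum (Mann-Whitney U) implementation. This is a simplified sketch, not the routine used in the study: it relies on a normal approximation without tie or continuity correction, whereas statistical software may compute exact p-values for small samples. The immune cell scores are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for group_a, approximate p-value).
    Simplification: no tie correction and no continuity correction.
    """
    pooled = [(v, 0) for v in group_a] + [(v, 1) for v in group_b]
    pooled.sort(key=lambda t: t[0])

    # Assign average ranks so tied values share the same rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1

    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_a, p_value


# Hypothetical scores for one immune cell type (not study data).
toy_ocd = [0.10, 0.12, 0.15, 0.18, 0.20]
toy_control = [0.30, 0.32, 0.35, 0.38, 0.40]

u_stat, p_val = rank_sum_test(toy_ocd, toy_control)
is_different = p_val < 0.05  # threshold stated in the text
```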

The target miRNAs for the biomarkers were predicted using the miRDB (https://mirdb.org/) and TargetScan (https://www.targetscan.org/vert_80/) databases. Shared miRNAs were identified by intersecting predictions from both databases. Upstream lncRNAs of the predicted miRNAs were analyzed using the miRNet database (https://www.mirnet.ca/). Additionally, upstream transcription factors (TFs) regulating the biomarkers were predicted in the miRNet database. The biomarker-miRNA-lncRNA and biomarker-TF regulatory networks were constructed using 'Cytoscape' software.

As key epigenetic modifications, m6A and m5C may participate in the core mechanism of "mitochondrial dysfunction-PCD imbalance" in OCD: the expression patterns and interactions of their regulatory factors can affect the transcriptional efficiency of PCD-related genes and the expression of mitochondrial function genes. To investigate the role of m6A regulators in OCD, 20 known m6A regulatory factors (ALKBH5, FMR1, FTO, HNRNPA2B1, HNRNPC, IGFBP3, LRPPRC, METTL14, METTL16, METTL3, RBM15, RBM15B, RBMX, WTAP, YTHDC1, YTHDC2, YTHDF1, YTHDF2, YTHDF3, and ZC3H13) were analyzed for differential expression between OCD and control samples using Wilcoxon's test (p < 0.05) in the GSE78104 dataset. Spearman's correlation analysis was performed to assess the relationships between biomarkers and the 20 m6A regulatory factors. Similarly, differences in the expression of 10 m5C regulators (NOP2, NSUN3, NSUN4, NSUN6, NSUN7, TRDMT1, TET1, TET2, YBX1, and YTHDF2) between OCD and control samples were evaluated using Wilcoxon's test, followed by correlation analysis with biomarkers using Spearman's correlation.

Diseases strongly correlated with biomarkers were identified through the Disorders-Genes-Environment Network database (DisGeNET, http://www.disgenet.org/web/DisGeNET/menu), with a Gene-Disease Association Score (Score gda) > 0.3. Co-expression networks of biomarkers and diseases were visualized using Cytoscape. Drugs targeting these biomarkers were predicted via the Comparative Toxicogenomics Database (CTD, https://ctdbase.org/), and the biomarker-drug networks were also visualized using Cytoscape.

Ethical approval for the study was granted by the Medical Ethics Committee of Fuzhou Neuropsychiatric Prevention and Treatment Hospital, Fujian Province (license number: 202411), and informed consent was obtained from all participants. RNA was extracted from 10 peripheral blood samples stored at -80 °C (5 control, 5 OCD) using TRIzol reagent. The RNA was reverse-transcribed into cDNA using the SweScript First Strand cDNA Synthesis Kit (Servicebio). RT-qPCR was performed on a CFX96 real-time quantitative fluorescence PCR instrument, using a reaction system consisting of 2× Universal Blue SYBR Green qPCR Master Mix, primers, and cDNA samples. Primer sequences are provided in Supplementary Table 1. Relative mRNA expression, normalized to GAPDH levels, was calculated using the 2^-ΔΔCt method.
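
As a worked example of relative quantification against GAPDH, the sketch below applies the standard comparative Ct (2^-ΔΔCt) calculation, assuming that is the method referred to above. The Ct values and the function name are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene by the comparative Ct calculation.

    ct_target / ct_reference: Ct of the target gene and of the reference
    gene (e.g. GAPDH) in the sample of interest.
    ct_target_ctrl / ct_reference_ctrl: the same two Ct values in the
    control (calibrator) sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle later in the
# OCD sample than in the control, relative to GAPDH.
fold_change = relative_expression(
    ct_target=24.0, ct_reference=18.0,            # OCD sample: delta Ct = 6
    ct_target_ctrl=23.0, ct_reference_ctrl=18.0,  # control: delta Ct = 5
)
```

With these toy values ΔΔCt is 1, so the target gene is expressed at half the control level; a ΔΔCt of 0 would give a fold change of 1.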

All statistical analyses were performed using the R programming language (version 4.1.3), with Wilcoxon's test applied for comparisons between two groups. Statistical significance was defined as an adjusted p-value or p-value < 0.05.

The analytical workflow of this study is shown in Fig. 1.
