Bioinformatics analysis of TCGA data identifies a taurine metabolism-related subtype classification for predicting prognosis in colon adenocarcinoma
Highlight box
Key findings
• Taurine metabolism-related gene expression is associated with the progression of colon adenocarcinoma (COAD) and may serve as a prognostic indicator.
What is known, and what is new?
• Taurine metabolism has been linked to the development of COAD, so the genes associated with taurine metabolism warrant evaluation in this disease.
• This research delineated distinct subtypes of COAD and identified biomarkers linked to taurine metabolic processes.
What is the implication, and what should change now?
• Our predictive model for COAD, which is based on the expression of genes associated with taurine metabolism, could aid in the development of novel targeted therapies.
Introduction
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide (1). Colon adenocarcinoma (COAD), a type of CRC, poses a major health risk due to its high incidence and mortality rates (1,2). Depending on the disease stage and characteristics, COAD treatment usually includes surgery, chemotherapy, and/or radiotherapy. Despite advances in the screening and treatment of CRC, patient prognosis remains poor, and about 50% of patients experience recurrence and metastasis (2). Clinical staging primarily guides treatment; however, genetic heterogeneity, marked by genomic instability, also affects outcomes and contributes to drug resistance (3). Given the significant genetic diversity of patients and the need for personalized treatments, CRC molecular subtypes need to be closely examined to identify new biomarkers to improve patient prognosis and optimize treatment.
Several prognostic biomarkers and prediction models have been proposed to augment conventional staging systems. For instance, gene expression signatures derived from pathways such as epithelial-mesenchymal transition (EMT), immune microenvironment, and metabolic reprogramming have shown promise in predicting clinical outcomes. Nomograms integrating molecular and clinical variables have also been developed to provide individualized survival estimates. Examples include the model by Zhu et al. (4) for early-onset stage II–III colon cancer and the nomogram by Zheng and Sun (5) predicting perineural invasion risk and its prognostic implications. Despite these advances, the accuracy and generalizability of existing models remain suboptimal, indicating a continued demand for more robust and biologically relevant prognostic biomarkers.
Taurine metabolism has been linked to the development of COAD (6). This sulfur-containing β-amino acid is involved in various cellular functions, such as osmoregulation and antioxidation (6). Recent research suggests that taurine may play a role in CRC, and its elevated levels in CRC patients indicate its potential as a diagnostic biomarker (6). Taurine levels may also help distinguish benign from malignant growths, potentially improving screening accuracy (6). Additionally, in a colon cancer rat model, taurine improved the effectiveness of the chemotherapy drug 5-fluorouracil by reducing side effects and enhancing treatment outcomes (7).
Taurine metabolism is closely linked with other cancer-related metabolic pathways, and thus has diagnostic and therapeutic potential. Research on taurine synthesis in pancreatic cancer has revealed its role in recurrence and survival (8); similar mechanisms may operate in COAD, where taurine could affect tumor progression and prognosis. Additionally, the relationship between taurine and bile acid metabolism in COAD has led to the development of a prognostic model based on bile acid metabolism-related genes. This model highlights the complex metabolic interactions in cancer progression and the possibility of targeting these pathways for treatment (9). Research has also examined the role of taurine in sphingolipid metabolism, a key factor in CRC that affects cell survival and growth (10); the impact of taurine on this pathway may reveal new cancer treatment targets. Overall, research has highlighted the importance of taurine in COAD, suggesting that a better understanding of its role could lead to innovative diagnostic and therapeutic approaches that enhance patient outcomes (6-8).
This study aimed to identify clinically relevant subtypes of COAD by examining the genes associated with taurine metabolism. Furthermore, it sought to characterize these subtypes based on their clinical prognostic outcomes, tumor microenvironment features, immune cell infiltration patterns, sensitivity to chemotherapy, and underlying functional mechanisms. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-605/rc).
Methods
Acquisition and preprocessing of publicly accessible cohort data
The gene sets associated with taurine metabolism were obtained from previous research (11). A total of 454 COAD samples, with RNA-sequencing data in transcripts per million (TPM) and the corresponding clinical metadata, were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Clustering analysis of COAD based on taurine metabolism-related genes
Using the “non-negative matrix factorization” (“NMF”) package in R, the COAD patients were clustered based on the expression profiles of the taurine metabolism-related genes. After the optimal number of clusters (k) was determined, the clustering was repeated 1,000 times to obtain a stable and reliable consensus matrix. Silhouette width ranges from −1 to 1, and values closer to 1 indicate better cohesion within clusters and better separation between them. A principal component analysis (PCA) was conducted to examine the distributional differences among the subtypes. This analysis was performed using the “limma” package, and the results were visualized with the “ggplot2” package.
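The silhouette criterion can be made concrete with a short sketch. This is a minimal pure-Python illustration, assuming Euclidean distances between samples and one cluster label per sample; the NMF package in R computes silhouettes from its own fitted objects (for example, from the consensus matrix), which this sketch does not reproduce, and the toy points are invented.

```python
import math
from typing import List, Sequence


def _mean_distance(points: Sequence[Sequence[float]], labels: Sequence[int], i: int, cluster: int) -> float:
    """Mean Euclidean distance (an assumption) from sample i to the other members of `cluster`."""
    others = [j for j in range(len(points)) if labels[j] == cluster and j != i]
    return sum(math.dist(points[i], points[j]) for j in others) / len(others)


def silhouette_widths(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Return s(i) = (b - a) / max(a, b) for every sample; needs at least two clusters."""
    widths: List[float] = []
    for i, own in enumerate(labels):
        if sum(1 for lab in labels if lab == own) == 1:
            widths.append(0.0)  # usual convention for a singleton cluster
            continue
        a = _mean_distance(points, labels, i, own)  # cohesion: mean distance within own cluster
        b = min(_mean_distance(points, labels, i, c) for c in set(labels) if c != own)  # nearest other cluster
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


if __name__ == "__main__":
    # Invented toy data: two well-separated groups give widths close to 1
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    widths = silhouette_widths(pts, [1, 1, 2, 2])
    print(sum(widths) / len(widths))
```

Mislabeling the same points (for example, labels 1, 2, 1, 2) drives the widths negative, which is why a mean width near 1 is read as a well-separated clustering.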
Identification of differentially expressed genes (DEGs) of the two taurine metabolism-related COAD subtypes
DEGs were identified through the application of the “limma” package in R based on a threshold of a |log2fold change (FC)| greater than 1 and an adjusted P value less than 0.05. Subsequently, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed on the identified DEGs using the “clusterProfiler” package in R to elucidate the molecular mechanisms.
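The DEG thresholds above (|log2FC| > 1 and adjusted P < 0.05) amount to a simple filter. This is a minimal sketch, assuming the P value adjustment is Benjamini-Hochberg (the default of limma's `topTable`, although the text does not name the method) and that log2FC and raw P values have already been computed for each gene; the gene names and values in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(stats: Dict[str, Tuple[float, float]], lfc_cut: float = 1.0,
              padj_cut: float = 0.05) -> Dict[str, str]:
    """Label genes 'up' or 'down' from {gene: (log2FC, raw P)}; positive log2FC = higher in the first group."""
    genes = list(stats)
    padj = bh_adjust([stats[g][1] for g in genes])
    calls: Dict[str, str] = {}
    for gene, q in zip(genes, padj):
        lfc = stats[gene][0]
        if q < padj_cut and abs(lfc) > lfc_cut:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls


if __name__ == "__main__":
    # Hypothetical per-gene statistics (log2FC, raw P)
    example = {"GENE_A": (2.0, 0.001), "GENE_B": (-1.5, 0.002), "GENE_C": (0.5, 0.0001)}
    print(call_degs(example))
```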
Immune activity between the two taurine metabolism-related groups in COAD
The Tumor IMmune Estimation Resource (TIMER) method was used to estimate immune cell infiltration in the tumor samples based on cell type-specific marker genes. To characterize immune checkpoint activity, we also evaluated the expression of the following immune regulation-related checkpoint genes: ITPRIPL1, SIGLEC15, TIGIT, CD274, HAVCR2, PDCD1, CTLA4, LAG3, PDCD1LG2, and IGSF8. Statistical evaluations were performed using R software (version 4.0.3). A P value less than 0.05 was considered statistically significant.
The messenger RNA expression-based stemness index was derived using the one-class logistic regression algorithm developed by Malta et al. (12), which uses features extracted from the expression data of 11,774 genes. Each sample's RNA expression profile was scored by Spearman correlation against the model. The stemness index was then linearly rescaled to the range [0, 1] by subtracting the minimum value and dividing by the maximum of the shifted values. All of the above analyses were performed in R (version 4.0.3; R Foundation for Statistical Computing, 2020). A P value <0.05 was considered statistically significant.
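The two scoring steps (Spearman correlation against the model, then min-max rescaling) can be sketched as follows. This assumes the per-gene weights of the one-class logistic regression model are already available as a gene-to-weight mapping and that tied values receive average ranks; the weights and expression values in the example are invented placeholders, not the published signature.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def stemness_index(weights: Dict[str, float], samples: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Correlate each sample with the (assumed) model weights, then rescale to [0, 1]."""
    genes = sorted(weights)  # assumes every weighted gene is measured in every sample
    raw = {s: spearman([weights[g] for g in genes], [expr[g] for g in genes])
           for s, expr in samples.items()}
    lowest = min(raw.values())
    shifted = {s: v - lowest for s, v in raw.items()}  # subtract the minimum
    top = max(shifted.values())  # divide by the maximum of the shifted scores
    return {s: v / top for s, v in shifted.items()}


if __name__ == "__main__":
    w = {"G1": 0.5, "G2": -0.2, "G3": 0.1, "G4": 0.9}  # placeholder weights
    cohort = {"S1": {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 8.0},
              "S2": {"G1": 2.0, "G2": 8.0, "G3": 5.0, "G4": 1.0}}
    print(stemness_index(w, cohort))
```

The rescaling is relative to the cohort being scored, so the same sample can receive a different index in a different cohort.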
Creation of the taurine metabolism-related risk score
Initially, a multivariable Cox regression analysis was performed to narrow down the candidate genes, after which a stepwise iterative analysis was conducted to select the optimal model, which served as the final model. The Cox proportional hazards model was used to identify the taurine metabolism-related genes significantly associated with survival. The integrated risk score was then established from the regression coefficients of the multivariable Cox regression analysis, focusing on signatures derived from the training dataset. The risk score was calculated as follows: risk score = Σ coefficient of gene (i) × expression of gene (i), where gene (i) denotes each gene retained in the signature.
Here, the coefficient of gene (i) is the regression coefficient of gene (i), and the expression of gene (i) is the expression level of each candidate taurine metabolism-related gene (i) in an individual patient. The risk score for each patient was calculated with the “predict” function of the survival R package. Patients were stratified into high- and low-risk groups based on the median risk score. Model performance was evaluated using Harrell’s concordance index. Furthermore, the clinical applicability of the taurine metabolism-related genes was assessed by visualizing risk-score distributions and survival curves in the TCGA cohort.
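Harrell's concordance index compares every usable pair of patients. The sketch below is a minimal illustration, assuming right-censored data given as survival time, event indicator (1 = death) and risk score; tied risk scores count as half-concordant and pairs with tied times are skipped, a common convention that may differ in detail from the survival package's implementation.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Fraction of usable pairs in which the patient with the higher risk score died earlier."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i has an observed event strictly before patient j's time
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / usable


if __name__ == "__main__":
    # Invented example: higher risk always dies earlier, so C = 1
    print(harrell_c([2, 4, 6, 8], [1, 1, 1, 0], [4, 3, 2, 1]))
```

A value of 0.5 corresponds to random ordering, so the index is read on a 0.5 to 1 scale for a useful model.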
Statistical analysis
The bioinformatics analyses were conducted using R software (version 4.2.1). All P values were two-tailed, and P<0.05 was considered statistically significant. Survival differences among the groups were visualized with Kaplan-Meier curves and compared with a two-tailed log-rank test. The Wilcoxon rank-sum test was used to compare the two groups.
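The mechanics of the survival comparison can be sketched with the standard Kaplan-Meier estimator and a two-group log-rank test. This is an illustration under stated assumptions (right censoring, event = 1 for death, P value from the chi-square distribution with 1 degree of freedom via `math.erfc`), not the R code used in the study; the example data are invented.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """(event time, survival probability) steps of the Kaplan-Meier curve."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank(t1: Sequence[float], e1: Sequence[int],
             t2: Sequence[float], e2: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank chi-square statistic and its P value (1 df)."""
    all_t = list(t1) + list(t2)
    all_e = list(e1) + list(e2)
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(all_t, all_e) if e == 1}):
        n1 = sum(1 for u in t1 if u >= t)  # at risk in group 1
        n = sum(1 for u in all_t if u >= t)  # at risk overall
        d1 = sum(1 for u, e in zip(t1, e1) if u == t and e == 1)
        d = sum(1 for u, e in zip(all_t, all_e) if u == t and e == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Invented groups: group 1 dies early, group 2 late
    print(log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```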
Results
Identification of the prognostic taurine metabolism-related genes in COAD
To identify prognostic genes in COAD, a univariate Cox regression analysis was performed in patients from the TCGA-COAD cohort. This analysis identified 597 genes with prognostic relevance in COAD; the top 20 are shown in Figure 1A. We then examined which of the taurine metabolism-related genes showed prognostic potential in COAD. These genes are presented in Figure 1B and include HSPB1, NOS2, LEP, KPNA2, SERPINA1, NR1H2, ENO2, HSPA1A, TRPV1, GSR, ALOX12, GABRD, TERT, CLCN3, AGMAT, NOTCH3, and MYB.
NMF clustering identified two taurine metabolism-based subtypes
A total of 454 patients with complete clinical data were included in the NMF clustering analysis. Based on the expression profiles of the taurine metabolism-related genes in TCGA, the NMF algorithm classified the patients into two distinct expression patterns: cluster 1 (C1), comprising 335 patients, and cluster 2 (C2), comprising 119 patients. A PCA was performed to assess the transcriptional profiles of these two taurine metabolism-related subtypes (Figure 2A). As Figure 2B shows, the clustering at k=2 clearly separated the two subtypes, with high intra-cluster correlation and low inter-cluster correlation, indicating a robust and interpretable clustering. A heatmap revealed distinct expression profiles of the 13 taurine metabolism-related genes between the two groups of patients from the TCGA-COAD cohort (Figure 2C). Notably, patients in C1 had significantly better overall survival than those in C2 [hazard ratio (HR): 0.465; 95% confidence interval (CI): 0.312–0.692; P<0.001; Figure 2D].
Identification of the underlying mechanisms between the two clusters in COAD
To investigate the mechanisms distinguishing the two clusters, we identified 199 DEGs using thresholds of P<0.05 and |log2FC| >1. Of these DEGs, 19 were upregulated and 180 were downregulated in C1 relative to C2, as shown in the volcano plot (Figure 3A). The leading DEGs showed opposite expression patterns between the two clusters in the heatmap (Figure 3B).
To further examine the biological processes of the 199 DEGs, GO and KEGG enrichment analyses were conducted. The GO biological process analysis indicated that these DEGs were predominantly involved in extracellular matrix (ECM) organization, extracellular structure organization, ossification, collagen fibril organization, cell-substrate adhesion, regulation of cell-substrate adhesion, collagen metabolic processes, endodermal cell differentiation, and endoderm formation (Figure 3C).
Moreover, the KEGG analysis results highlighted that the DEGs were primarily associated with pathways including ECM-receptor interaction, protein digestion and absorption, focal adhesion, complement and coagulation cascades, phagosome, human papillomavirus infection, and the PI3K-Akt signaling pathway (Figure 3D).
Immune activity between the two taurine metabolism-related groups in COAD
A previous study showed that taurine metabolism is closely linked to immune function in pancreatic cancer (13). We first assessed the relationship between the expression of the taurine metabolism-related genes and immune scores by Spearman correlation analysis. The taurine metabolism-related genes (i.e., HSPB1, NOS2, LEP, KPNA2, SERPINA1, NR1H2, ENO2, HSPA1A, TRPV1, GSR, ALOX12, GABRD, TERT, CLCN3, AGMAT, NOTCH3, and MYB) were significantly correlated with a variety of immune cell types (Figure 4A).
Subsequently, we compared immune activity between the two taurine metabolism-related clusters in COAD patients. The boxplots revealed marked differences in immune cell populations (specifically, CD4+ T cells, neutrophils, macrophages, and myeloid dendritic cells) between C1 and C2.
The boxplots also showed that eight of the 10 immune checkpoint-related genes (i.e., CTLA4, HAVCR2, IGSF8, LAG3, PDCD1, PDCD1LG2, SIGLEC15, and TIGIT) were expressed at lower levels in C1 than in C2 (Figure 4B). These findings further support an association between taurine metabolism and immune activity.
Comparison of tumor stemness between COAD subtypes
Cancer stem cells (CSCs) play a pivotal role in tumorigenesis, recurrence, and metastasis, and are key contributors to chemoresistance. The CSC scores of patients classified as C1 in the TCGA-COAD cohort were markedly higher than those of patients classified as C2 (Figure 5).
Construction of a prognostic model based on taurine metabolism-related genes
Subsequent analyses using least absolute shrinkage and selection operator (LASSO) and Cox regression were conducted, and 17 genes associated with taurine metabolism in COAD were identified. A signature comprising nine genes was established based on the optimal λ value. The risk score was calculated using the following formula: risk score = (0.2533) × LEP + (−0.1142) × SERPINA1 + (0.2828) × ENO2 + (0.189) × HSPA1A + (−0.4269) × GSR + (0.9964) × GABRD + (0.3864) × TERT + (−0.3476) × NOTCH3 + (−0.2033) × MYB.
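The nine-gene formula can be applied directly, as in the sketch below, which uses the coefficients reported above. It assumes the expression values are on the same scale used to fit the model (the text reports TPM data but does not state whether a log transform was applied). Stratification uses the median risk score, as described in the Methods; how patients exactly at the median are assigned is not specified, so here they are placed in the low-risk group (an assumption). Patient IDs and expression values in the example are invented.

```python
from statistics import median
from typing import Dict

# Coefficients of the nine-gene signature, as reported in the risk score formula
COEFFICIENTS: Dict[str, float] = {
    "LEP": 0.2533, "SERPINA1": -0.1142, "ENO2": 0.2828,
    "HSPA1A": 0.189, "GSR": -0.4269, "GABRD": 0.9964,
    "TERT": 0.3864, "NOTCH3": -0.3476, "MYB": -0.2033,
}


def risk_score(expression: Dict[str, float]) -> float:
    """Sum of coefficient x expression over the nine signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(cohort: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Split patients at the median risk score; ties go to 'low' (an assumption)."""
    scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
    cut = median(scores.values())
    return {pid: "high" if s > cut else "low" for pid, s in scores.items()}


if __name__ == "__main__":
    # Invented expression profiles for two hypothetical patients
    base = {gene: 1.0 for gene in COEFFICIENTS}
    patients = {"P1": base, "P2": {**base, "GABRD": 3.0}}
    print(stratify(patients))
```

Because GABRD carries the largest positive coefficient, raising its expression moves a patient toward the high-risk group, while higher GSR or NOTCH3 expression lowers the score.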
Using this gene signature, the patients from the TCGA-COAD cohort were stratified into low- and high-risk groups (Figure 6A). Overall survival analysis indicated that patients in the low-risk group had better survival outcomes than those in the high-risk group (HR: 3.489; 95% CI: 2.212–5.504; P=7.78e−08; Figure 6B). In addition, time-dependent receiver operating characteristic curve analysis of the prognostic model yielded area under the curve values of 0.698, 0.699, and 0.730 at 1, 3, and 5 years, respectively (Figure 6C). These findings indicate that the proposed model has reasonable prognostic performance.
Discussion
This study identified a total of 597 genes with prognostic relevance in COAD. Among these, a subset of genes associated with taurine metabolism was identified, which included HSPB1, NOS2, LEP, KPNA2, SERPINA1, NR1H2, ENO2, HSPA1A, TRPV1, GSR, ALOX12, GABRD, TERT, CLCN3, AGMAT, NOTCH3, and MYB. Using the expression profiles related to the taurine metabolism genes sourced from TCGA, the NMF algorithm successfully categorized patients into two distinct expression groups: C1, which comprised 335 patients, and C2, which comprised 119 patients. To better understand the mechanisms differentiating these clusters, 199 DEGs were identified. A GO analysis of these DEGs indicated their primary involvement in biological processes, such as the organization of the ECM, extracellular structure organization, ossification, collagen fibril organization, and cell-substrate adhesion. In addition, the KEGG analysis results suggested that the DEGs were predominantly linked to pathways including ECM-receptor interaction, protein digestion and absorption, and focal adhesion. Importantly, differences in immune activity were noted between the two clusters related to taurine metabolism in COAD. The CSC scores of the patients classified as C1 in the TCGA-COAD cohort were significantly elevated compared to those classified as C2. Additional analyses employing LASSO and Cox regression techniques led to the identification of 17 genes associated with taurine metabolism in COAD. Consequently, a prognostic model was established that included nine genes (i.e., LEP, SERPINA1, ENO2, HSPA1A, GSR, GABRD, TERT, NOTCH3, and MYB) to predict patient outcomes in COAD.
Taurine metabolism appears to play an important role in CRC, where it may influence cancer progression and therefore has therapeutic potential. This sulfur-containing amino acid has anti-inflammatory and anti-cancer properties, making it a promising biomarker and treatment target. Taurine can regulate cancer cell growth, apoptosis, and metastasis: it suppresses CRC cell proliferation and metastasis and induces apoptosis by regulating EMT-related genes and inhibiting the ERK/RSK pathway. It also counteracts hypotaurine-induced CRC progression, further supporting its therapeutic promise. Additionally, its anti-cancer effects were confirmed in an azoxymethane/dextran sulfate sodium-induced mouse model of colon cancer (14). Taurine significantly inhibited tumor growth in this model, suggesting that it could serve as a chemopreventive agent against CRC (15). Taurine also increased the levels of apoptosis markers and tumor suppressor proteins, which further supports its role in cancer prevention. A systematic review and meta-analysis found that taurine levels are significantly associated with CRC (6), highlighting its potential as a diagnostic metabolite for distinguishing benign from malignant growths and improving screening accuracy (6).
Recent studies have shown that SERPINA1, ENO2, and HSPA1A play crucial roles in CRC. SERPINA1, a serine protease inhibitor, is overexpressed in CRC, is associated with poor outcomes, and promotes tumor progression by enhancing STAT3 signaling (16). CEBPB binds to the SERPINA1 promoter and boosts its transcription, thereby promoting tumor growth, which makes SERPINA1 a potential prognostic marker and treatment target (16). Similarly, ENO2, a glycolytic enzyme, is dysregulated in CRC and associated with poor prognosis (17). It facilitates CRC cell migration and invasion by interacting with the long non-coding RNA CYTOR, affecting LATS1 and YAP1, and inducing EMT; thus, ENO2 could serve as a therapeutic target (17). An analysis of ubiquitination-related genes revealed that HSPA1A plays a crucial role in CRC progression (18). A gene signature based on these genes effectively predicted patient survival by stratifying patients into high- and low-risk groups (18). Reducing HSPA1A levels significantly decreased CRC cell growth and spread, suggesting that HSPA1A could serve as a target for personalized treatments (18). These studies emphasize the need to understand the molecular mechanisms of CRC and identify SERPINA1, ENO2, and HSPA1A as promising targets for the development of targeted therapies to improve patient outcomes.
This study has several limitations. First, the results stem from a retrospective analysis and require validation in prospective studies. Second, the reliance on historical data may have introduced biases that could affect the reproducibility of our findings. Although relying solely on TCGA data introduces certain challenges, we believe that this dataset provides a substantial foundation for our analysis owing to its comprehensiveness and data quality. Finally, functional experiments on these genes are needed to clarify their roles in COAD.
Conclusions
The status of taurine metabolism-related genes is closely correlated with tumor classification and immunity in COAD. Our findings could inform the diagnosis and treatment of COAD.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-605/rc
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-605/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-605/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Siegel RL, Miller KD, Wagle NS, et al. Cancer statistics, 2023. CA Cancer J Clin 2023;73:17-48.
- Weitz J, Koch M, Debus J, et al. Colorectal cancer. Lancet 2005;365:153-65.
- Kawakami H, Zaanan A, Sinicrope FA. Microsatellite instability testing and its role in the management of colorectal cancer. Curr Treat Options Oncol 2015;16:30.
- Zhu S, Xing Y, Tu J, et al. Development and validation of predictive nomograms for survival in early-onset colon cancer patients with II-III stage across various tumor sites. Transl Cancer Res 2025;14:2233-49.
- Zheng Z, Sun X. Development of a nomogram predicting perineural invasion risk and assessment of the prognostic value of perineural invasion in colon cancer: a population study based on the Surveillance, Epidemiology, and End Results database. Transl Cancer Res 2025;14:141-58.
- Sinha A, Griffith L, Acharjee A. Systematic Review and Meta-Analysis: Taurine and Its Association With Colorectal Carcinoma. Cancer Med 2024;13:e70424.
- Jornada DH, Boreski D, Chiba DE, et al. Synergistic Enhancement of 5-Fluorouracil Chemotherapeutic Efficacy by Taurine in Colon Cancer Rat Model. Nutrients 2024;16:3047.
- Nam H, Lee W, Lee YJ, et al. Taurine Synthesis by 2-Aminoethanethiol Dioxygenase as a Vulnerable Metabolic Alteration in Pancreatic Cancer. Biomol Ther (Seoul) 2025;33:143-54.
- Luo Q, Zhou P, Chang S, et al. Construction and validation of a prognostic model for colon adenocarcinoma based on bile acid metabolism-related genes. Sci Rep 2023;13:12728.
- Machala M, Procházková J, Hofmanová J, et al. Colon Cancer and Perturbations of the Sphingolipid Metabolism. Int J Mol Sci 2019;20:6051.
- Cao S, Lun S, Duan L, et al. Harnessing Calmodulin-Related Genes to Build a Prognostic Model in Esophageal Squamous Cell Carcinoma for a Comprehensive Analysis of Single-Cell Immune Characteristics and Drug Efficacy. J Immunother 2025;48:244-57.
- Malta TM, Sokolov A, Gentles AJ, et al. Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell 2018;173:338-354.e15.
- Qin Z, Huang G, Xu J, et al. Multidimensional transcriptomics based to illuminate the mechanisms of taurine metabolism in immune resistance of pancreatic cancer. Front Immunol 2025;16:1567805.
- Hou X, Hu J, Zhao X, et al. Taurine Attenuates the Hypotaurine-Induced Progression of CRC via ERK/RSK Signaling. Front Cell Dev Biol 2021;9:631163.
- Wang G, Ma N, He F, et al. Taurine Attenuates Carcinogenicity in Ulcerative Colitis-Colorectal Cancer Mouse Model. Oxid Med Cell Longev 2020;2020:7935917.
- Ma Y, Chen Y, Zhan L, et al. CEBPB-mediated upregulation of SERPINA1 promotes colorectal cancer progression by enhancing STAT3 signaling. Cell Death Discov 2024;10:219.
- Lv C, Yu H, Wang K, et al. ENO2 Promotes Colorectal Cancer Metastasis by Interacting with the LncRNA CYTOR and Activating YAP1-Induced EMT. Cells 2022;11:2363.
- Gao X, Yan T, Yu X, et al. Integrative analysis of ubiquitination-related genes identifies HSPA1A as a critical regulator in colorectal cancer progression. Med Oncol 2025;42:123.
(English Language Editor: L. Huleatt)

