Molecular subtyping and prognostic model construction based on endosome-related genes in colorectal cancer
Original Article

Molecular subtyping and prognostic model construction based on endosome-related genes in colorectal cancer

Xiao Huang, Jie Yang

Colorectal Cancer Center, West China Hospital of Sichuan University, Chengdu, China

Contributions: (I) Conception and design: Both authors; (II) Administrative support: X Huang; (III) Provision of study materials or patients: J Yang; (IV) Collection and assembly of data: Both authors; (V) Data analysis and interpretation: Both authors; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Jie Yang, MM. Colorectal Cancer Center, West China Hospital of Sichuan University, 37 Guoxue Lane, Wuhou District, Chengdu 610041, China. Email: jieyang_wchsp_scu@163.com.

Background: The endosome plays a crucial role in tumor cell material transport, signal regulation, and tumor immune microenvironment modeling, but the molecular characteristics, subtyping significance, and prognostic value of endosome-related genes (ERGs) in colorectal cancer (CRC) have not been systematically elucidated. This study aims to analyze the molecular heterogeneity of CRC from the perspective of ERGs and evaluate ERGs’ clinical and therapeutic significance.

Methods: The transcriptome and clinical data of CRC from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases were integrated, identifying 63 CRC-related ERGs. Molecular typing of CRC samples was conducted based on consensus clustering analysis, and comparison of differences in survival outcomes, tumor immune microenvironment characteristics, and potential response to immunotherapy was performed among different subtypes. An ERG prognostic risk model was constructed through differential expression analysis and was validated in an independent cohort. Combining functional experiments and drug sensitivity analysis, the potential mechanisms and therapeutic value of key genes were investigated.

Results: The CRC samples were divided into two significantly different subtypes of endosome-related molecules (cluster 1 and cluster 2). Compared with cluster 2, cluster 1 patients had better overall survival rates, more active immune cell infiltration (such as B cell and mast cell enrichment), and lower Tumor Immune Dysfunction and Exclusion (TIDE) scores, suggesting that they may be more likely to benefit from immune checkpoint inhibitor therapy. The prognostic model constructed based on 8-ERGs showed good predictive performance in both the training and validation sets. Further analysis revealed that the key prognostic gene MAGEA1 was significantly upregulated in CRC tissues and closely associated with poor pathological staging and shorter overall survival time. Mechanism analysis suggested that MAGEA1 may promote CRC cell proliferation by regulating the CDK4/6-Rb signaling axis, and the CDK4/6 inhibitor ribociclib (LEE011) exhibited stronger anti-proliferative effects on MAGEA1 overexpressing cells. In addition, the constructed integrated risk score and clinical parameter nomogram can be used for personalized survival prediction.

Conclusions: This study systematically revealed the molecular heterogeneity of CRC from the perspective of ERGs, identified clinically significant prognostic subtypes, and key gene MAGEA1. The constructed ERG prognostic model and nomogram provide new theoretical basis and potential targets for risk stratification and personalized treatment of CRC patients.

Keywords: Colorectal cancer (CRC); endosome; prognostic model; tumor immune microenvironment; drug sensitivity


Submitted Jul 25, 2025. Accepted for publication Dec 24, 2025. Published online Feb 26, 2026.

doi: 10.21037/jgo-2025-593


Highlight box

Key findings

• The expression characteristics of endosome-related genes (ERGs) in colorectal cancer (CRC) are analyzed and two molecular subtypes with significant clinical and biological differences are identified.

• There are significant differences in survival outcomes, tumor immune microenvironment, and potential response to immunotherapy among different subtypes of ERGs.

• A robust prognostic model consisting of 8 ERGs is constructed and validated, which performs well in predicting overall survival.

MAGEA1 is a key prognostic gene, and its high expression is associated with advanced staging and poor prognosis.

• Mechanism analysis suggests that MAGEA1 may promote CRC cell proliferation through the CDK4/6-Rb signaling pathway.

What is known and what is new?

• Endosome is involved in tumor progression and immune regulation, but the clinical value of related genes in CRC is not yet clear.

• This study established for the first time the ERG-related molecular typing in CRC, revealed their systematic association with immune landscape, immunotherapy response, and prognosis, and identified MAGEA1 as a potential therapeutic target.

What is the implication, and what should change now?

• This study emphasizes the clinical significance of ERGs in CRC risk stratification and individualized prognostic assessment. The proposed ERG prognostic model and MAGEA1 are expected to support more accurate patient stratification and provide a reference for immunotherapy and CDK4/6-targeted therapy strategies.


Introduction

Colorectal cancer (CRC), one of the most prevalent malignancies of the digestive system worldwide, ranks among the top cancers in both incidence and mortality rates, with its incidence continuing to rise (1). According to the most recent 2024 global cancer burden estimates, there were approximately 1,926,118 newly diagnosed CRC cases and 903,859 CRC deaths in 2022 (2). Based on clinical and histopathological characteristics, CRC can be classified into multiple subtypes with significant heterogeneity in biological behavior, treatment response, and prognosis (3). Although comprehensive treatment strategies—including surgical resection, chemoradiotherapy, and targeted therapy—have been established, many patients still face challenges such as postoperative recurrence, metastasis, and therapeutic resistance, resulting in limited improvement in long-term survival rates (4,5). Currently, CRC diagnosis primarily relies on colonoscopy and imaging techniques such as positron emission tomography (PET), magnetic resonance imaging (MRI), and computed tomography (CT). However, these methods have limitations, including operator-dependent variability and reduced sensitivity in detecting early-stage microlesions and heterogeneous tumor regions (6-8). At the molecular level, microsatellite instability status, KRAS/NRAS/BRAF mutations and the CpG island methylator phenotype have been utilized to guide diagnosis and treatment decisions (9-11). Nevertheless, these biomarkers are often restricted to specific molecular pathways or patient subgroups, failing to fully capture CRC’s heterogeneous nature. Their clinical utility in personalized prognostic prediction remains suboptimal. Thus, establishing a systematic molecular typing framework and precision prognostic models, along with elucidating the molecular mechanisms underlying distinct CRC subtypes, holds significant scientific and clinical value for improving early diagnosis and optimizing personalized therapeutic strategies.

In recent years, it has become increasingly recognized that the endosomal system plays a central regulatory role in the occurrence and development of tumors. As the hub of the cell membrane transport network, the endosome not only serves as the downstream node of endocytosis but also undertakes multiple cellular functions such as receptor sorting, signal regulation, immune response modulation, nutrient metabolism, and drug delivery (12-14). Abnormal endosomal function can determine the destination (recycling or degradation) of key signal receptors such as receptor tyrosine kinases (RTKs), thereby affecting cell fate, proliferation, and survival (15,16). Moreover, as a key organelle regulating the intensity, duration, and spatial distribution of RTK signals, the endosome plays a central role in the occurrence of tumor cell migration and metastasis, and thus becomes a potential anti-metastasis therapeutic target (17). Abnormal endosomal function can lead to continuous receptor signal transduction, which is related to cancer progression (18). Studies have revealed that some endosome-related molecules, such as the RAB family and EGFR, have potential prognostic or therapeutic targets in solid tumors such as CRC, breast cancer (BC), and lung cancer (19-21). In low-grade gliomas, researchers have developed a prognostic model for endosome-related genes (ERGs), which can be used as an independent prognostic indicator for low-grade gliomas (15). It is worth noting that recent studies have shown that the endosomal/membrane transport pathway is closely related to the key driver signal Wnt/β-catenin in CRC. The classical Wnt signal not only regulates transcriptional target genes but also directly drives macropinocytosis, lysosome activation, and multivesicular bodies (MVB) formation, promoting extracellular protein uptake and degradation, and maintaining signal homeostasis (22,23). Close labeling techniques have shown that after Wnt stimulation, the LRP6 receptor can rapidly recruit the ESCRT machinery, suggesting that the endosome/ESCRT is an important component of the Wnt signal complex (24). At the same time, targeted therapeutic strategies for Wnt-dependent CRC membrane transport have been proposed, demonstrating their potential as new therapeutic targets (25). These studies provide a theoretical basis for systematically evaluating the function of ERGs in CRC. Although the function of the endosomal system is highly involved in various solid tumors, the expression profile, molecular subtype characteristics, and their associations with patient survival, immune status, and drug response of ERGs in CRC remain unclear. This study aims to analyze the overall role of the endosomal system in CRC. We constructed an ERG set covering core functional molecules of the endosome, which involves multiple membrane transport processes mediated by the endosome, including but not limited to endocytosis, recycling, and secretion pathways.

This study integrated messenger RNA (mRNA) expression data of CRC from public databases with ERG sets to identify distinct CRC endosome subtypes through consensus clustering analysis. Building upon this classification, we comprehensively characterized subtype-specific TME features, differential drug sensitivities, and prognostic biological behaviors. Furthermore, we developed and validated an 8-gene prognostic risk assessment model. Our findings not only systematically elucidate the biological significance of ERGs in CRC molecular subtyping and prognosis prediction but also provide crucial theoretical foundations and practical directions for advancing molecular subtype-based precision therapy strategies and individualized clinical management. The detailed technical roadmap can be found in Figure S1. We present this article in accordance with the TRIPOD and MDAR reporting checklists (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-593/rc).


Methods

Data sources and acquisition of ERGs

CRC data were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Among them, the TCGA data was defined as the training set, while the GEO datasets GSE17537 and GSE39582 were defined as the validation sets. The training set contained 650 CRC samples and 51 normal control samples, and the validation set contained 640 CRC samples (GSE17537: 55 samples; GSE39582: 585 samples. To ensure data quality, after excluding samples with survival time less than 30 days, a total of 608 samples were finally included in the training set and 627 samples in the validation set (GSE17537: 54 samples; GSE39582: 573 samples). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

The ERG set was sourced from the GeneCards database (https://www.genecards.org/). Following established methodology (15), we searched GeneCards using the keyword “endosome” and selected genes with Relevance Scores more than 10, yielding a preliminary list of ERGs (online Table I: https://cdn.amegroups.cn/static/public/10.21037jgo-2025-593-1.xlsx). The intersection of ERGs with the training set resulted in 63 CRC-associated ERGs (online Table II: https://cdn.amegroups.cn/static/public/10.21037jgo-2025-593-2.xlsx).

Molecular subtypes of CRC based on consensus clustering of ERGs

To investigate the survival impact of the 63 identified ERGs on CRC, we first performed univariable Cox regression analysis and selected genes with P values <0.05 for subsequent analyses. We then evaluated the classification performance of these survival-associated ERGs through cluster analysis. Specifically, using the “ConsensusPlus” package of the R software (26), consensus clustering analysis was conducted on the training set samples based on the expression levels of survival-related ERGs. During the clustering process, the Euclidean distance was used as the distance metric between samples, and 1,000 resamplings (resampling =80%) were performed to ensure the stability of the clustering results. At the same time, the random seed was set to 111 to ensure the reproducibility of the analysis. Finally, the optimal number of clusters was determined through the cumulative distribution function (CDF) curve and its variation (ΔCDF).

Survival analysis and immune characterization of CRC patients in different molecular subtypes

To investigate the prognostic differences among patients with different clustering subtypes of CRC, we plotted Kaplan-Meier (K-M) survival curves to evaluate the survival status among each subtype. To further systematically compare the relationship between the molecular classification based on ERG and the consensus molecular subtypes (CMS) of CRC, we used the CMScaller R software package (27) to classify the TCGA CRC dataset into CMS categories, retaining only the samples with a classification false discovery rate (FDR) <0.05 for subsequent analysis. By drawing a Sankey diagram, we visually presented the sample distribution and corresponding relationship between ERG subtypes and CMS categories (CMS1–4). On this basis, we integrated the ERG subtypes with the CMS classification results and summarized them according to ERG cluster grouping and CMS categories (divided into CMS4 group and the combined group of CMS1–3). Then, we conducted a survival analysis. By plotting K-M curves and calculating the pairwise comparison P values for the survival differences between groups, we evaluated the predictive ability of the integrated subtypes for patient prognosis.

Fisher’s exact tests were employed to analyze the distribution differences of clinicopathological characteristics (including T stage and N stage) among subtypes. Furthermore, we performed single-sample gene set enrichment analysis (ssGSEA) to quantify the relative abundance of 23 immune cell types and immune-related functions in each sample, thereby characterizing differences in the TME across subtypes. Subsequently, the Wilcoxon rank sum test was used to compare the expression differences of key immune checkpoint genes and HLA genes among the various subtypes, systematically characterizing the subtype-specific immune features. Additionally, we evaluated the potential response to immunotherapy using the Tumor Immune Dysfunction and Exclusion (TIDE) score and compared the proportions of immunotherapy responders versus non-responders among subtypes through Fisher’s exact tests, providing a multidimensional assessment of immunotherapeutic sensitivity across CRC subtypes.

Drug sensitivity prediction analysis of distinct CRC molecular subtypes

To evaluate potential differences in therapeutic responses to common anticancer drugs among CRC molecular subtypes, we developed a drug sensitivity prediction model using large-scale pharmacogenomic data from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/). The “calcPhenotype” function in the “oncoPredict” R package (28) was employed, combined with a ridge regression model. By inputting transcriptomic profiles of CRC samples, the model estimated half-maximal inhibitory concentration (IC50) values for each sample across multiple clinically relevant agents. Subtype-specific drug sensitivity patterns were then assessed by comparing different subtype IC50 distributions using Wilcoxon rank-sum tests. Notably, the GDSC data do not come from the CRC samples of this study, but rather from the drug sensitivity data of the pharmacological model (cancer cell lines). Therefore, the IC50 values obtained in this study are calculated based on the transcriptional expression profile of CRC and are used to reflect potential differences in drug responses, rather than the actual measured drug sensitivity results from the experiments.

Development and validation of a prognostic model based on differentially expressed genes (DEGs) in CRC

To identify key DEGs between molecular subtypes, we performed differential expression analysis using the “edgeR” R package (29) with thresholds set at |logFC| ≥0.585 and FDR <0.05. Subsequently, univariable Cox regression analysis was conducted on these DEGs in the training cohort using the “survival” R package (30) to identify prognosis-associated genes strongly correlated with overall survival (OS) (P<0.05). To further refine the gene set, we performed least absolute shrinkage and selection operator (LASSO) Cox regression analysis by integrating the “glmnet” (31) and “survival” R packages to select genes with stable prognostic value. The final prognostic risk score model was generated through multivariable Cox regression analysis based on the selected gene set via the “survival” R package. Model performance was evaluated by plotting time-dependent receiver operating characteristic (ROC) curves for 1-, 3-, and 5-year survival using both the “timeROC” (32) and “survival” R packages in both the training cohort and validation dataset (GSE17537).

The risk score was calculated using the following mathematical formula:

Riskscore=i=1n(βi×Xi)

Where n, Xi, and βi represent the total number of genes, FPKM (fragments per kilobase of transcript per million mapped reads) values, and regression coefficients, respectively.

To further evaluate the model’s stratification capability and prognostic performance, patients were dichotomized into high-risk and low-risk groups according to the median risk score derived from the training cohort, with consistent risk stratification applied to the validation set (GSE17537). Inter-group differences in OS were examined via K-M analysis, with statistical significance determined by the log-rank test.

Comprehensive prognostic analysis and nomogram construction based on clinical features

To investigate the association between the prognostic model and clinical characteristics, Fisher’s exact tests were performed to analyze the distribution differences of clinical variables (T stage, N stage, M stage, pathological stage) across risk groups, with bar plots visualizing their proportional distribution. Subsequently, univariable and multivariable Cox regression analyses were conducted using the “survival” and “forestplot” R packages (33) to evaluate risk scores, age, pathological stage, and other clinical parameters. It identified independent prognostic factors presented as forest plots with hazard ratios and confidence intervals. To enhance clinical utility, a nomogram was developed via the “survival” and “rms” R package (34), integrating significant prognostic factors to predict 1-, 3-, and 5-year survival probabilities. Calibration curves were generated to evaluate the concordance between predicted and observed outcomes and to validate model accuracy.

Immune infiltration characteristics and genomic mutation profiling under risk stratification

To evaluate TME differences between risk groups, we calculated tumor purity, stromal scores, ESTIMATE scores and immune scores using the ESTIMATE algorithm to quantify immune/stromal infiltration levels and compare TME characteristics of these two groups. Additionally, based on single-nucleotide variation (SNV) data from the training cohort, we performed mutation analysis for high-risk and low-risk groups separately using the “maftools” R package (35), generating mutation annotation format (MAF) plots to display mutational landscapes. We focused on analyzing differences between the two groups in terms of mutation types, SNV class, and overall mutation frequencies. Tumor mutational burden (TMB) was estimated using the tmb function from the “maftools” (version 2.20.0) package to compare differences between the two groups. Furthermore, we identified the top 20 most frequently mutated genes and assessed their distributional differences between high-risk and low-risk groups. Additionally, we compared the mutational profiles of key genes of the risk model across different risk groups.

Gene set variation analysis (GSVA) reveals functional pathway differences between risk groups

To further investigate potential molecular and biological process differences between high- and low-risk groups, we performed GSVA to calculate pathway activity scores for each sample. Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology Biological Process (GOBP) gene sets as references, we computed enrichment scores for relevant pathways and biological processes. Differential analysis of GSVA scores between groups was conducted using the “limma” R package (36), with pathways meeting P value <0.05 considered statistically significant. The top 10 upregulated and downregulated pathways with the least P values were selected for visualization.

Prediction of sensitive drugs for model genes and molecular docking

To further explore the potential therapeutic drugs for the key genes in the model, we adopted the same strategy as in Drug sensitivity prediction analysis of distinct CRC molecular subtypes section, based on the drug sensitivity data from the GDSC database, and utilized the “calcPhenotype” function in the “oncoPredict” R package, combined with the ridge regression model to predict the IC50 of various anti-cancer drugs for CRC samples. Subsequently, we calculated the Spearman correlation coefficient between the expression levels of model genes and the predicted IC50 of each drug to screen for sensitive drugs (i.e., negative correlation drugs) that were significantly associated with the high expression of the genes.

To verify the reliability of the prediction results, we further selected key molecules from the candidate drugs for molecular docking analysis. The three-dimensional structure of the target protein was obtained from the Protein Data Bank (PDB) database, and the corresponding drug SDF file was downloaded from PubChem. AutoDockTools (v1.5.7) was used for receptor and ligand preprocessing, and then molecular docking calculations were performed in AutoDock Vina (v1.2.3) to obtain the binding affinity and evaluate the stability of the drug binding to the target protein.

Cell culture and transfection

The human normal intestinal epithelial cell line NCM460 (IM-H445) and CRC cell line HT29 (IM-H102) were all purchased from IMMOCELL (Shanghai, China). The human CRC cell line HCT116 (H1-0401) was purchased from Sangon Biotech (OriCell, Guangzhou, China). NCM460 cells were cultured in a complete medium consisting of 90% DMEM (11965092, Gibco, Grand Island, NY, USA), 10% fetal bovine serum (16600082, Gibco), and 1% penicillin-streptomycin (PS, C0222, Beyotime, Shanghai, China). HCT116 cells were cultured in a DMEM complete medium containing 10% FBS (16600082, Gibco) and 1% PS. HT29 cells were cultured in McCoy’s 5A medium (IMC-307, Gibco), supplemented with 10% FBS and 1% PS. All cell lines were maintained in a constant temperature incubator at 37 ℃ with 5% CO2.

The si-MAGEA1 (RiboBio, Guangzhou, China) and negative control (si-NC) (RiboBio) were transfected into HCT116 cells using the Lipofectamine 2000 kit (11668019, Thermo Fisher Scientific, Waltham, MA, USA), and the transfection was performed for 24 hours for subsequent experiments. Some transfected cells were seeded at a density of 5×103 per well in a 96-well plate and left overnight, and then treated with LEE011 or an equal volume of DMSO as a control.

Reverse transcription quantitative polymerase chain reaction (RT-qPCR)

Total RNA was extracted from cells using Trizol (15596018CN, Thermo Fisher Scientific), and the concentration and purity of RNA were detected using a NanoDrop spectrophotometer. Using HiScript III RT SuperMix (R323-01, Vazyme, Nanjing, China), RNA was reverse-transcribed into cDNA. Finally, RT-qPCR amplification reactions were performed using the SYBR Green real-time fluorescence quantitative PCR kit (CL132-01, Vazyme) on the Step-One Plus real-time fluorescence quantitative PCR system. The primer sequences are shown in Table 1.

Table 1

Reverse transcription quantitative polymerase chain reaction primers

Gene Forward primer (5'-3') Reverse primer (5'-3')
MAGEA1 GAGTCCTTGTTCCGAGCAGTAA GCAGGCCATCATAGGAGAGAC
GAPDH AATGGGCAGCCGTTAGGAAA GCCCAATACGACCAAATCAGAG

Cloning assay

HCT116 cells in the logarithmic growth phase were collected from each group, digested with trypsin and resuspended into single-cell suspensions. After cell counting, they were inoculated at a density of 500 cells per well into 6-well plates, with 2 mL of complete medium added to each well. The cells were placed in a 37 ℃, 5% CO2 incubator and cultured statically. The fresh medium was replaced every 3 days. When visible cell clones were observed, the medium was discarded. Clones were gently washed twice with phosphate-buffered saline (PBS), fixed for 15 minutes at 4% PFA, and stained with 0.1% crystal violet for 20 minutes. The excess stain was rinsed off with running water. After the air dry at room temperature, the clones with more than 50 cells were counted.

Cell cycle detection

A flow cytometer was used to detect the DNA content of each group of HCT116 cells and calculate the proportions of G0/G1, S, and G2/M. Specifically, HCT-116 cells were inoculated at a density of 1×106 per well in a 6 cm plate overnight. The cells were treated with LEE011. After 24 hours of incubation, cells were centrifuged to collect floating and adherent cells, washed twice with frozen PBS at 4 ℃, and fixed at 4 ℃ for 15 minutes with 75% frozen ethanol. Cells were centrifuged to remove the ethanol and the cell particles were resuspended at a density of 100 µg/mL in PBS containing RNase. Then, the samples were stained with 40 µg/mL propidium iodide in PBS at 37 ℃ for 30 minutes and were analyzed with a FACScan flow cytometer (BD Biosciences, San Jose, CA, USA).

Western blot (WB)

Cells were lysed using RIPA lysis buffer (Beyotime, P0013B) and total proteins were extracted. The protein concentration was determined using the Pierce™ BCA protein assay kit (Thermo Fisher Scientific). The proteins were separated by SDS-PAGE gel electrophoresis and transferred to a PVDF membrane (Millipore, IPVH00010, Burlington, MA, USA). After membrane transfer, the PVDF membrane was blocked in a 5% skim milk solution for 1 hour. Then, the membrane was incubated with specific primary antibodies at 4 ℃ overnight. The membrane was washed 3 times with TBST for 10 minutes each time, and then incubated with the corresponding secondary antibody at room temperature for 1 hour. Finally, the membrane was washed thoroughly with TBST and developed using the enhanced chemiluminescence (ECL) substrate (Millipore) on a chemiluminescence imaging system (Tanon, Shanghai, China), and images were captured. Antibodies used in this study were obtained from Proteintech (Wuhan, China), Cell Signaling Technology (CST; Danvers, MA, USA), Thermo Fisher Scientific, Abcam (Cambridge, UK). Antibody information for WB is shown in Table 2.

Table 2

Antibody information

Antibody type Antibody Host Company Part number Dilution factor
Primary antibody MAGEA1 Rabbit Proteintech 12233-1-AP 1:600
p-Rb (Ser807/811) Rabbit Cell Signaling Technology (CST) 9308 1:1,000
total Rb (Retinoblastoma protein, Rb1) Mouse Thermo Fisher Scientific MA5-11387 1–2 µg/mL
Cyclin B1 Rabbit Cell Signaling Technology (CST) 4138 1:1,000
p-Cyclin B1 (Ser147) Rabbit Cell Signaling Technology (CST) 4131 1:1,000
CDC2 Rabbit Cell Signaling Technology (CST) 9112 1:1,000
p-CDC2 (Tyr15) Rabbit Cell Signaling Technology (CST) 9111 1:1,000
GAPDH Mouse Abcam Ab8245 2 mg/mL
Secondary antibody Goat anti-Rabbit IgG (HRP) Goat Abcam Ab6721 2 mg/mL
Goat anti-Mouse IgG (HRP) Goat Abcam Ab6789 2 mg/mL

Statistical analysis

This study employed R software (version 4.1.1) for bioinformatics analysis. Experimental data analysis was assisted by Microsoft Excel and GraphPad Prism 10.1.2. Continuous variables were expressed as mean ± standard error of the mean (SEM). Comparisons between two groups were conducted using Student’s t-test or Wilcoxon rank sum test based on the distribution characteristics of the data. Multiple group comparisons were performed using one-way analysis of variance (ANOVA) followed by Dunnett’s multiple comparison correction. All statistical tests were two-sided with a significance level of P<0.05.


Results

Identification of molecular subtypes and analysis of immune characteristics based on survival-associated ERGs

Through univariate Cox regression analysis of 63 ERGs, three genes significantly associated with OS of CRC patients were identified: RBSN (rabenosyn-5, a key protein in early endosomal transport), NSG1 (neuronal vesicle trafficking associated 1, involved in endosome-dependent recycling of neurons), and PDCD6IP (programmed cell death 6-interacting protein, a key protein in the ESCRT-III-mediated membrane budding and secretion pathway) (P<0.05). Subsequently, consensus clustering analysis was performed on the expression profiles of these three survival-related ERGs in the training set of 608 samples, and the stability of clustering was evaluated based on 1,000 bootstrap resamplings (80% of the samples each time). Based on the CDF curve, consensus matrix heatmap, and clustering consistency index, the optimal clustering number was determined as k=2, resulting in two molecular subtypes: cluster 1 (n=324) and cluster 2 (n=284) (Figure 1A). Furthermore, to quantitatively evaluate the reproducibility and inter-cluster separation of clustering, we calculated the silhouette width when k=2. The results showed that the average silhouette width of 608 samples reached 0.988, very close to the theoretical maximum of 1, indicating a high degree of separation and internal consistency between the two subtypes (Figure S2A). Additionally, the CDF curve showed the steepest growth trend at K=2, indicating that the samples were highly concentrated on the consensus index, further supporting the robustness and reproducibility of the clustering result (Figure S2B). The K-M survival analysis results showed that the OS of cluster 1 was significantly better than that of cluster 2 (P<0.05) (Figure 1B).

Figure 1 Identification of molecular subtypes and analysis of immune characteristics based on survival-associated ERGs. (A) Unsupervised clustering results. (B) Survival analysis between the two subtypes. (C) Distribution of clinical characteristics (age, gender, T, M, N, tumor stage) across subtypes. (D) Immune infiltration comparison between the two subtypes using ssGSEA. (E) Differential expression of HLA genes between the two subtypes. (F) Comparison of the immune checkpoint gene expression profiles in the two subtypes. (G) TIDE scores and predicted immunotherapy response rates of the two subtypes. ns, P>0.05; *, P<0.05; **, P<0.01; ***, P<0.001; ****, P<0.0001. ERGs, endosome-related genes; ssGSEA, single-sample gene set enrichment analysis; TIDE, Tumor Immune Dysfunction and Exclusion.

To systematically compare the relationship between ERG-based molecular typing and CMS of CRC, we conducted a systematic comparison of ERG subtypes and CMS classifications. The results of the Sankey diagram showed that both ERG subtypes included all CMS categories, but their distributions had significant preferences: cluster 1 was dominated by CMS2 and had the least CMS1, while cluster 2 was dominated by CMS4 and had the least CMS3 (Figure S2C). Further integrated analysis revealed that within cluster 1, patients belonging to the CMS1–3 subtypes had significantly worse survival prognosis than CMS4 patients; however, within cluster 2, the survival differences among CMS subtypes were not significant (Figure S2D). This result indicates that the prognostic significance of CMS subtypes may be regulated by their respective ERG molecular background.

Furthermore, the distribution of clinical characteristics (age, gender, T, M, N, and tumor stage) between the two subtypes demonstrated that cluster 2, which had poorer survival, contained a significantly higher proportion of patients with advanced-stage (Stage II–IV) CRC (P<0.05) (Figure 1C), suggesting that this subtype may be associated with a more progressive disease state. Further ssGSEA analysis indicated significant differences in TME between the two subtypes (P<0.05) (Figure 1D). Specifically, cluster 1 exhibited higher enrichment scores in Mast cells, B cells, immature dendritic cells (iDCs), T-cell co-stimulation, and T-helper cells. In contrast, cluster 2 showed higher scores in activated dendritic cells (aDCs), APC co-stimulation and APC co-inhibition, inflammation-promoting, macrophages, parainflammation, follicular helper T cells (Tfh), and type I IFN response (P<0.05) (Figure 1D). Additionally, comparative analysis of immune checkpoint and HLA expression levels revealed that CD27, CD28, CD40LG, ICOS, and TMIGD2, as well as HLA-G, were more highly expressed in cluster 1, whereas HLA-DMA and HLA-E were more highly expressed in cluster 2 (P<0.05) (Figure 1E,1F). Moreover, assessment of immunotherapy sensitivity demonstrated that cluster 1 had lower TIDE scores and a higher proportion of predicted responders to immunotherapy (P<0.05) (Figure 1G), suggesting that cluster 1 may have greater potential to benefit from immunotherapy.

Analysis of potential drug sensitivity differences between molecular subtypes

To evaluate potential differences in therapeutic response between molecular subtypes, we compared the drug sensitivity of cluster 1 and cluster 2 based on computational pharmacogenomic analysis. The results demonstrated that cluster 1 exhibited significantly higher sensitivity (i.e., lower predicted IC50 values) to the following compounds: GSK269962A, navitoclax, PF4708671, TAF15496, zoledronate, sorafenib, ribociclib, and AT13148. Conversely, cluster 2 showed greater sensitivity to XAV939, AZD5582, ULK14989, and obatoclax mesylate (P<0.05) (Figure 2).

Figure 2 Analysis of potential drug sensitivity differences between molecular subtypes. *, P<0.05; **, P<0.01; ***, P<0.001; ****, P<0.0001.

Construction and validation of an 8-gene prognostic model based on key DEGs

We established a robust prognostic signature by first performing differential expression analysis between different molecular subtypes, identifying 884 DEGs (|logFC| ≥0.585, FDR <0.05) (Figure 3A). Subsequent univariate Cox regression analysis of these DEGs in the training cohort revealed 121 genes significantly associated with OS (P<0.05) (online Table III: https://cdn.amegroups.cn/static/public/10.21037jgo-2025-593-3.xlsx). To refine the candidate gene set and enhance model stability, we applied LASSO Cox regression analysis, which selected 13 key genes with potential prognostic value (Table S1, Figure 3B). Further multivariate Cox regression analysis constructed a final 8-gene prognostic risk model comprising RBP7, KRT84, MAGEA1, PCOLCE2, ALPP, FOXD1, CXCL13, and MMP1 (Figure 3C). ROC curve analysis demonstrated the model’s consistent prognostic accuracy in both the training cohort and validation cohort (GSE17537 and GSE39582), with area under the curve (AUC) values exceeding 0.6 (Figure 3D-3F).

Figure 3 Construction and preliminary validation of the risk score model. (A) Volcano plot of DEGs between molecular subtypes. (B) LASSO Cox regression analysis in the training cohort, showing the coefficient distribution plot generated for the logarithmic (λ) sequence and the LASSO coefficient profile. (C) Forest plot of multivariate Cox regression results. (D) ROC curve of the training set. (E) ROC curve of the validation set GSE17537. (F) ROC curve of the validation set GSE39582. AUC, area under the curve; DEGs, differentially expressed genes; FC, fold change; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic.

The risk score was calculated using the formula as follows:

Riskscore=0.3423RBP70.1707CXCL13 +0.4603KRT84+0.1906MAGEA10.1315MMP1+0.3022PCOLCE2+0.2079ALPP+0.3381FOXD1

Furthermore, to evaluate the risk stratification efficacy of the model, the samples were divided into high-risk and low-risk groups based on the median riskscore calculated from the training set (Figure 4A). The same method was used for grouping in the GSE17537 and GSE39582 validation sets (Figure 4B, Figure S2E). The survival analysis results indicated that the mortality rate of patients in the high-risk group was higher than that in the low-risk group in both the training set and the validation set, and the OS of the high-risk group was significantly lower than that of the low-risk group (P<0.05) (Figure 4A-4C, Figure S2E,S2F). These findings verified the good prognostic discrimination ability of this model.

Figure 4 Distribution of risk scores and their association with survival outcomes in training and validation cohorts. (A) Risk score distribution in the training cohort (left) and validation cohort GSE17537 (right). (B) Survival status (dead/alive) of patients in the training (left) and GSE17537 validation cohorts (right). The training set included a total of 122 death cases, among which 90 cases were in the high-risk group, and 32 cases were in the low-risk group; the validation set of GSE17537 contained 19 death cases, including 13 cases in the high-risk group and 6 cases in the low-risk group. (C) Survival curves for the training (left) and GSE17537 validation cohorts (right).

Evaluation of the independent prognostic value and clinical utility of the risk model based on clinical characteristics

To assess the applicability of the constructed risk score model across different clinical subgroups, we compared the distribution of patients with varying clinical characteristics between the high- and low-risk groups. The results revealed no significant differences in age or gender distribution (P>0.05), but significant disparities were observed in tumor stage (T stage, N stage, M stage, and overall stage) (P<0.05) (Figure 5A). The low-risk group predominantly comprised patients with T1–T2, N0–N1, M0, and stage I–II disease, whereas the high-risk group was enriched with T3–T4, N2, M1, and stage III–IV cases, indicating a strong association between risk scores and disease progression. To evaluate the independent prognostic value of clinical variables, univariate and multivariate Cox regression analyses were performed. The results demonstrated that risk score, age, M stage, and stage all served as independent prognostic factors for OS in CRC patients (P<0.05) (Figure 5B).

Figure 5 Evaluation of the independent prognostic value and clinical utility of the risk model based on clinical characteristics. (A) Distribution of clinical characteristics (age, gender, T, M, N, tumor stage) between high- and low-risk groups. (B) Univariate (left) and multivariate (right) Cox regression analyses of clinical features and risk score. (C) Nomogram incorporating clinical characteristics and risk score. (D) 1-, 3-, 5-year calibration curves of the nomogram. (E) 1-, 3-, 5-year decision curve analysis of the nomogram.

To enhance clinical applicability, we developed a nomogram incorporating the risk score and key clinical features (gender, age, TNM stage) to predict 1-, 3-, and 5-year OS probabilities (Figure 5C). Calibration curves and decision curve analysis (DCA) confirmed the nomogram’s robust predictive accuracy and demonstrated that clinical interventions based on its predictions would yield substantial net benefit (Figure 5D,5E).

Comparative analysis of TME and genomic mutation characteristics between risk groups

To further investigate the differences in TME and mutational profiles between risk groups, we first evaluated immune infiltration patterns via the ESTIMATE algorithm. The results revealed that the high-risk group exhibited significantly lower immune scores and ESTIMATE scores, coupled with higher tumor purity compared to the low-risk group (P<0.05) (Figure 6A), suggesting diminished immune cell infiltration in high-risk tumors. The analysis of mutation characteristics in the high-risk and low-risk groups showed that the TMB in the high-risk group was slightly higher than that in the low-risk group, but the difference did not reach the level of statistical significance (Figure S2G). Further comparison of the specific mutation spectra between the two groups revealed that the gene mutations in the high-risk group were more frequent, manifested as missense mutations, single nucleotide polymorphisms (SNPs), C>N and other types of mutations, with the number being higher than that in the low-risk group, and the mutation frequencies of key driver genes such as APC and TP53 were also higher (Figure 6B,6C). Further examination of the 8-gene prognostic model revealed that the high-risk group displayed mutations in 7 model genes encompassing Missense Mutations, Splice Site, Frame Shift Ins, and Multi-Hit, whereas the low-risk group showed mutations in 6 model genes primarily consisting of Missense Mutations, Frames Shift Del, Nonsense Mutations, and Multi-Hit (Figure 6D).

Figure 6 Evaluation of the prognostic independence and clinical utility of the risk model based on clinical features. (A) Comparison of Immune Score, Stromal Score, ESTIMATE Score, and tumor purity between high- and low-risk groups calculated by the ESTIMATE algorithm. (B) Summary of mutation profiles in the high-risk (left) and low-risk (right) groups. (C) Waterfall plots displaying the top 20 mutated genes in high-risk (left) and low-risk (right) groups. (D) Mutation patterns of model genes in high-risk (left) and low-risk (right) groups. ns, P>0.05; *, P<0.05; ***, P<0.001.

GSVA pathway enrichment analysis of high- and low-risk groups

To elucidate potential molecular and biological process disparities between risk groups, we performed GSVA enrichment analysis. The results demonstrated that the top 10 significantly upregulated pathways in the high-risk group were primarily associated with endocrine regulation (e.g., luteinizing hormone and gonadotropin secretion), embryonic developmental processes (e.g., embryonic skeletal system morphogenesis), and nervous system development (e.g., neural tube pattern formation). Conversely, significantly downregulated pathways in this group were predominantly enriched in immune regulation (e.g., positive regulation of immunoglobulin production, neutrophil-mediated cytotoxic regulation) and inflammation-related processes (e.g., negative regulation of cell chemotaxis to fibroblast growth factor) (Figure 7A). In the low-risk group, significantly upregulated pathways mainly included signal transduction pathways (e.g., Notch signaling pathway, Wnt signaling pathway) and metabolism-related pathways (e.g., phenylalanine metabolism). The downregulated pathways were principally associated with immune responses, encompassing classical immune signaling pathways such as the T cell receptor signaling pathway and Toll-like receptor signaling pathway (Figure 7B).

Figure 7 GSVA pathway enrichment analysis in high- and low-risk groups. (A) GSVA enrichment results for the high-risk group. (B) GSVA enrichment results for the low-risk group. GSVA, gene set variation analysis.

Functional analysis of the prognostic gene MAGEA1 and its expression at the histological level

MAGEA1 is found to be abnormally activated in various solid tumors [including BC and ovarian cancer (OC)], and has been identified as a potential tumor biomarker (37,38). MAGEA1 is shown to be highly expressed in the primary tumor tissues and peripheral blood of CRC patients, and the positive detection in peripheral blood is only seen in patients at stage Ⅲ–Ⅳ (39), suggesting that its high expression may be closely related to the malignant progression of CRC.

To clarify the biological function of MAGEA1 in CRC, we first analyzed the relationship between MAGEA1 expression and clinical pathological characteristics. The results showed that MAGEA1 was significantly higher in CRC tissues than in normal tissues, and its expression level increased with the increase in pathological grade, suggesting that MAGEA1 overexpression is associated with poorer pathological stage and poor prognosis (P<0.05) (Figure 8A,8B). K-M analysis further confirmed that patients with high MAGEA1 expression had significantly shorter OS time (P<0.05) (Figure 8C).

Figure 8 Expression characteristics and clinical significance of MAGEA1 in CRC tissues. (A) Distribution of MAGEA1 expression levels in CRC samples of different pathological stages. (B) Analysis of the expression differences of MAGEA1 in CRC tissues and normal tissues in the TCGA database. (C) Overall survival curves of CRC patients with high and low expression of MAGEA1 (K-M analysis). *, P<0.05; ***, P<0.001. CRC, colorectal cancer; K-M, Kaplan-Meier; TCGA, The Cancer Genome Atlas.

The drug sensitivity prediction results showed that TAF15496 and ribociclib (LEE011) exhibited higher sensitivity in the MAGEA1 highly expressed group (i.e., the predicted IC50 values were significantly lower, P<0.05) (Figure 9A). Molecular docking analysis further revealed that LEE011 could form multiple stable molecular interactions with the MAGEA1 protein, including hydrogen bonds formed with Val28 and Ser150 residues, as well as hydrophobic interactions with Arg99, Ser96, Ala100, Tyr103, Cys27, Val26, Ala107, Val106, and Val10 residues (Figure 9B,9C). These results suggest that LEE011 may affect the biological function of MAGEA1 through interacting with it.

Figure 9 Drug sensitivity prediction and molecular docking analysis. (A) Drug sensitivity analysis of different MAGEA1 expression groups predicted based on the GDSC database. (B) Two-dimensional schematic diagram of the molecular docking structure of the binding site of LEE011 and MAGEA1 protein. (C) Three-dimensional schematic diagram of the molecular docking structure of the binding site of LEE011 and MAGEA1 protein. ns, P>0.05; *, P<0.05. GDSC, Genomics of Drug Sensitivity in Cancer.

Knockdown of MAGEA1 inhibits the proliferation of CRC cells

A previous study has shown that ribociclib (LEE011) inhibits the activity of CDK4/6, leading to dephosphorylation of Rb and G1 phase arrest, and further affecting G2/M checkpoint-related proteins cyclin B1 and CDC2 (40). Based on our drug prediction results, we hypothesized that MAGEA1 might promote the proliferation of CRC cells by activating cell cycle-related pathways (such as the CDK4/6–Rb axis), while ribociclib might exert a stronger inhibitory effect on tumor cells with high MAGEA1 expression by inhibiting this pathway.

To verify this hypothesis, we detected the expression of MAGEA1 in CRC cell lines and found that MAGEA1 was highly expressed in multiple CRC cell lines, with the highest expression in HCT116 cells (P<0.05) (Figure 10A,10B). Therefore, we chose HCT116 cells for subsequent experiments. RT-qPCR verification showed that we successfully obtained stable cell lines with MAGEA1 knockdown (P<0.05) (Figure 10C). The results of CCK-8 and clone formation experiments indicated that MAGEA1 knockdown significantly inhibited the proliferation ability of HCT116 cells (P<0.05) (Figure 10D,10E). Flow cytometry analysis of cell cycle further showed MAGEA1 knockdown, LEE011 treatment, and the combination of both significantly increased the proportion of cells in the G1 phase (P<0.05) (Figure 10F), suggesting that the cells were arrested at the G1 phase. WB results showed that MAGEA1 knockdown or LEE011 treatment could reduce the phosphorylation levels of Rb, cyclin B1, and CDC2 (P<0.05), and the combined treatment group did not show additional inhibitory effects compared to the LEE011 treatment alone (Figure 10G). These results collectively indicate that MAGEA1 may promote the proliferation of CRC cells by regulating the CDK4/6–Rb signaling axis, while ribociclib inhibits this pathway and exerts a stronger anti-proliferative effect on CRC cells with high MAGEA1 expression.

Figure 10 Functional verification of MAGEA1. (A) mRNA expression levels of MAGEA1 in different CRC cell lines. (B) Western blot results of MAGEA1 protein expression in each cell line. (C) RT-qPCR verification of MAGEA1 knockdown efficiency in HCT116 cells. (D) CCK-8 assay to detect the proliferation ability of each group of cells. (E) Clone formation assay to detect changes in cell proliferation ability (0.1% crystal violet, magnification). (F) Flow cytometry analysis of cell cycle distribution in each group. (G) Western blot detection of Rb, cyclin B1, CDC2 and their phosphorylation levels in each group. The data are presented in the form of mean ± standard error and are sourced from three independent biological replicates (n=3). Protein blot is a representative result of independent experiments. ns, P≥0.05; *, P<0.05; ****, P<0.0001. CRC, colorectal cancer; mRNA, messenger RNA; OD, optical density; RT-qPCR, reverse transcription quantitative polymerase chain reaction.

Overall, our results suggest that MAGEA1 may promote the proliferation of CRC cells by regulating the CDK4/6–Rb signaling axis, and CDK4/6 inhibitor ribociclib (LEE011) inhibits this pathway and exerts a more significant inhibitory effect on CRC cells with high MAGEA1 expression.


Discussion

A large number of studies have shown that the endosomal system, as the core hub for membrane transport and signal transduction in cells, plays a crucial role in the occurrence, development, metastasis, and treatment response of tumors (14,41,42). The integrity of endosomal function coordinates the bidirectional logistics, including the endocytic pathway (material internalization) and the secretory/cytosolic pathway (material excretion or inter-organelle transport), and the dysregulation is closely related to various malignant phenotypes of tumors (43,44). Therefore, in-depth exploration of its related mechanisms not only helps to reveal the potential biological behavior of tumors but also may provide new ideas and targets for the discovery of biomarkers and the formulation of treatment strategies. However, there is currently a lack of classification and prognosis studies of ERGs in CRC. Therefore, this study systematically applied ERGs to the identification of CRC subtypes and constructed a prognostic model for key genes to reveal the importance and clinical potential of ERGs in CRC biology. Our research revealed the potential roles of key genes in the endosomal transport network (such as RBSN, NSG1, and PDCD6IP) in CRC. The dysregulation of these genes may affect various cellular functions, including material endocytosis, sorting, circulation, and signal transduction, thereby participating in the pathological process of CRC.

Focusing on the potential biological functions and prognostic value of ERGs in CRC, we first identified three ERGs significantly associated with OS and subsequently constructed two endosome-related molecular subtypes. These subtypes exhibited marked differences across multiple dimensions, including survival outcomes, immune infiltration, immunotherapy responsiveness, and drug sensitivity. Notably, cluster 1 demonstrated superior survival prognosis and a more immunologically active TME, characterized by enriched populations of favorable prognostic immune cells such as B cells and mast cells. These immune cells enhance anti-tumor immune responses by facilitating tumor cell recognition and clearance, with their high infiltration levels strongly correlating with immune-activated states and prolonged survival (45,46). In contrast, cluster 2 displayed immunosuppressive features, including significant enrichment of parainflammation—a condition documented to induce aberrant immune responses to gut microbiota, compromise intestinal barrier integrity, and disrupt immune surveillance, thereby promoting tumorigenesis (47). These mechanisms likely underlie the poorer prognosis observed in cluster 2. Immune checkpoint analysis further validated these distinct immunological profiles. Cluster 1 exhibited significantly elevated expression of immune-activating molecules (e.g., CD28, ICOS) alongside lower TIDE scores, suggesting its enhanced sensitivity to immunotherapy. This finding carries substantial clinical implications. Upregulation of co-stimulatory molecules like CD28 predicts robust immune response potential (48), while lower TIDE scores serve as established biomarkers for improved immunotherapy efficacy (49). Consequently, our ERG-based classification system not only demonstrated robust prognostic stratification capacity but also provided a clinically actionable framework for guiding personalized immunotherapy decisions.

Building upon the differential expression and Cox regression analyses of the two endosome-related molecular subtypes, we identified eight prognostic biomarker genes (RBP7, KRT84, MAGEA1, PCOLCE2, ALPP, FOXD1, CXCL13, and MMP1) and established an 8-gene prognostic model. This model demonstrated robust survival prediction performance in both TCGA and GSE17537 cohorts. The results showed its clinical utility for CRC prognosis assessment. It is worth noting that the protein products encoded by these 8 genes have diverse functions, but many of them are directly or indirectly related to endosomal transport or the cellular processes regulated by it. Studies have shown that RBP7 is a fat tissue-specific retinol transporter (50,51), and its connection with the endosome is still unknown, but it has been proven to be involved in the occurrence and progression of various tumors (including BC and bladder cancer) (52,53). In CRC, RBP7 emerges as a clinically relevant prognostic biomarker whose dysregulation potently enhances cancer cell migration and invasion and correlates with invasive phenotypes and epithelial-mesenchymal transition (EMT) (54). Liu et al. (55) (2020) further identified it as a putative tumor suppressor and favorable prognostic marker in oral squamous cell carcinoma. KRT84, also known as type II hair keratin Hb4, can form nails and hair using type I keratin (55,56). Currently, there are no studies that clearly indicate its direct association with the endosomal pathway or CRC. However, there is evidence suggesting that it may play a tumor suppressor role and is considered one of the good prognostic indicators in oral squamous cell carcinoma (55). MAGEA1 belongs to the Cancer/Testis Antigen (CTA) family. It can inhibit the NICD1 transcription factor downstream of the Notch pathway by interacting with the tumor suppressor ubiquitin ligase FBXW7, thereby exerting a tumor suppressive effect (38). Although there is no direct evidence clarifying its specific mechanism in endosomal transport, the activation of the Notch signaling pathway depends on key steps such as endocytosis of the receptor and endosomal sorting of the ligand-receptor complex (57). Therefore, the process by which MAGEA1 regulates signaling pathways such as Notch may indirectly affect signal transduction processes that rely on endosomal transport. Additionally, studies have shown that MAGEA1 is abnormally activated in various solid tumors (such as BC and OC), and it has a high tumor specificity (37,38). A CRC study reveals elevated expression of MAGEA1 in both primary tumors and peripheral blood, with high expression in peripheral blood exclusively observed in stage III/IV patients (39), suggesting its potential as an advanced-stage disease biomarker linked to malignant progression. Combining the results of bioinformatics prediction with in vitro experiments, we found that high expression of MAGEA1 is closely related to the activation of the cell cycle pathway. Knockdown of MAGEA1 and treatment with CDK4/6 inhibitor LEE011 both led to a decrease in Rb phosphorylation level and G1 phase arrest. Moreover, the combination of the two had no additional inhibitory effect, suggesting that the proliferative-promoting effect of MAGEA1 may depend on the activity of the CDK4/6-Rb pathway. This provides potential evidence for it as a novel biomarker for predicting the sensitivity to CDK4/6 inhibitors. PCOLCE2 is an extracellular matrix glycoprotein that dually regulates collagen maturation by enhancing BMP-1-mediated procollagen C-terminal proteolysis and modulating BMPs activity, thereby playing multifaceted roles in collagen fibrillation, lipid metabolism, and inflammation (58,59). In CRC, PCOLCE2 has been identified as a biomarker associated with extracellular matrix remodeling (60), metabolic reprogramming, and immune regulation (61). Its expression correlates significantly with age stratification (62), MSI status, advanced clinical stages (II-IV), and independently predicts poorer OS (63). ALPP (Alkaline Phosphatase, Placental Type) is also known as PLAP and is a member of the alkaline phosphatase family. As a membrane-anchored protein, its transport and membrane localization are the result of vesicle transport (64). In in vitro phagocytosis experiments, exogenous ALPP can weaken the phagocytic ability of THP-1 monocytes (65). ALPP is typically highly expressed in testicular germ cell tumors, and its high expression in CRC is considered to be related to advanced pathological tumor stage, lymph node metastasis, lymphatic and vascular invasion (66). FOXD1 (Forkhead Box D1) is a transcription factor belonging to the FOX family, which plays multi-level regulatory roles in embryonic development and the occurrence of various diseases (67). Currently, no studies are confirming that FOXD1 directly participates in endosome or endocytosis-related processes, but some research has shown that FOXD1+ cells can differentiate into endothelial cells with endocytic ability (68). Additionally, studies have revealed that FOXD1 is highly expressed in various tumors (including pancreatic cancer, BC, and gastric cancer), and it participates in regulating processes such as EMT, migration and invasion, and chemotherapy resistance (69-71). In CRC, FOXD1 overexpression not only independently predicts adverse outcomes but may also augment cancer stemness and drug resistance through β-catenin nuclear translocation. It is worth emphasizing that PCOLCE2, ALPP, CXCL13, and MMP1 all belong to classic secretory or membrane-localized proteins, and their maturation, folding, and transport depend on the endosomal transport pathway. The membrane localization or secretion of CXCL13 not only demonstrates the typical characteristics of exocytic trafficking, but also the endocytosis-recirculation of its receptors (such as integrins, CXCR5), which directly participate in the migration, signal transduction, and immune microenvironment construction of tumor cells (72,73). MMP1 is an ECM remodeling enzyme, and its excretion is an important process driving matrix degradation and invasion (74), and the dynamic regulation of ECM and integrins also relies on endosomal recycling (75). This evidence indicates that multiple endosomal transport-related mechanisms, including endocytic internalization, endosomal sorting, exocytic secretion, and receptor recycling, play important roles in the progression of CRC. The 8-gene model we constructed reflects these processes from different levels, covering metabolic regulation, cytoskeleton maintenance, signal transduction regulation, and ECM remodeling, presenting a comprehensive picture of the imbalance in the endosomal transport network of CRC. Collectively, our prognostic model genes have been mechanistically validated across tumor types for their roles in oncogenesis, progression, metastasis, immune modulation, and outcome prediction, with most exhibiting distinct expression patterns and clinical relevance in CRC. These findings robustly support the biological plausibility and translational potential of our signature.

Furthermore, our analyses revealed that the high-risk group exhibited significantly higher tumor purity, lower immune scores, and an increased mutation burden in key driver genes (e.g., TP53, APC), suggesting that the risk score reflects not only prognostic outcomes but also distinct tumor immunological states and mutational landscapes. GSVA analysis demonstrated that the high-risk group was characterized by upregulated pathways predominantly involved in embryonic development and nervous system differentiation, whereas downregulated pathways were enriched for immune-active processes, including positive regulation of immunoglobulin production and neutrophil-mediated cytotoxicity regulation. These findings collectively suggested an immunosuppressive phenotype that may underlie the adverse clinical outcomes in this subgroup.

However, several limitations should be considered when interpreting these findings. (I) As a retrospective bioinformatics analysis relying primarily on bulk RNA-seq data from public databases (TCGA and GEO), our results remain at the theoretical prediction level and require prospective validation through large-scale, multi-center clinical studies. (II) While we have comprehensively characterized the associations between ERGs and prognosis/TME across multiple dimensions, the underlying mechanistic relationships need to be experimentally validated through in vitro and in vivo functional studies. (III) Furthermore, although the 8-gene risk model demonstrated stable performance in multiple validation cohorts, its practical applicability in clinical settings requires additional experimental evaluation. (IV) Although the constructed molecular model achieved an AUC of over 0.6 in both the training set and validation set, its DCA revealed that the net benefit compared to the traditional clinical staging was limited, suggesting that the clinical added value of the model was not significant at present. Therefore, this model is currently mainly used to explore the potential prognostic role and risk stratification of ERGs. In the future, it still needs to be optimized and verified by combining more clinical features and a larger sample size. (V) Due to the lack of available treatment information in the existing public data, we cannot completely rule out the influence of treatment differences on survival outcomes. Further validation of the model’s robustness is still needed in cohorts with complete clinical data.


Conclusions

Overall, our study systematically identified two distinct endosome-related molecular subtypes and constructed an 8-gene prognostic model, providing new insights into the biological functions of endosome-related molecules in CRC and offering potential avenues for molecular classification and personalized risk assessment.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD and MDAR reporting checklists. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-593/rc

Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-593/dss

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-593/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-593/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Zhou Y, Song K, Chen Y, et al. Burden of six major types of digestive system cancers globally and in China. Chin Med J (Engl) 2024;137:1957-64. [Crossref] [PubMed]
  2. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
  3. Khaliq AM, Erdogan C, Kurt Z, et al. Refining colorectal cancer classification and clinical stratification through a single-cell atlas. Genome Biol 2022;23:113. [Crossref] [PubMed]
  4. Liu JY, Liang ZH, Liu JL, et al. Clinical observation of combined transarterial chemoembolization and targeted therapy in postoperative recurrent colorectal cancer with liver metastasis. World J Gastrointest Surg 2025;17:104568. [Crossref] [PubMed]
  5. Ji K, Jia H, Liu Z, et al. New insight in immunotherapy and combine therapy in colorectal cancer. Front Cell Dev Biol 2024;12:1453630. [Crossref] [PubMed]
  6. Taghiakbari M, Mori Y, von Renteln D. Artificial intelligence-assisted colonoscopy: A review of current state of practice and research. World J Gastroenterol 2021;27:8103-22. [Crossref] [PubMed]
  7. Troelsen FS, Sørensen HT, Pedersen L, et al. Root-cause Analysis of 762 Danish Post-colonoscopy Colorectal Cancer Patients. Clin Gastroenterol Hepatol 2023;21:3160-3169.e5. [Crossref] [PubMed]
  8. Wang Q, Xu J, Wang A, et al. Systematic review of machine learning-based radiomics approach for predicting microsatellite instability status in colorectal cancer. Radiol Med 2023;128:136-48. [Crossref] [PubMed]
  9. Yang L, Dong D, Fang M, et al. Can CT-based radiomics signature predict KRAS/NRAS/BRAF mutations in colorectal cancer? Eur Radiol 2018;28:2058-67. [Crossref] [PubMed]
  10. Biller LH, Schrag D. Diagnosis and Treatment of Metastatic Colorectal Cancer: A Review. JAMA 2021;325:669-85. [Crossref] [PubMed]
  11. Ugai T, Haruki K, Harrison TA, et al. Molecular Characteristics of Early-Onset Colorectal Cancer According to Detailed Anatomical Locations: Comparison With Later-Onset Cases. Am J Gastroenterol 2023;118:712-26. [Crossref] [PubMed]
  12. Dong J, Tong W, Liu M, et al. Endosomal traffic disorders: a driving force behind neurodegenerative diseases. Transl Neurodegener 2024;13:66. [Crossref] [PubMed]
  13. Naslavsky N, Caplan S. Advances and challenges in understanding endosomal sorting and fission. FEBS J 2023;290:4187-95. [Crossref] [PubMed]
  14. Banushi B, Joseph SR, Lum B, et al. Endocytosis in cancer and cancer therapy. Nat Rev Cancer 2023;23:450-73. [Crossref] [PubMed]
  15. Wang D, Liu S, Wang G. Establishment of an Endocytosis-Related Prognostic Signature for Patients With Low-Grade Glioma. Front Genet 2021;12:709666. [Crossref] [PubMed]
  16. Gallon M, Cullen PJ. Retromer and sorting nexins in endosomal sorting. Biochem Soc Trans 2015;43:33-47. [Crossref] [PubMed]
  17. Hu CT, Wu JR, Wu WS. The role of endosomal signaling triggered by metastatic growth factors in tumor progression. Cell Signal 2013;25:1539-45. [Crossref] [PubMed]
  18. Migliano SM, Schultz SW, Wenzel EM, et al. Removal of hypersignaling endosomes by simaphagy. Autophagy 2024;20:769-91. [Crossref] [PubMed]
  19. Mason JD, Marks E, Fan SJ, et al. Stress-induced Rab11a-exosomes induce amphiregulin-mediated cetuximab resistance in colorectal cancer. J Extracell Vesicles 2024;13:e12465. [Crossref] [PubMed]
  20. Sun L, Xu X, Chen Y, et al. Rab34 regulates adhesion, migration, and invasion of breast cancer cells. Oncogene 2018;37:3698-714. [Crossref] [PubMed]
  21. Reda M, Ngamcherdtrakul W, Gu S, et al. PLK1 and EGFR targeted nanoparticle as a radiation sensitizer for non-small cell lung cancer. Cancer Lett 2019;467:9-18. [Crossref] [PubMed]
  22. Tejeda-Muñoz N, Albrecht LV, Bui MH, et al. Wnt canonical pathway activates macropinocytosis and lysosomal degradation of extracellular proteins. Proc Natl Acad Sci U S A 2019;116:10402-11. [Crossref] [PubMed]
  23. Tejeda-Muñoz N, De Robertis EM. Lysosomes are required for early dorsal signaling in the Xenopus embryo. Proc Natl Acad Sci U S A 2022;119:e2201008119. [Crossref] [PubMed]
  24. Colozza G, Jami-Alahmadi Y, Dsouza A, et al. Wnt-inducible Lrp6-APEX2 interacting proteins identify ESCRT machinery and Trk-fused gene as components of the Wnt signaling pathway. Sci Rep 2020;10:21555. [Crossref] [PubMed]
  25. Tejeda-Muñoz N, Binder G, Mei KC. Emerging therapeutic strategies for Wnt-dependent colon cancer targeting macropinocytosis. Cells Dev 2024;180:203974. [Crossref] [PubMed]
  26. Zhang L, Zhang X, Guan M, et al. Identification of a novel ADCC-related gene signature for predicting the prognosis and therapy response in lung adenocarcinoma. Inflamm Res 2024;73:841-66. [Crossref] [PubMed]
  27. Eide PW, Bruun J, Lothe RA, et al. CMScaller: an R package for consensus molecular subtyping of colorectal cancer pre-clinical models. Sci Rep 2017;7:16618. [Crossref] [PubMed]
  28. Sun S, Wang K, Guo D, et al. Identification of the key DNA damage response genes for predicting immunotherapy and chemotherapy efficacy in lung adenocarcinoma based on bulk, single-cell RNA sequencing, and spatial transcriptomics. Comput Biol Med 2024;171:108078. [Crossref] [PubMed]
  29. Li J, Qiao H, Wu F, et al. A novel hypoxia- and lactate metabolism-related signature to predict prognosis and immunotherapy responses for breast cancer by integrating machine learning and bioinformatic analyses. Front Immunol 2022;13:998140. [Crossref] [PubMed]
  30. Shi Y, Wang Y, Dong H, et al. Crosstalk of ferroptosis regulators and tumor immunity in pancreatic adenocarcinoma: novel perspective to mRNA vaccines and personalized immunotherapy. Apoptosis 2023;28:1423-35. [Crossref] [PubMed]
  31. Liu Y, Zhao Y, Zhang S, et al. Developing a prognosis and chemotherapy evaluating model for colon adenocarcinoma based on mitotic catastrophe-related genes. Sci Rep 2024;14:1655. [Crossref] [PubMed]
  32. Cheng R, Xu X. Validation of Hepatocellular Carcinoma Risk Prediction Models in Patients with Hepatitis B-Related Cirrhosis. J Hepatocell Carcinoma 2022;9:987-97. [Crossref] [PubMed]
  33. Liao X, Wang W, Yu B, et al. Thrombospondin-2 acts as a bridge between tumor extracellular matrix and immune infiltration in pancreatic and stomach adenocarcinomas: an integrative pan-cancer analysis. Cancer Cell Int 2022;22:213. [Crossref] [PubMed]
  34. Yan J, Liu Y, Liu T, et al. A predictive and prognostic model for metastasis risk and prognostic factors in gastrointestinal signet ring cell carcinoma. Eur J Med Res 2024;29:545. [Crossref] [PubMed]
  35. Awasthi S, Kumar R, Pradhan D, et al. Genomic landscape of gallbladder cancer: insights from whole exome sequencing. Int J Surg 2024;110:6883-97. [Crossref] [PubMed]
  36. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [Crossref] [PubMed]
  37. Zhao J, Wang Y, Liang Q, et al. MAGEA1 inhibits the expression of BORIS via increased promoter methylation. J Cell Sci 2019;132:jcs218628. [PubMed]
  38. Zhao J, Wang Y, Mu C, et al. MAGEA1 interacts with FBXW7 and regulates ubiquitin ligase-mediated turnover of NICD1 in breast and ovarian cancer cells. Oncogene 2017;36:5023-34. [Crossref] [PubMed]
  39. Novikov DV, Belova TV, Plekhanova ES, et al. Early detection of cancer/testis mRNAs in tumor cells circulating in the peripheral blood of colorectal cancer patients. Mol Biol (Mosk) 2012;46:766-72. [PubMed]
  40. Huang CI, Huang YK, Lee HM, et al. Synergistic and Antagonistic Antiproliferative Effects of Ribociclib (Lee011) and Irinotecan (SN38) on Colorectal Cancer Cells. Anticancer Res 2023;43:1933-41. [Crossref] [PubMed]
  41. Wenzel EM, Pedersen NM, Elfmark LA, et al. Intercellular transfer of cancer cell invasiveness via endosome-mediated protease shedding. Nat Commun 2024;15:1277. [Crossref] [PubMed]
  42. Ye Z, Xiong Y, Peng W, et al. Manipulation of PD-L1 Endosomal Trafficking Promotes Anticancer Immunity. Adv Sci (Weinh) 2023;10:e2206411. [Crossref] [PubMed]
  43. Scott CC, Vacca F, Gruenberg J. Endosome maturation, transport and functions. Semin Cell Dev Biol 2014;31:2-10. [Crossref] [PubMed]
  44. Hurley JH, Coyne AN, Miączyńska M, et al. The expanding repertoire of ESCRT functions in cell biology and disease. Nature 2025;642:877-88. [Crossref] [PubMed]
  45. Song F, Zhang Y, Chen Q, et al. Mast cells inhibit colorectal cancer development by inducing ER stress through secreting Cystatin C. Oncogene 2023;42:209-23. [Crossref] [PubMed]
  46. Wennhold K, Thelen M, Lehmann J, et al. CD86(+) Antigen-Presenting B Cells Are Increased in Cancer, Localize in Tertiary Lymphoid Structures, and Induce Specific T-cell Responses. Cancer Immunol Res 2021;9:1098-108. [Crossref] [PubMed]
  47. Noble A, Pring ET, Durant L, et al. Altered immunity to microbiota, B cell activation and depleted γδ/resident memory T cells in colorectal cancer. Cancer Immunol Immunother 2022;71:2619-29. [Crossref] [PubMed]
  48. Jia H, Guo J, Liu Z, et al. High expression of CD28 enhanced the anti-cancer effect of siRNA-PD-1 through prompting the immune response of melanoma-bearing mice. Int Immunopharmacol 2022;105:108572. [Crossref] [PubMed]
  49. Jiang P, Gu S, Pan D, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 2018;24:1550-8. [Crossref] [PubMed]
  50. Chen H, Babino D, Schoenbichler SA, et al. Nmnat1-Rbp7 Is a Conserved Fusion-Protein That Combines NAD+ Catalysis of Nmnat1 with Subcellular Localization of Rbp7. PLoS One 2015;10:e0143825. [Crossref] [PubMed]
  51. Ahn J, Kim DH, Suh Y, et al. Adipose-specific expression of mouse Rbp7 gene and its developmental and metabolic changes. Gene 2018;670:38-45. [Crossref] [PubMed]
  52. Yu Y, Xu Z, Zhou H, et al. RBP7 functions as a tumor suppressor in HR + breast cancer by inhibiting the AKT/SREBP1 pathway and reducing fatty acid. Cancer Cell Int 2024;24:118. [Crossref] [PubMed]
  53. Liu Y, Fan M, Xian S, et al. RBP7 Regulated by EBF1 Affects Th2 Cells and the Oocyte Meiosis Pathway in Bone Metastases of Bladder Urothelial Carcinoma. Front Biosci (Landmark Ed) 2023;28:189. [Crossref] [PubMed]
  54. Elmasry M, Brandl L, Engel J, et al. RBP7 is a clinically prognostic biomarker and linked to tumor invasion and EMT in colon cancer. J Cancer 2019;10:4883-91. [Crossref] [PubMed]
  55. Liu Y, Li R, Ren G. KRT84 is a potential tumor suppressor and good prognosis signature of oral squamous cell carcinoma. Biosci Rep 2020;40:BSR20200187. [Crossref] [PubMed]
  56. Chang X, Zhao Y, Wang Y, et al. Screening citrullinated proteins in synovial tissues of rheumatoid arthritis using 2-dimensional western blotting. J Rheumatol 2013;40:219-27. [Crossref] [PubMed]
  57. Zhou B, Lin W, Long Y, et al. Notch signaling pathway: architecture, disease, and therapeutics. Signal Transduct Target Ther 2022;7:95. [Crossref] [PubMed]
  58. Napoli M, Bauer J, Bonod C, et al. PCPE-2 (procollagen C-proteinase enhancer-2): The non-identical twin of PCPE-1. Matrix Biol 2024;134:59-78. [Crossref] [PubMed]
  59. Xu H, Thomas MJ, Kaul S, et al. Pcpe2, a Novel Extracellular Matrix Protein, Regulates Adipocyte SR-BI-Mediated High-Density Lipoprotein Uptake. Arterioscler Thromb Vasc Biol 2021;41:2708-25. [Crossref] [PubMed]
  60. Yin Y, Yang X, Cheng Z, et al. Identification of extracellular matrix-related biomarkers in colon adenocarcinoma by bioinformatics and experimental validation. Front Immunol 2024;15:1371584. [Crossref] [PubMed]
  61. Xie Y, Guan S, Li Z, et al. Identification of a metabolic-immune signature associated with prognosis in colon cancer and exploration of potential predictive efficacy of immunotherapy response. Clin Exp Med 2025;25:46. [Crossref] [PubMed]
  62. Yao H, Li C, Tan X. An age stratified analysis of the biomarkers in patients with colorectal cancer. Sci Rep 2021;11:22464. [Crossref] [PubMed]
  63. Li N, Shen J, Qiao X, et al. Identification of the Immune-related lncRNA SNHG14/ miR-200a-3p/ PCOLCE2 Axis in Colorectal Cancer. Altern Ther Health Med 2024;AT10303. [PubMed]
  64. Sharma U, Pal D, Prasad R. Alkaline phosphatase: an overview. Indian J Clin Biochem 2014;29:269-78. [Crossref] [PubMed]
  65. Chen TC, Ng KF, Chen N, et al. Evaluation of the immunological functions of placental alkaline phosphatase in vivo using ALPP transgenic mice. Front Immunol 2025;16:1499388. [Crossref] [PubMed]
  66. Reiswich V, Gorbokon N, Luebke AM, et al. Pattern of placental alkaline phosphatase (PLAP) expression in human tumors: a tissue microarray study on 12,381 tumors. J Pathol Clin Res 2021;7:577-89. [Crossref] [PubMed]
  67. Huang J, Liang B, Wang T. FOXD1 expression in head and neck squamous carcinoma: a study based on TCGA, GEO and meta-analysis. Biosci Rep 2021;41:BSR20210158. [Crossref] [PubMed]
  68. Sims-Lucas S, Schaefer C, Bushnell D, et al. Endothelial Progenitors Exist within the Kidney and Lung Mesenchyme. PLoS One 2013;8:e65993. [Crossref] [PubMed]
  69. Cai K, Chen S, Zhu C, et al. FOXD1 facilitates pancreatic cancer cell proliferation, invasion, and metastasis by regulating GLUT1-mediated aerobic glycolysis. Cell Death Dis 2022;13:765. [Crossref] [PubMed]
  70. Long Y, Chong T, Lyu X, et al. FOXD1-dependent RalA-ANXA2-Src complex promotes CTC formation in breast cancer. J Exp Clin Cancer Res 2022;41:301. [Crossref] [PubMed]
  71. Wu Q, Ma J, Wei J, et al. FOXD1-AS1 regulates FOXD1 translation and promotes gastric cancer progression and chemoresistance by activating the PI3K/AKT/mTOR pathway. Mol Oncol 2021;15:299-316. [Crossref] [PubMed]
  72. Lucotti S, Ogitani Y, Kenific CM, et al. Extracellular vesicles from the lung pro-thrombotic niche drive cancer-associated thrombosis and metastasis via integrin beta 2. Cell 2025;188:1642-1661.e24. [Crossref] [PubMed]
  73. El Haibi CP, Sharma PK, Singh R, et al. PI3Kp110-, Src-, FAK-dependent and DOCK2-independent migration and invasion of CXCL13-stimulated prostate cancer cells. Mol Cancer 2010;9:85. [Crossref] [PubMed]
  74. Sato H, Takino T, Miyamori H. Roles of membrane-type matrix metalloproteinase-1 in tumor invasion and metastasis. Cancer Sci 2005;96:212-7. [Crossref] [PubMed]
  75. Mana G, Valdembri D, Serini G. Conformationally active integrin endocytosis and traffic: why, where, when and how? Biochem Soc Trans 2020;48:83-93. [Crossref] [PubMed]
Cite this article as: Huang X, Yang J. Molecular subtyping and prognostic model construction based on endosome-related genes in colorectal cancer. J Gastrointest Oncol 2026;17(2):66. doi: 10.21037/jgo-2025-593

Download Citation