Stemness-based gastric cancer classification by machine learning for precision diagnosis and treatment of gastric cancer
Original Article

Shixin Zhou1#, Chenyu Tian2,3,4#, Tao Zhu2,3,4#, Hao Chen2,3,4, Changxin Chen5, Quan Jiang2,3,4, Fenglin Liu1

1Department of Gastric Surgery, Fudan University Shanghai Cancer Center, Shanghai, China; 2Department of General Surgery, Zhongshan Hospital, Fudan University, Shanghai, China; 3Cancer Center, Zhongshan Hospital, Fudan University, Shanghai, China; 4Gastric Cancer Center, Zhongshan Hospital, Fudan University, Shanghai, China; 5Department of Gastroenterology, Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou, China

Contributions: (I) Conception and design: S Zhou, C Chen, Q Jiang, F Liu; (II) Administrative support: F Liu; (III) Provision of study materials or patients: F Liu; (IV) Collection and assembly of data: C Tian, T Zhu, H Chen; (V) Data analysis and interpretation: S Zhou, C Tian, T Zhu, C Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Quan Jiang, MD, PhD. Department of General Surgery, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, China; Cancer Center, Zhongshan Hospital, Fudan University, Shanghai, China; Gastric Cancer Center, Zhongshan Hospital, Fudan University, Shanghai, China. Email: 19111210086@fudan.edu.cn; Changxin Chen, MD, PhD. Department of Gastroenterology, Quanzhou First Hospital Affiliated to Fujian Medical University, 250 East Street, Licheng District, Quanzhou 362000, China. Email: 68760783@qq.com; Fenglin Liu, MD, PhD. Department of Gastric Surgery, Fudan University Shanghai Cancer Center, 270 Dongan Road, Shanghai 200032, China. Email: fenglinliu@hotmail.com.

Background: Stemness has been shown to play an important role in immunotherapy and chemotherapy response. Machine learning has been used to predict stemness-based cancer subtypes. We aimed to conduct a stemness-based classification of gastric cancer (GC) for the early identification of patients at risk for GC and to guide treatment.

Methods: Stemness indices [mRNA stemness index (mRNAsi)] of 389 patients with GC from The Cancer Genome Atlas (TCGA) database were generated using the one-class logistic regression (OCLR) algorithm. Consensus clustering was performed to divide the patients with GC into two subtypes based on their stemness indices. Finally, four machine learning algorithms were used to select 12 critical genes, from which a logistic regression model was constructed. An external cohort was used as the validation cohort.

Results: Stemness subtype cluster1 had higher mRNAsi scores and a significantly better prognosis, while stemness subtype cluster2 had higher immunocompetence. In terms of the prediction of therapeutic efficacy, patients in cluster2 may have a better response to anti-cytotoxic T lymphocyte antigen 4 (anti-CTLA4) therapy, whereas no significant response to anti-programmed cell death 1 (anti-PD1) therapy was observed in either subtype. The two subtypes showed significant differences in tolerance to chemotherapy. A total of 1,863 differentially expressed genes (DEGs) were identified based on the stemness signature of GC, of which 12 critical genes were selected to predict the stemness subtype. The consistency of the results in the validation cohort indicated a promising application of this stemness-based classification and predictive model.

Conclusions: Our machine learning approach performed an overall analysis of the relationship between the stemness of GC and the therapeutic effect, identified a promising stemness-based classification of GC to predict prognosis and treatment efficacy, and developed a predictive model to make the stemness-based classification accessible for clinical practice.

Keywords: Gastric cancer; stemness; machine learning; immunotherapy; chemotherapy


Submitted Nov 01, 2024. Accepted for publication Apr 17, 2025. Published online Oct 30, 2025.

doi: 10.21037/jgo-24-665


Highlight box

Key findings

• Two distinct stemness-based subtypes of gastric cancer (GC) were identified: cluster1 [higher mRNA stemness index (mRNAsi) scores, better prognosis] and cluster2 (enhanced immunocompetence).

• A 12-gene machine learning model achieved robust classification, validated in an external cohort.

What is known and what is new?

• Stemness impacts immunotherapy/chemotherapy outcomes, but GC-specific classification tools were underdeveloped.

• This stemness-based predictive model represents a novel machine learning framework linking mRNAsi to GC subtypes, enabling prognosis stratification and therapy-specific clinical decision-making.

What is the implication, and what should change now?

• This study establishes a machine learning-driven framework to prognosticate outcomes and personalize therapeutic decision-making in GC.

• Integration of this 12-gene model into clinical diagnostics will allow stratification of GC patients, refinement of treatment selection, and improvement of therapeutic outcomes.


Introduction

As a heterogeneous cancer, GC is the fifth most common malignant disease and the fourth leading cause of cancer-related deaths globally, with a particularly high burden in East Asia (1). Because most GC cases are diagnosed at an advanced stage, and owing to the heterogeneity of GC and drug resistance, patients with GC have a poor prognosis (2-4). The overall survival (OS) rate of patients with GC remains at 25% worldwide (5). Adjuvant therapy after radical resection is the current standard treatment for improving survival. However, several classical clinical trials have verified that the gain in OS is limited, and the median OS is only 11 months (6,7). Several biomarkers have been verified to have prognostic and therapeutic potential, but these molecular predictors still need to be further validated (8,9). Therefore, novel treatment strategies and classifications for GC are urgently required.

Stemness is defined as the potential for self-renewal and differentiation from native cells and was originally used to explain the ability of normal adult stem cells (SCs) to generate all cell types (10). Cancer stem cells (CSCs) have stem-cell-like features. CSCs can spread more easily than normal tumor cells, leading to drug resistance, tumor recurrence, and metastasis (11-13). Increasing evidence indicates that GC stem cells (GCSCs) may be one of the most crucial factors in therapeutic resistance and tumor recurrence (14). Recently, several anti-GCSC therapies targeting stemness-based functions have been developed (15). However, the mechanism by which GCSCs cause drug resistance and tumor recurrence remains unknown. Solving this problem may lead to the development of effective treatment strategies.

Classic treatments, including surgical resection, radiotherapy, chemotherapy, and anti-angiogenic therapy, have shown considerable performance in improving the prognosis of GC. However, new effective therapeutics are also being explored by researchers around the world (15,16). Immunotherapy is a novel and rapidly developing therapy that has been used to treat various cancers (17). Unlike traditional therapies, immunotherapy recognizes and eliminates tumor cells by modulating the tumor immune response rather than acting directly on tumor cells (18). Immunotherapies, including immune checkpoint inhibitors (ICIs), cancer vaccines, and adoptive cell therapy, have shown promising antitumor efficacy in patients with GC, especially ICIs (17,19). ICIs inhibit tumor growth by preventing suppression of the antitumor immune response (13). The programmed cell death 1 (PD1)-programmed cell death ligand 1 (PD-L1) axis and cytotoxic T lymphocyte antigen 4 (CTLA4) are major targets of ICIs. Wei et al. showed that PD1 inhibitors have the potential to be incorporated into the treatment of early-stage GC (20). Combined with targeted therapies, several ICIs have been approved for clinical application in advanced GC (21). However, two independent PD1 inhibitor-related clinical trials, namely the KEYNOTE-059 trial and the ATTRACTION-2 trial, showed response rates of only 11.6% and 11.2%, respectively (5,22,23). Not all patients benefit from immunotherapy; therefore, it is a priority to identify precise biomarkers and a promising classification of GC to better guide the clinical application of immunotherapy (24,25).

Machine learning is widely used in clinical medicine (26). In addition, machine learning can help researchers to extract relevant features from large multidimensional datasets (27). Over the past several years, machine learning has been used to identify specific biomarkers and generate novel stemness-related subtypes in various cancers, including breast cancer and acute leukemia (28-30). In this study, we generated a stemness index and identified differentially expressed genes (DEGs) using transcriptome analysis. We then identified two stemness subtypes with significant differences in survival and therapeutic effects. We constructed a logistic regression model to guide the stemness-based classification of GC. Our study identified a novel stemness-based classification to better predict the prognosis and treatment response in patients, which could contribute to the clinical treatment of these patients. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-24-665/rc).


Methods

Data collection

The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data, including the gene expression, phenotype, and copy number variation (CNV) data, were downloaded from the University of California, Santa Cruz (UCSC) Xena database (https://xenabrowser.net/). The mutation annotation format (MAF) files were also downloaded. The phenotypic information of TCGA-STAD samples is displayed in Table 1. GSE84437 samples were used as the verification set, and the corresponding data were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi). The phenotypic information of the GSE84437 samples is displayed in Table 1. The tumor mutation burden (TMB) data of TCGA-STAD samples were downloaded from https://gdc.cancer.gov/about-data/publications/PanCan-CellOfOrigin. The immunoactivity data for TCGA samples were obtained from the appendix of the study by Xu et al. (31). The alternative splicing (AS) data of TCGA samples were obtained from the TCGA SpliceSeq database (https://bioinformatics.mdanderson.org/TCGASpliceSeq/PSIdownload.jsp). Immune cell-related gene sets were taken from the list of pan-cancer immune metagenes reported by Charoentong et al. (32). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Table 1

Statistical table of clinical information for the cohorts analyzed in this study

Variables TCGA-STAD cohort (n=389) GSE84437 cohort (n=433)
Gender
   Male 254 296
   Female 135 137
Age (years)
   >65 208 150
   ≤65 178 283
   NA 3 NA
Neoplasm histologic grade
   G1 10 NA
   G2 138 NA
   G3 232 NA
   GX 9 NA
Pathologic stage
   Stage I 52 NA
   Stage II 121 NA
   Stage III 166 NA
   Stage IV 37 NA
   Discrepancy 8 NA
   NA 5 NA
Pathologic T
   T1 20 11
   T2 82 38
   T3 175 92
   T4 108 292
   TX 4 NA
Cell cluster
   Cluster1 153 NA
   Cluster2 236 NA
mRNAsi group
   High 194 NA
   Low 195 NA
Stemness subtype
   Cluster1 226 280
   Cluster2 163 153
Lauren classification
   Diffuse 62 NA
   Intestinal 166 NA
   Mixed 19 NA
   NA 142 NA
Molecular subtype
   CIN 127 NA
   EBV 25 NA
   GS 51 NA
   MSI 52 NA
   NA 134 NA
Event (OS)
   Alive 233 224
   Dead 156 209
Event (PFI)
   Alive 257 NA
   Dead 132 NA

CIN, chromosomal instability; EBV, Epstein-Barr virus; GS, genomically stable; mRNAsi, mRNA stemness index; MSI, microsatellite instability; NA, not applicable; OS, overall survival; PFI, progression-free interval; T, tumor; TCGA-STAD, The Cancer Genome Atlas Stomach Adenocarcinoma.

Calculation of mRNA stemness index (mRNAsi)

Gene expression information for SCs was extracted from the normalized mRNA matrix of SCs downloaded from the Progenitor Cell Biology Consortium (PCBC) database (https://progenitorcells.org/). The OCLR algorithm was then used to generate the stemness signature from this SC expression information. Subsequently, the Spearman correlation between the weight vector of the stemness signature and the mRNA expression data of each STAD sample was calculated. Finally, the minimum was subtracted from each Spearman correlation coefficient and the result was divided by the maximum, mapping the values to the range of 0 to 1; the scaled value was taken as the expression-based stemness index (mRNAsi).
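The correlation and scaling steps described above can be sketched as follows. This is a minimal illustration in plain Python rather than the pipeline used in the study: the signature weights and expression values are made-up toy numbers, the function names are hypothetical, and the OCLR training step that produces the weights is not shown.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mrnasi(weights, samples):
    """Scale per-sample Spearman correlations into the range [0, 1].

    weights: stemness-signature weight per gene (toy values in this sketch).
    samples: dict of sample id -> expression per gene, in the same gene order.
    """
    rho = {sid: spearman(weights, expr) for sid, expr in samples.items()}
    lowest = min(rho.values())
    shifted = {sid: r - lowest for sid, r in rho.items()}  # subtract the minimum
    highest = max(shifted.values())
    return {sid: s / highest for sid, s in shifted.items()}  # divide by the maximum


# Toy data: 4 genes and 3 samples (illustrative numbers only).
toy_weights = [0.8, 0.1, -0.5, 0.3]
toy_samples = {
    "S1": [9.0, 4.0, 1.0, 6.0],  # same gene ordering as the weights
    "S2": [1.0, 6.0, 9.0, 4.0],  # reversed ordering
    "S3": [9.0, 6.0, 4.0, 1.0],  # partly concordant
}
scores = mrnasi(toy_weights, toy_samples)
```

In this toy case the sample whose expression ranks match the signature receives an index of 1 and the fully discordant sample receives 0, which is the behavior the scaling is meant to produce.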

Correlations between mRNAsi with clinicopathological, molecular and immune characteristics of GC

Subgroup analysis was performed to explore the correlation between mRNAsi and different clinicopathological features, including age, sex, grade, stage, event, Lauren classification, and molecular subtype. Based on the median TMB value, the TCGA samples were divided into high- and low-TMB groups. In addition, a correlation analysis was performed between mRNAsi and the most common biomarkers: ATRX, BRAF, EGFR, PTEN, TERT, and TP53.

Based on immune-related genes, single-sample gene set enrichment analysis (ssGSEA) was performed to calculate the enrichment scores of 28 immune cell types using the GSVA package in R software. Unsupervised clustering was performed to stratify TCGA-STAD patients into different immune clusters using the kmdist algorithm via the ConsensusClusterPlus package in R, with Pearson correlation as the distance measure. Next, the R package ESTIMATE (v1.0.13) was used to calculate the StromalScore, ImmuneScore, ESTIMATEScore, and TumorPurity of the samples. Subsequently, Pearson’s correlation analysis was performed between mRNAsi and the StromalScore, ImmuneScore, ESTIMATEScore, and TumorPurity. In parallel, differential analyses of the StromalScore, ImmuneScore, ESTIMATEScore, and TumorPurity between the immune clusters were also carried out. Finally, the proportion of immune cell infiltration was calculated using the CIBERSORT (v1.03) package in R, and correlation analysis was performed between mRNAsi and the proportion of immune cell infiltration.
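To make the distance used for clustering concrete, the sketch below assumes the conventional definition of a Pearson-based distance, namely 1 minus the Pearson correlation between two samples' immune-cell score profiles. The profiles are toy values and the helper names are hypothetical; the clustering itself is not reproduced here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pearson_distance(x, y):
    """Assumed definition: 1 - r, so 0 means identical shape and 2 means opposite."""
    return 1.0 - pearson(x, y)


# Toy immune-score profiles over three cell types (illustrative only).
profile_a = [1.0, 2.0, 3.0]
profile_b = [2.0, 4.0, 6.0]  # same shape as profile_a, different scale
profile_c = [3.0, 2.0, 1.0]  # opposite shape

d_ab = pearson_distance(profile_a, profile_b)
d_ac = pearson_distance(profile_a, profile_c)
```

Because the distance depends only on the shape of the profile, samples with proportionally scaled scores are treated as identical, whereas samples with inverted profiles are maximally distant.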

Acquirement of DEGs between mRNAsi high and low groups

The TCGA-STAD samples were stratified into high and low mRNAsi groups. Kaplan-Meier survival analysis was performed to explore the differences in OS and progression-free survival (PFS) between the high and low mRNAsi groups. Subgroup analysis was used to compare the clinicopathological factors between the two groups. The limma package (v3.42.2) in R was used to identify DEGs between the high- and low-mRNAsi groups [P<0.05, |log2 fold change (FC)| >0.585]. Functional enrichment analysis of DEGs, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, was performed using the clusterProfiler package in R (P<0.01 and q value <0.2). In addition, based on the hallmark gene set, the ssGSEA algorithm was used to generate enrichment scores between the high and low mRNAsi groups.
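The DEG thresholds above can be read as a simple filter. The sketch below is only an illustration of that filter, not the limma workflow itself; the gene names and statistics are hypothetical. It also shows that a |log2 FC| cutoff of 0.585 corresponds to a fold change of roughly 1.5.

```python
FC_CUTOFF = 0.585  # |log2 fold change| threshold quoted in the text
P_CUTOFF = 0.05    # P value threshold quoted in the text

# 2 ** 0.585 is approximately 1.5, i.e., a 1.5-fold change in either direction.
fold_change_equivalent = 2 ** FC_CUTOFF


def classify_gene(log2_fc, p_value):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value >= P_CUTOFF or abs(log2_fc) <= FC_CUTOFF:
        return "ns"
    return "up" if log2_fc > 0 else "down"


# Hypothetical results: (gene, log2 FC in high vs. low mRNAsi group, P value).
toy_results = [
    ("GENE_A", 1.20, 0.001),   # passes both thresholds, upregulated
    ("GENE_B", -0.90, 0.010),  # passes both thresholds, downregulated
    ("GENE_C", 0.40, 0.001),   # fold change too small
    ("GENE_D", -2.00, 0.200),  # P value too large
]
labels = {gene: classify_gene(fc, p) for gene, fc, p in toy_results}
```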

Based on the AS data of DEGs downloaded from TCGA SpliceSeq database, the total number and co-occurrence frequency of different AS events in the mRNAsi high and low groups were counted to further explore the relationship between mRNAsi and AS. The mutations in DEGs were then counted using the maftools package (v1.0-2) in R. Furthermore, based on the CNV files of TCGA-STAD samples, a Circos plot was drawn to show the CNV information of DEGs.

Differential analysis of two stemness subtypes obtained by unsupervised consensus clustering

Based on the expression matrix of DEGs in TCGA-STAD samples, unsupervised consensus clustering was performed to obtain a stemness-based classification, named the stemness subtype, using the k-means algorithm. Kaplan-Meier survival curves were drawn to show the survival differences between the stemness subtypes. Simultaneously, the GSE84437 dataset was downloaded as a validation set and the stemness index (mRNAsi) of the GSE84437 samples was calculated. The expression matrix of DEGs from the GSE84437 samples was extracted for unsupervised consensus clustering to generate different stemness subtypes.

A subgroup analysis was also performed to compare the clinicopathological parameters between the different stemness subtypes. Based on the Hallmark gene set, a heat map was drawn to show the enrichment scores of the different stemness subtypes obtained using the ssGSEA algorithm. Univariate and multivariate Cox regression analyses were also performed. In the somatic mutation analysis, differences in TMB and the mutational landscape were determined. The CNV frequencies between different stemness subtypes were counted and displayed in the order of chromosome names. For immune-related analysis, a comparison of immunocompetence between different stemness subtypes was performed based on the immunoactivity data of TCGA samples provided by Xu et al. (31). ESTIMATE and CIBERSORT algorithms were used. Differences in the expression of 44 immune checkpoints between the stemness subtypes were then explored. A similar analytical process was performed using the GSE84437 samples for validation.

Prediction of immunotherapy and chemotherapy

Considering that Tumor Immune Dysfunction and Exclusion (TIDE) performs well in predicting the efficacy of ICI therapy, the TIDE algorithm was used to predict the anti-PD1 and anti-CTLA4 responses of patients with GC, and the TIDE scores were generated by online calculation (http://tide.dfci.harvard.edu). The GenePattern online analysis module SubMap uses gene expression information to match two datasets with different traits based on functional enrichment, which not only eliminates the batch effect but also predicts traits that were not recorded in the original dataset. SubMap was employed to map the TCGA-STAD samples grouped by stemness subtype to melanoma samples with anti-PD1 and anti-CTLA4 therapy information and to predict the possible effect of anti-PD1 and anti-CTLA4 therapy on patients with GC of different stemness subtypes. Moreover, the pRRophetic package (v0.5) in R was used to predict the chemotherapeutic response of patients to BIBW2992, erlotinib, etoposide, gefitinib, gemcitabine, paclitaxel, and vinorelbine; the half-maximal inhibitory concentrations (IC50) were calculated and are displayed in boxplots.

Connectivity Map (CMap) (https://clue.io/) analysis was performed to explore potential compounds associated with different stemness subtypes of GC; CMap not only predicts compounds based on expression characteristics but also identifies their corresponding modes of action (MoAs). First, the DEGs between the different stemness subtypes were obtained using the limma package in R (|log2FC| >1, adjusted P<0.05). The 100 genes with the largest positive fold changes and the 100 genes with the largest negative fold changes were selected, and these 200 DEGs were queried using the CMap database. Compounds with negative enrichment scores (P<0.05) were considered potential therapeutic agents for each stemness subtype.
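The construction of the CMap query can be sketched as below, under the assumption that "top" means ranking by signed log2 fold change within the up- and downregulated DEGs. The gene names are hypothetical, the function name is illustrative, and a smaller list size is used so that the toy example stays readable.

```python
def build_cmap_query(degs, n_each=100):
    """Pick the n_each most up- and most downregulated genes.

    degs: list of (gene, log2_fc) pairs that already pass the DEG thresholds.
    Returns (up_genes, down_genes), each ordered from most to least extreme.
    """
    up = sorted((d for d in degs if d[1] > 0), key=lambda d: d[1], reverse=True)
    down = sorted((d for d in degs if d[1] < 0), key=lambda d: d[1])
    return [g for g, _ in up[:n_each]], [g for g, _ in down[:n_each]]


# Hypothetical DEGs between stemness subtypes (all with |log2FC| > 1).
toy_degs = [
    ("G1", 3.1), ("G2", 1.4), ("G3", 2.2),
    ("G4", -1.2), ("G5", -4.0), ("G6", -2.5),
]
up_genes, down_genes = build_cmap_query(toy_degs, n_each=2)
```

With `n_each=100`, the two lists together form the 200-gene query described in the text.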

Construction and validation of stemness subtype predictor

The DEGs obtained from the high versus low mRNAsi comparison and the stemness subtypes generated by unsupervised consensus clustering were used to screen critical genes significantly related to the stemness subtypes. First, the 389 samples from the TCGA database were split 7:3 using the caret package in R; 70% formed the training set used to identify critical genes, and the remaining 30% were used for validation. The least absolute shrinkage and selection operator (LASSO), Random Forest and Boruta (RFB), support vector machine (SVM), and extreme gradient boosting (XGBoost) algorithms were used to screen the stemness-related critical genes using the glmnet (v4.0-2), Boruta (v7.0.0), caret, and xgboost (v1.4.1.1) packages in R, respectively. Receiver operating characteristic (ROC) curves were drawn to evaluate the performance of the four algorithms for screening critical genes. Finally, the genes selected by all four machine learning algorithms were taken as the core critical genes. Based on the expression information of the core critical genes, a multiple logistic regression model was constructed using TCGA samples.
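The split-and-intersect logic can be sketched as follows. Two assumptions are made for illustration: the 7:3 split is stratified by stemness subtype (the text does not state how the partition was drawn), and the per-algorithm gene lists are hypothetical placeholders. The subtype counts (226 and 163) are taken from Table 1.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test sets within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


def core_critical_genes(selections):
    """Return the genes chosen by every feature-selection algorithm."""
    return set.intersection(*(set(genes) for genes in selections.values()))


# Subtype labels with the TCGA-STAD counts reported in Table 1.
subtype_labels = ["cluster1"] * 226 + ["cluster2"] * 163
train_idx, test_idx = stratified_split(subtype_labels)

# Hypothetical gene lists returned by each algorithm.
toy_selections = {
    "LASSO": ["G1", "G2", "G3", "G4"],
    "RFB": ["G2", "G3", "G4", "G5"],
    "SVM": ["G1", "G2", "G3", "G6"],
    "XGBoost": ["G2", "G3", "G7"],
}
core = core_critical_genes(toy_selections)
```

Taking the intersection is a conservative choice: a gene is retained only if every algorithm independently selects it, which trades sensitivity for robustness.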

$$\text{Stemness subtype predictor} = \sum_{j=1}^{n} \text{Exp}_j \times \beta_j$$

Exp represents the expression level of the corresponding gene, β represents the regression coefficient (coef) of the corresponding gene in the logistic regression analysis, j indexes the core critical genes, and n is the number of core critical genes. The stemness subtype predictor is therefore the sum, over the core critical genes in each sample, of each gene's expression level multiplied by its coef.
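As a worked illustration of this weighted sum, the sketch below computes the score for one sample. The gene names, coefficients, and expression values are hypothetical and are not the fitted values of the 12-gene model.

```python
def stemness_subtype_predictor(expression, coefficients):
    """Sum of expression level times regression coefficient over the model genes."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())


# Hypothetical coefficients and one sample's expression values.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0, "GENE_C": 2.0}
toy_expression = {"GENE_A": 4.0, "GENE_B": 3.0, "GENE_C": 1.5}

# 4.0 * 0.5 + 3.0 * (-1.0) + 1.5 * 2.0 = 2.0
score = stemness_subtype_predictor(toy_expression, toy_coefficients)
```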

The stemness subtype predictor of the GSE84437 samples was calculated based on the multiple logistic regression model. The ROC curve was then drawn to evaluate the predictive performance of the stemness subtype predictor against the stemness-based classification obtained by unsupervised consensus clustering. Using the median stemness subtype predictor score as the cutoff, the GSE84437 samples were divided into high and low stemness subtype predictor score groups. Survival analysis was performed to explore the survival differences between the two groups. In addition, a heat map was drawn to explore the correlation between the expression of core critical genes and stemness subtype predictor scores, stemness subtypes, and mRNAsi.
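The two evaluation steps described above, the ROC analysis and the median split, can be sketched in a few lines. The scores and labels are toy values; which subtype is coded as the positive class (1) is an assumption of this sketch, and the area under the ROC curve is computed through its rank interpretation rather than by tracing the curve.

```python
from statistics import median


def roc_auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_split(scores):
    """Assign 'high' to scores above the median and 'low' otherwise."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy predictor scores and consensus-clustering subtype labels (1 = positive class).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 positive-negative pairs are ordered correctly
groups = median_split(toy_scores)      # median of the toy scores is 0.475
```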

Statistical analysis

For intergroup comparisons, normally distributed continuous variables were analyzed using independent Student’s t tests, while categorical variables were assessed with chi-square (χ2) tests. Non-normally distributed or ordinal variables were compared between two groups via the Wilcoxon rank-sum test or across multiple groups using the Kruskal-Wallis test. Correlation analyses were performed to evaluate associations: Pearson’s correlation coefficient was applied to normally distributed variables, and Spearman’s rank correlation coefficient was used for non-normally distributed data. All statistical analyses were conducted in SPSS 22.0 and R 3.6.1, adopting two-tailed tests (P<0.05 for statistical significance).


Results

Calculation of the stemness index (mRNAsi)

TCGA-STAD data were downloaded from the UCSC Xena database. The phenotypic data for TCGA-STAD samples are presented in Table 1. We obtained 389 cancer samples after removing samples with insufficient survival data or an OS of 0. After selecting the genes expressed in all 389 samples from the expression data, the expression information of 16,619 protein-coding genes was extracted for further analysis.

A total of 78 SC samples containing 12,989 protein-coding genes were obtained from the normalized mRNA matrix in the PCBC database and used to train the OCLR algorithm. Subsequently, the mRNAsi scores were calculated following the steps described in the Methods section. Specific information on the mRNAsi is available at https://cdn.amegroups.cn/static/public/10.21037jgo-24-665-1.docx.

Association between clinicopathological features with the stemness index

The correlation between mRNAsi and clinicopathological features was determined based on the calculated mRNAsi scores for all 389 samples (Figure 1A). The mRNAsi scores were then compared across clinicopathological features, as shown in Figure 1B. Surviving patients had significantly higher mRNAsi scores than deceased patients, and higher mRNAsi scores were observed in patients aged >65 years and in those with grade G2, T1, or stage I disease. Within the Lauren classification, patients with diffuse GC had significantly lower mRNAsi scores than the others. Taken together, these results suggest that mRNAsi has the potential to discriminate between the different clinicopathological classifications of GC.

Figure 1 Clinicopathologic and molecular features associated with the stemness index (mRNAsi) in gastric cancer. (A) Heat map showing the association between mRNAsi and clinicopathologic features. (B) Heat map showing the association between mRNAsi and biomarkers. (C) Comparison of mRNAsi in different groups of clinicopathologic parameters, including event, age, grade, gender, pathologic T, stage, Lauren class and molecular subtype. (D) Comparison of mRNAsi in different groups of biomarkers, including TMB, TP53, TERT, PTEN, ATRX, EGFR and BRAF. ****, P<0.0001; ***, P<0.001; **, P<0.01; *, P<0.05; ns, no significance. CIN, chromosomal instability; EBV, Epstein-Barr virus; GS, genomically stable; mRNAsi, mRNA stemness index; MSI, microsatellite instability; Mut, mutation; T, tumor; TMB, tumor mutation burden; Wt, wild type.

By analyzing the somatic mutation data, the associations between mRNAsi, TMB, and other biomarkers were elucidated (Figure 1C,1D). The higher the TMB, the higher the mRNAsi. Correlation analysis between the most common biomarkers of GC and mRNAsi indicated significantly higher mRNAsi scores in ATRX-mutant and EGFR-mutant samples.

Correlation between mRNAsi and immunocompetence of GC

In the last few years, the tumor immune microenvironment (TIME) has become a vital factor in the effectiveness of immunotherapy (33). Therefore, we performed further analyses to explore the correlation between mRNAsi and TIME patterns. Based on the 28 immune cell scores calculated using the ssGSEA algorithm, an unsupervised clustering method was used to divide the 389 samples into two clusters. The mRNAsi score of cluster1 was higher than that of cluster2 (Figure 2A). In addition, TIME patterns were characterized using the CIBERSORT and ESTIMATE algorithms. Cluster2 contained a higher proportion of immune cells. The StromalScore, ImmuneScore, and ESTIMATEScore of cluster2 were each higher than those of cluster1, while cluster2 had a significantly lower TumorPurity score (Figure 2B). Next, Pearson’s correlations between mRNAsi and StromalScore, ImmuneScore, ESTIMATEScore, and TumorPurity were calculated (Figure 2C). mRNAsi negatively correlated with StromalScore, ImmuneScore, and ESTIMATEScore, especially with StromalScore and ESTIMATEScore. In contrast, a positive correlation was observed between the mRNAsi and TumorPurity scores. The differences in StromalScore, ImmuneScore, ESTIMATEScore, and TumorPurity between the two immune subtypes were analyzed (Figure 2D). Significantly higher scores were observed in cluster2, except for TumorPurity. In terms of the infiltration of 22 types of immune cells in the two immune subtypes, cluster2 had significantly higher proportions of naïve B cells, memory B cells, CD8 T cells, activated memory CD4 T cells, M1 macrophages, M2 macrophages, and resting mast cells than cluster1, whereas the opposite was observed for regulatory T cells (Tregs), M0 macrophages, activated mast cells, and neutrophils (Figure 2E). Twelve types of immune cells positively correlated with mRNAsi, whereas the remaining cells negatively correlated with mRNAsi (Figure 2F). The two most positively correlated immune cells (follicular helper T cells and resting NK cells) and the two most negatively correlated cells (resting mast cells and M2 macrophages) are shown in Figure 2G.

Figure 2 Relationship between mRNAsi and the immune subtypes of GC classified by overall immune activity. (A) Violin plot depicting the mRNAsi in two immune subtypes. (B) Heat map showing 28 kinds of immune cell enrichment scores. (C) Correlation between mRNAsi and the StromalScore, ImmuneScore, ESTIMATEScore and TumorPurity. (D) Violin plots showing the comparison of the StromalScore, ImmuneScore, ESTIMATEScore and TumorPurity in different immune subtypes. (E) Comparisons of the infiltration proportion of immune cells in different immune subtypes. (F) Correlations between mRNAsi and the infiltration proportion of immune cells. Yellow bars indicate correlation coefficient >0, while blue bars indicate correlation coefficient <0. (G) Scatter plots showing the two most positive and two most negative correlations between mRNAsi and the infiltration proportion of immune cells, including resting mast cells, M2 macrophages, follicular helper T cells and resting NK cells. ****, P<0.0001; ***, P<0.001; **, P<0.01; *, P<0.05; ns, no significance. GC, gastric cancer; mRNAsi, mRNA stemness index; TCGA-STAD, The Cancer Genome Atlas Stomach Adenocarcinoma.

Different mRNAsi groups identified the AS of DEGs

The samples were stratified into high and low mRNAsi groups using the median mRNAsi. Survival analysis showed that the high mRNAsi group had a significantly better prognosis [OS/progression-free interval (PFI)] than the low mRNAsi group (Figure 3A,3B). Subgroup analysis indicated that the high and low mRNAsi groups differed significantly in pathologic T and Lauren classification (Figure 3C).

Figure 3 Clinical features and differential expression analysis between mRNAsi groups. (A,B) Survival analysis in mRNAsi high and low groups. (C) Heat map showing the difference of clinical features between mRNAsi high and low groups. (D) Heat map showing the DEGs in mRNAsi high and low groups. (E) Enrichment analysis of DEGs. Top left, significantly enriched BP. (F) Heat map showing the ssGSEA enrichment scores in mRNAsi high and low groups. (G) Alternative splicing of DEGs. (H) Oncoplot showing the 30 most frequently mutated DEGs, altered in 362 of 437 samples. (I) Circos plot showing the differential analysis of CNV in mRNAsi high and low groups. BP, biological process; CIN, chromosomal instability; DEGs, differentially expressed genes; EBV, Epstein-Barr virus; GS, genomically stable; mRNAsi, mRNA stemness index; MSI, microsatellite instability; OS, overall survival; ssGSEA, single-sample gene set enrichment analysis; PFS, progression-free survival.

These observations prompted us to explore the correlation between prognosis and mRNAsi. Based on differential expression analysis, 1,863 DEGs were identified, of which 107 were upregulated and 1,756 were downregulated in the mRNAsi high group (Figure 3D). Specific information on the DEGs is available at https://cdn.amegroups.cn/static/public/10.21037jgo-24-665-2.docx. Subsequently, a functional enrichment analysis was performed (Figure 3E). The most significantly enriched biological process (BP) terms were extracellular matrix organization, cell-cell adhesion via plasma-membrane adhesion molecules, and homophilic cell adhesion via plasma membrane adhesion molecules. The most significantly enriched cellular component (CC) terms were collagen-containing extracellular matrix, synaptic membrane, and neuronal cell body. The most significantly enriched molecular function (MF) terms were extracellular matrix structural constituent, glycosaminoglycan binding, and heparin binding. The calcium signaling pathway and extracellular matrix (ECM)-receptor interaction were the most significantly enriched KEGG pathways. The enrichment differences between the mRNAsi high and low groups in the hallmark pathways were further explored (Figure 3F), showing that most pathways differed significantly between the two groups.

AS, an essential part of RNA transcription, has been shown to play an important role in tumorigenesis and progression (34). The AS information of STAD downloaded from the TCGA SpliceSeq database was used to investigate the difference in AS events between the two mRNAsi groups. In total, 12,196 AS events occurred in 1,492 DEGs, among which exon skip (ES) was the most frequent, followed by alternate promoter (AP) and alternate terminator (AT). The low-mRNAsi group had more AS events, and the most frequent co-occurring AS events were the combination of ES and AP, followed by ES and AT. In addition, ES, AP, and AT occurred simultaneously for many of the DEGs (Figure 3G).

Somatic mutation data were further analyzed. Mutations in the DEGs were observed in 437 samples. As shown in Figure 3H, the mutation rate of the 30 most frequently mutated genes was 82.84%, with SYNE1 showing the highest rate. Combined with CNV data, it was determined that the frequencies of both amplification and deletion of the DEGs were high (Figure 3I).
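As a quick consistency check on the reported rate, assuming it is simply the fraction of samples carrying a mutation in at least one of the 30 genes shown in the oncoplot (362 of the 437 samples, as stated in the Figure 3H legend):

$$\frac{362}{437} \approx 0.8284 = 82.84\%$$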

Considerable predictive value for prognosis of the stemness-based classification

To identify a novel classification for patients with GC, unsupervised clustering was performed on TCGA-STAD samples, and two stemness subtypes were identified (Figure 4A). The cumulative distribution function showed that the curve was smoothest when k=2; therefore, k=2 met the classification criteria. Kaplan-Meier survival analysis showed that patients in cluster1 had significantly better overall survival (OS). The heat map indicated that the expression of DEGs was significantly different between the two subtypes, and patients in cluster1 had higher mRNAsi scores. The same analysis was performed on the GSE84437 samples. Likewise, two stemness subtypes were identified (Figure 4B). Consistent with the TCGA-STAD results, patients in cluster1 had a better prognosis and higher mRNAsi scores. Based on TCGA data, subgroup analysis showed significant differences in age, grade, pathologic T, Lauren classification, and mRNAsi group between the two subtypes (Figure 4C). Subsequently, the ssGSEA algorithm was used to explore the differences in the molecular pathways between the two stemness subtypes (Figure 4D). Finally, univariate and multivariate Cox regression analyses were performed to validate the prognostic significance of this novel classification (Figure 4E). Both univariate and multivariate Cox regression analyses showed that age and stemness subtype significantly affected prognosis. In particular, the hazard ratio (HR) of the stemness subtype was high, indicating that the stemness subtype was a relatively independent prognostic factor.
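The survival comparison described above relies on the Kaplan-Meier product-limit estimate. The following is a minimal sketch of that estimator in plain Python, not the authors' analysis code (the software used is not stated in this section). The function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to illustrate how censored observations reduce the number at risk without lowering the survival estimate.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up duration for each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each observed event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical toy cohort: four patients, the second one censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for t, s in toy_curve:
    print(f"t={t}: S(t)={s:.3f}")
```

In practice, one curve would be estimated per stemness subtype and the two curves compared with a log-rank test; that comparison is outside the scope of this sketch.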

Figure 4 Difference of clinical features and functional annotations between two stemness subtypes identified by unsupervised consensus clustering. (A) Identification of stemness subtypes in TCGA-STAD samples. (B) Identification of stemness subtypes in GSE84437 samples. (C) Difference of clinical features between two stemness subtypes. (D) Heat map showing ssGSEA scores in two stemness subtypes. (E) Univariate and multivariate Cox regression analysis illustrating that the stemness subtype was a relatively independent prognostic factor. CI, confidence interval; CDF, cumulative distribution function; CIN, chromosomal instability; EBV, Epstein-Barr virus; GS, genomically stable; mRNAsi, mRNA stemness index; MSI, microsatellite instability; N, node; NA, not applicable; ssGSEA, single-sample gene set enrichment analysis; T, tumor; TCGA-STAD, The Cancer Genome Atlas Stomach Adenocarcinoma.

Molecular differences between two stemness subtypes

Previous studies have demonstrated that genomic alterations can influence the efficacy of cancer immunotherapy (35,36). To further explore the differences in genomic alterations between the two subtypes, the 30 most frequently mutated genes (Figure 5A), CNV burden (Figure 5B,5C), distribution of TMB (Figure 5D), variants (Figure 5E), and the most common biomarkers in the two subtypes (Figure 5F) were compared. The results showed that cluster1 had a significantly higher mutation rate, CNV frequency, and TMB. However, no significant differences in the mutation rates of common biomarkers, including ATRX, BRAF, EGFR, TERT, and TP53, were observed between the two stemness subtypes.

Figure 5 Molecular features of two stemness subtypes. (A) Oncoplots showing 30 most frequently mutated DEGs in stemness subtype cluster1 and cluster2. (B) CNV frequency in stemness subtype cluster1 and cluster2. (C) Difference of CNV frequency between two stemness subtypes. (D) Difference of TMB between two stemness subtypes. (E) Difference of variants between two stemness subtypes. (F) Proportion of different genetic mutations in two stemness subtypes, including ATRX, BRAF, EGFR, TERT and TP53. ****, P<0.0001; ***, P<0.001. CNV, copy number variation; DEGs, differentially expressed genes; Mut, mutation; TMB, tumor mutation burden; Wt, wild type.

Immune differences between two stemness subtypes

Based on the data obtained from the immune cell-related gene set, ESTIMATE and CIBERSORT algorithms were used to elucidate the differences in immunocompetence between the two stemness subtypes (Figure 6A). Overall, cluster2 showed higher immunocompetence, whereas different levels of immune activity were observed at different steps between the two stemness subtypes. Regarding the proportion of infiltrating immune cells (Figure 6B), the proportions of naïve B cells, memory B cells, macrophages M2, and mast cells resting were relatively higher in cluster2, whereas those of plasma cells, Tregs, resting NK cells, macrophages M0, mast cells, and neutrophils were relatively higher in cluster1. The distinction in TIME patterns between the two subtypes was also considered (Figure 6C). Cluster2 had a lower TumorPurity score and higher StromalScore, ImmuneScore, and ESTIMATEScore than cluster1. The distribution of immune subtype samples in the stemness subtypes was calculated (Figure 6D). Most of the samples from the immune subtype cluster1 belonged to the stemness subtype cluster1, indicating that the immune and stemness subtypes had good consistency. Considering the differences in immunogenomics, the expression differences of 44 immune checkpoints between the two subtypes were examined (Figure 6E). In total, 32 immune checkpoints were significantly differentially expressed between the two subtypes, most of which were expressed at higher levels in cluster2, while there was no significant difference in the expression of CD274 between the two subtypes.

Figure 6 Different tumor immune microenvironment and immunogenomic patterns of two stemness subtypes. (A) Analysis of anti-cancer immunity activities in two stemness subtypes. (B) Distinct infiltration proportion of immune cells in two stemness subtypes. (C) Boxplots showing the difference of the StromalScore, ImmuneScore, ESTIMATEScore and TumorPurity between two stemness subtypes. (D) Distribution of immune-subtype samples in stemness subtypes. (E) Different expression levels of immune checkpoints between two stemness subtypes. ****, P<0.0001; ***, P<0.001; **, P<0.01; *, P<0.05; ns, no significance.

Prediction of the response to immunotherapy and chemotherapy in two stemness subtypes

According to previous studies, patients with a high TIDE score have a poorer response to ICI therapy, and a high TIDE score is associated with low survival rates in patients receiving anti-PD1 and anti-CTLA4 therapy (37,38). Consistent with this, our results showed that cluster2 generally had higher TIDE scores, in line with its poorer prognosis (Figure 7A). However, the prediction obtained using the TIDE algorithm showed that only 62 samples responded to ICI therapy, indicating that ICI therapy had a poor curative effect on both stemness subtypes. Subsequently, a subclass mapping analysis was performed to predict the curative effect of anti-PD1 and anti-CTLA4 therapy (Figure 7B). The samples responding to anti-CTLA4 therapy were significantly consistent with those in cluster2, indicating that patients in cluster2 may have a better response to anti-CTLA4 therapy. Although non-responders also showed considerable similarity to the samples in cluster2, this did not reach significance (P=0.075). Neither subtype showed a significant response to anti-PD1 therapy. Simultaneously, the responses of the two subtypes to chemotherapy were predicted (Figure 7C). The results showed that tolerances to BIBW2992, erlotinib, etoposide, gefitinib, gemcitabine, paclitaxel, and vinorelbine were significantly higher in patients in cluster2.

Figure 7 Analysis of sensitivity to immunotherapy and chemotherapy in two stemness subtypes. (A) Prediction of immunotherapy response between two stemness subtypes based on TIDE. (B) SubMap showing the prediction of therapeutic effect using the immunotherapy data set of melanoma. (C) Analysis of sensitivity to chemotherapy in two stemness subtypes. (D) CMap identifying the potential therapeutic compounds in stemness subtype cluster1 (left panel) and cluster2 (right panel). ****, P<0.0001; ***, P<0.001; **, P<0.01; *, P<0.05; ns, no significance. CTLA4, cytotoxic T lymphocyte antigen 4; noR, non-responder; PD1, programmed cell death 1; R, responder; TIDE, Tumor Immune Dysfunction and Exclusion.

In addition, CMap analysis was performed to identify potential therapeutic drugs for these two subtypes. First, 2,289 DEGs were identified, including 77 upregulated genes in cluster1 and 2,212 upregulated genes in cluster2. The 77 upregulated genes in cluster1 and the top 100 upregulated genes in cluster2 were extracted and used in the CMap database. In total, 1,706 and 2,417 compounds were screened and considered potential therapeutic drugs for the two stemness subtypes, respectively. Subsequently, the relationship between the top 20 compounds for the two subtypes and their corresponding MoAs was revealed (Figure 7D). Specific information is shown in Table S1.

Accuracy of the constructed stemness subtype predictor in discriminating two stemness subtypes

LASSO, RFB, SVM, and XGBoost machine learning algorithms were used to identify critical genes from the 1,863 DEGs and extract intersections to construct regression models. The four machine learning algorithms identified 28, 218, 180, and 65 critical genes, respectively, and 12 critical genes were obtained from the intersection (Figure 8A). The ROC curve analysis showed that the calculated AUC values of the four machine learning algorithms were 0.9997, 0.9991, 1, and 0.991, indicating that the identified critical genes were typical (Figure 8B). The expression matrix of 12 critical genes was used for multivariate logistic regression analysis to construct the predictive model: stemness subtype predictor = BOC × 0.8986 + CPE × 1.6464 + CPEB1 × 0.9058 + GHR × 1.1496 + GNAO1 × 1.7398 + NOVA1 × 1.4126 + PCDH9 × 1.7640 + PDE1A × 2.8147 + PDZD4 × 2.9934 + SCN7A × 0.0002 + SETBP1 × (−0.0306) + SFRP1 × 0.2355 + (−93.5303). To evaluate the predictive effect of the stemness subtype predictor, the GSE84437 samples were employed for validation, and the ROC curve was plotted, which showed an AUC value of 0.973, indicating that the logistic regression model constructed using the 12 critical genes had good performance in predicting the stemness subtype (Figure 8C). Meanwhile, samples were divided into high and low stemness subtype score groups by the median. The Kaplan-Meier curves showed significant survival differences between the two groups (Figure 8D). Moreover, a heat map was drawn to show the expression of the 12 core critical genes, revealing that almost all of the 12 critical genes were highly expressed in the high stemness subtype score group, and samples in the high stemness subtype score group had a lower mRNAsi and belonged to stemness subtype cluster2 (Figure 8E).
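The predictor formula above can be illustrated with a short sketch. The coefficients and intercept are copied from the text; everything else is an assumption made for illustration: the function names are hypothetical, the expression values are assumed to be on the same scale used to fit the model (the scale is not specified in this section), and the logistic transform assumes that a higher score corresponds to cluster2, in line with the heat map description.

```python
import math
from statistics import median

# Coefficients and intercept as reported in the predictor formula.
COEFFICIENTS = {
    "BOC": 0.8986, "CPE": 1.6464, "CPEB1": 0.9058, "GHR": 1.1496,
    "GNAO1": 1.7398, "NOVA1": 1.4126, "PCDH9": 1.7640, "PDE1A": 2.8147,
    "PDZD4": 2.9934, "SCN7A": 0.0002, "SETBP1": -0.0306, "SFRP1": 0.2355,
}
INTERCEPT = -93.5303


def stemness_subtype_score(expression):
    """Linear predictor: sum of coefficient x expression, plus the intercept."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return INTERCEPT + sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS)


def logistic(score):
    """Numerically stable logistic transform of the linear predictor.

    ASSUMPTION: the result is read as the probability of cluster2.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical usage: every gene set to the same placeholder expression value.
example_expression = {gene: 6.0 for gene in COEFFICIENTS}
score = stemness_subtype_score(example_expression)
print(round(score, 4), logistic(score))
```

The median split mirrors how the text divides samples into high and low score groups; samples exactly at the median are labeled "low" here, which is a design choice of this sketch rather than something stated in the study.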

Figure 8 Construction of the stemness subtype predictor and validation of this predictor using GSE84437 samples. (A) AUC values of four machine learning algorithms. (B) Venn diagram showing the stemness subtype specific genes screened by four machine learning algorithms. (C) ROC curve of stemness subtype predictor in identifying two subtypes. (D) Kaplan-Meier curve showing the classification of stemness subtype predictor. (E) Heat map showing the expression of specific genes in different stemness subtypes. (F) Difference of clinical features between two stemness subtypes in the cohort. (G) Distinct infiltration proportion of immune cells in two stemness subtypes in the cohort. (H) Comparison of the StromalScore, ImmuneScore, ESTIMATEScore and TumorPurity between two stemness subtypes in the cohort. (I) Differential expression levels of immune checkpoints between two stemness subtypes in the cohort. (J) Difference of TIDE score between two stemness subtypes in the cohort. (K) Prediction of sensitivity to chemotherapy between two stemness subtypes in the cohort. ****, P<0.0001; ***, P<0.001; **, P<0.01; *, P<0.05; ns, no significance. AUC, area under the curve; IC50, half-maximal inhibitory concentration; LASSO, least absolute shrinkage and selection operator; mRNAsi, mRNA stemness index; N, node; RFB, Random Forest and Boruta; ROC, receiver operating characteristic; SVM, support vector machine; T, tumor; TIDE, Tumor Immune Dysfunction and Exclusion; XGBoost, extreme gradient boosting.

Validation of the stemness-based classification in another cohort

The GSE84437 dataset was downloaded as an independent GC cohort to further validate the applicability of stemness-based classification. First, some clinicopathological features of the GSE84437 samples were collected and compared (Figure 8F). Except for sex, there were no significant differences in other aspects between the two subtypes. In terms of TIME patterns, cluster2 had a higher proportion of naive B cells, memory T cells CD8, T cells CD4 memory resting, macrophages M2, and mast cells resting, while cluster1 had a higher proportion of T cells CD4 memory activated, T cells follicular helper, macrophages M0, macrophages M1, and neutrophils (Figure 8G). Consistent with the results of the TCGA database, cluster2 in the GSE84437 dataset had a lower TumorPurity score and higher StromalScore, ImmuneScore, and ESTIMATEScore (Figure 8H). Subsequently, the expression differences of 42 immune checkpoints between the two subtypes were determined, of which 22 were significantly differentially expressed (Figure 8I). The TIDE scores of the two subtypes were also calculated (Figure 8J). Cluster2 exhibited a significantly higher TIDE score. Nevertheless, the TIDE results indicated that all samples were non-responders, suggesting that ICI therapy was ineffective for both stemness subtypes. Finally, response to chemotherapy was predicted (Figure 8K). Consistent with the TCGA results, cluster2 had a higher tolerance to BIBW2992, etoposide, gefitinib, gemcitabine, paclitaxel, and vinorelbine. In contrast, tolerance to erlotinib was significantly higher in cluster1. Overall, these results showed that the stemness-based classification behaved similarly in the TCGA and GEO cohorts, suggesting that the novel classification could be widely applied to patients with GC.


Discussion

The use of machine learning to identify cancer subtypes and predict prognosis and treatment outcomes has become a trend. Some promising biomarkers for breast and colorectal cancers have been identified, but few molecular targets are effective against GC (39). Thus, it is of great significance to identify validated biomarkers or available classifications for GC to predict the prognosis and indicate therapeutic options. Based on TCGA database, our study provided an in-depth analysis of the relationship between the stemness of GC and therapeutic effects, including immunotherapy and chemotherapy, identified a stemness-based classification of GC for further clinical application, and constructed a Stemness Subtype Predictor to conveniently distinguish stemness subtypes.

Several types of models associated with GC stemness have been developed (11), and we intended to identify a stemness-related model based on transcriptomic analysis. The stemness index can accurately reflect the stemness of GC. The higher the mRNAsi score, the higher the stemness. The interaction between stemness and clinicopathological, molecular, and immune features was analyzed based on the generated mRNAsi scores. A significantly better prognosis was observed in the high mRNAsi group, which prompted us to gain further molecular insights. In the differential expression analysis, 1,863 DEGs were identified, of which 1,492 genes had AS events. Exon skip (ES) was the most frequent AS event and appeared most frequently in consensus AS in the present study. AS is an important process involved in transcription (34) and can lead to different arrangements of exon organization from pre-mRNAs, resulting in a variety of mRNAs and the formation of functionally and structurally different protein variants (40). Several studies have indicated that AS abnormalities may serve as potential biomarkers for predicting prognosis and as therapeutic targets for cancers (41,42). Other studies have shown that ES is the most common form of AS (43) and is the most effective in assessing the outcomes of patients with GC (34). This suggests that our stemness-based classification system has considerable potential for assessing prognosis. Further studies are required to verify these findings.

In addition, statistical analysis of the mutation rates of the DEGs revealed that SYNE1 had the highest mutation rate. The spectrin repeat-containing nuclear envelope protein 1 (SYNE1) encodes Syne-1, which plays a role in cerebellar maintenance. Qu et al. discovered that the methylation status of SYNE1 was associated with some clinicopathological parameters in GC, and that patients with a high level of SYNE1 promoter methylation had a poor chemotherapy response (44). This gives us a clue to further explore whether the high mutation rate of SYNE1 is related to poor response to current chemotherapy.

In this study, we also identified two stemness subtypes by unsupervised clustering based on DEGs and found that patients in stemness subtype cluster1 had significantly higher mRNAsi scores and better prognoses, indicating that this type of classification might be related to the therapeutic effect. Therefore, we conducted an in-depth analysis of the relationship between stemness subtypes, immune-related biomarkers, and immune checkpoints, such as TMB and CD274. The TMB, the total number of somatic mutations per megabase of a genome, is considered a promising biomarker for predicting the efficacy of immunotherapy in many cancers, including GC (18,25,45,46). In our study, a significantly higher TMB was observed in the stemness subtype cluster1, which indicated that TMB might have a promising role in explaining the different immunotherapy responses of the two subtypes. As for immune checkpoints, most differed significantly between the two subtypes, while CD274, the one most commonly related to immunotherapy, did not show a significant correlation with stemness subtypes. This may be due to incomplete information and inconsistent definitions of CD274 positivity. TIME changes continuously during cancer progression (20) and plays a crucial role in immune response. We compared the proportions of immune cell infiltration between the two stemness subtypes. Stemness subtype cluster2 had relatively higher proportions of naïve B cells, memory B cells, macrophages M2, and mast cells resting, while cluster1 had more plasma cells, Tregs, resting NK cells, macrophages M0, mast cells, and neutrophils. Tumor-associated macrophages (TAMs) are antigen-presenting cells, and different TAMs have different effects on tumor development (20); among them, macrophages M2 are associated with promoting tumor growth, invasion, and metastasis in many cancers (47). In our study, the higher infiltration of macrophages M2 and other immunosuppressive cells might explain the poorer prognosis of the stemness subtype cluster2, even though cluster2 had a higher overall immunocompetence.
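The TMB definition used in this paragraph (somatic mutations per megabase) reduces to a simple ratio. The sketch below is illustrative only: the function name `tumor_mutation_burden` is hypothetical, and the 38 Mb covered region in the usage line is an assumed placeholder for an exome capture size, not a value taken from this study.

```python
def tumor_mutation_burden(n_somatic_mutations, covered_bases):
    """Return TMB as somatic mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return n_somatic_mutations / megabases


# Hypothetical sample: 380 somatic mutations over an assumed 38 Mb region.
print(tumor_mutation_burden(380, 38_000_000))
```

Because the denominator depends on the sequencing panel or capture kit, TMB values are comparable between samples only when the same covered region is used.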

To further explore the ability of stemness subtypes to predict treatment outcomes, we used multiple methods of analysis, including the TIDE algorithm, GenePattern online analysis, and the pRRophetic package in R. TIDE was developed based on two mechanisms of tumor immune escape: the induction of T cell dysfunction in tumors with high infiltration of cytotoxic T lymphocytes (CTLs) and the prevention of T cell infiltration in tumors with low CTL levels. The TIDE algorithm has been confirmed to predict the response to immunotherapy from the expression profiles of cancer patients (37). Our study found that stemness subtype cluster2 had higher TIDE scores, as expected, but given the small number of predicted responders, this result requires further research for confirmation. The other two analytical methods also yielded valuable results demonstrating the potential of stemness subtypes in predicting treatment outcomes.

Finally, a stemness subtype predictor was constructed to make the stemness subtype more convenient for clinical application. It comprises 12 genes: BOC, CPE, CPEB1, GHR, GNAO1, NOVA1, PCDH9, PDE1A, PDZD4, SCN7A, SETBP1, and SFRP1. Many studies have shown that most of these genes are potential biomarkers of GC. Brother of CDO (BOC), an immunoglobulin superfamily member, promotes sonic hedgehog (SHH) signaling (48,49). Low BOC expression is related to advanced stages of GC (50). Cytoplasmic polyadenylation element binding protein 1 (CPEB1) is a crucial factor that regulates mRNA translation and is a cancer suppressor gene (51-53). Caldeira et al. elucidated that hypermethylation of CPEB1 is common in GC and that CPEB1 might play a vital role in hindering invasion and angiogenesis (54). The growth hormone receptor (GHR) is generally believed to play a growth-promoting role when bound to growth hormone (GH). A recent study showed that in GC cells, GH/GHR could be transported into the nucleus, and the GHR in the nucleus correlated with cell proliferation, which could be interrupted by GHR inhibitors (55). The G protein subunit alpha O1 (GNAO1) not only activates second messengers but also regulates transcription factors. Liu et al. reported that the overexpression of GNAO1 is associated with poor prognosis in patients with GC (56). Secreted frizzled-related protein 1 (sFRP1), a Wnt signaling antagonist, can interfere with the binding of Wnt to frizzled membrane receptors (Fzs) and thereby affect the expression of downstream genes (57). Zhang et al. found that low sFRP1 expression indicates poor prognosis and histological grades of GC (58). In addition, diminished neuro-oncological ventral antigen 1 (NOVA1) expression in GC cells is related to poor prognosis and tumor progression (59). Carboxypeptidase E (CPE) is a neuropeptide-processing enzyme mainly located in the endocrine and nervous systems (60). CPE also has many non-enzymatic functions, particularly in several major cancers, including colon, breast, and pancreatic cancer (61). Though the association between CPE and GC has rarely been explored, the necessity of CPE for gastric function may suggest a correlation between CPE and gastric cancer (62). PDZ domain containing 4 (PDZD4) has low regional specificity in the human brain. It is a cancer-related gene and prognostic marker for pancreatic cancer. Similarly, the role of phosphodiesterase 1A (PDE1A) in several cancers has been illustrated. However, the correlation between PDE1A and GC remains unknown (63). Considering the association between these genes and GC, the Stemness Subtype Predictor may be an accessible biomarker for discriminating between the two stemness subtypes of GC and predicting the prognosis and treatment outcomes of GC. The main advantage of using machine learning for the classification of malignant tumors lies in its ability to fully utilize sequencing information from the cohort, thereby establishing a relatively reliable and effective risk stratification system. However, some aspects are less satisfactory. For instance, a larger cohort would enrich our study.

However, this study has some limitations. The samples for both identifying the stemness-based classification and validation were retrospective, and the total number of samples was relatively small, which might have caused overfitting of the classification; prospective studies with larger samples are needed for further verification. All samples were obtained from the TCGA and GEO databases, so some supplementary studies containing clinical samples are expected. Moreover, the TIDE scores were mainly used in melanoma and non-small cell lung cancer, and the prediction accuracy was likely to be poor for gastric cancer; therefore, the results presented in this study may only be used as a reference.


Conclusions

In conclusion, based on the stemness of GC, we conducted a stemness-based classification of GC for the prediction of patient outcomes and therapy responses and generated a stemness subtype predictor to make the classification more clinically feasible. This study may provide a novel method to predict prognosis and assist in guiding treatment options for patients with GC, and may be beneficial for further exploration.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-24-665/rc

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-24-665/prf

Funding: This work was supported by grants from the National Natural Science Foundation of China (No. 82172803) and 2020 Zhongshan Hospital Clinical Research Special Fund (No. 2020ZSLC15).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-24-665/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
  2. Shang W, Liang X, Li S, et al. Orphan nuclear receptor Nurr1 promotes Helicobacter pylori-associated gastric carcinogenesis by directly enhancing CDK4 expression. EBioMedicine 2020;53:102672.
  3. Li F, Li J, Yu J, et al. Identification of ARGLU1 as a potential therapeutic target for gastric cancer based on genome-wide functional screening data. EBioMedicine 2021;69:103436.
  4. Zhang F, Wang H, Yu J, et al. LncRNA CRNDE attenuates chemoresistance in gastric cancer via SRSF6-regulated alternative splicing of PICALM. Mol Cancer 2021;20:6.
  5. Sexton RE, Al Hallak MN, Diab M, et al. Gastric cancer: a comprehensive review of current and future treatment strategies. Cancer Metastasis Rev 2020;39:1179-203.
  6. Jin G, Lv J, Yang M, et al. Genetic risk, incident gastric cancer, and healthy lifestyle: a meta-analysis of genome-wide association studies and prospective cohort study. Lancet Oncol 2020;21:1378-86.
  7. Song Q, Lv X, Ru Y, et al. Circulating exosomal gastric cancer-associated long noncoding RNA1 as a noninvasive biomarker for predicting chemotherapy response and prognosis of advanced gastric cancer: A multi-cohort, multi-phase study. EBioMedicine 2022;78:103971.
  8. Jiang Y, Liu W, Li T, et al. Prognostic and Predictive Value of p21-activated Kinase 6 Associated Support Vector Machine Classifier in Gastric Cancer Treated by 5-fluorouracil/Oxaliplatin Chemotherapy. EBioMedicine 2017;22:78-88.
  9. Zhu L, Wang H, Jiang C, et al. Clinically applicable 53-Gene prognostic assay predicts chemotherapy benefit in gastric cancer: A multicenter study. EBioMedicine 2020;61:103023.
  10. Malta TM, Sokolov A, Gentles AJ, et al. Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell 2018;173:338-354.e15.
  11. Jiang Q, Chen H, Tang Z, et al. Stemness-related LncRNA pair signature for predicting therapy response in gastric cancer. BMC Cancer 2021;21:1067.
  12. Katoh M, Katoh M. WNT signaling and cancer stemness. Essays Biochem 2022;66:319-31.
  13. Wang Z, Wang Y, Yang T, et al. Machine learning revealed stemness features and a novel stemness-based classification with appealing implications in discriminating the prognosis, immunotherapy and temozolomide responses of 906 glioblastoma patients. Brief Bioinform 2021;22:bbab032.
  14. Chen X, Zhang D, Jiang F, et al. Prognostic Prediction Using a Stemness Index-Related Signature in a Cohort of Gastric Cancer. Front Mol Biosci 2020;7:570702.
  15. Wen Z, Chen M, Guo W, et al. RORβ suppresses the stemness of gastric cancer cells by downregulating the activity of the Wnt signaling pathway. Oncol Rep 2021;46:180.
  16. Mariette C, Renaud F, Piessen G, et al. The FREGAT biobank: a clinico-biological database dedicated to esophageal and gastric cancers. BMC Cancer 2018;18:139.
  17. Jin X, Liu Z, Yang D, et al. Recent Progress and Future Perspectives of Immunotherapy in Advanced Gastric Cancer. Front Immunol 2022;13:948647.
  18. Kwak Y, Seo AN, Lee HE, et al. Tumor immune response and immunotherapy in gastric cancer. J Pathol Transl Med 2020;54:20-33.
  19. Wei L, Sun J, Zhang N, et al. Noncoding RNAs in gastric cancer: implications for drug resistance. Mol Cancer 2020;19:62.
  20. Wei Y, Zhang J, Fan X, et al. Immune Profiling in Gastric Cancer Reveals the Dynamic Landscape of Immune Signature Underlying Tumor Progression. Front Immunol 2022;13:935552.
  21. Takei S, Kawazoe A, Shitara K. The New Era of Immunotherapy in Gastric Cancer. Cancers (Basel) 2022;14:1054.
  22. Fuchs CS, Doi T, Jang RW, et al. Safety and Efficacy of Pembrolizumab Monotherapy in Patients With Previously Treated Advanced Gastric and Gastroesophageal Junction Cancer: Phase 2 Clinical KEYNOTE-059 Trial. JAMA Oncol 2018;4:e180013.
  23. Kang YK, Boku N, Satoh T, et al. Nivolumab in patients with advanced gastric or gastro-oesophageal junction cancer refractory to, or intolerant of, at least two previous chemotherapy regimens (ONO-4538-12, ATTRACTION-2): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet 2017;390:2461-71.
  24. Högner A, Moehler M. Immunotherapy in Gastric Cancer. Curr Oncol 2022;29:1559-74.
  25. Zeng D, Wu J, Luo H, et al. Tumor microenvironment evaluation promotes precise checkpoint immunotherapy of advanced gastric cancer. J Immunother Cancer 2021;9:e002467.
  26. Liu T, Huang J, Liao T, et al. A Hybrid Deep Learning Model for Predicting Molecular Subtypes of Human Breast Cancer Using Multimodal Data. IRBM 2022;43:62-74.
  27. Errington N, Iremonger J, Pickworth JA, et al. A diagnostic miRNA signature for pulmonary arterial hypertension using a consensus machine learning approach. EBioMedicine 2021;69:103444.
  28. Sherafatian M. Tree-based machine learning algorithms identified minimal set of miRNA biomarkers for breast cancer diagnosis and molecular subtyping. Gene 2018;677:111-8.
  29. Thalor A, Kumar Joon H, Singh G, et al. Machine learning assisted analysis of breast cancer gene expression profiles reveals novel potential prognostic biomarkers for triple-negative breast cancer. Comput Struct Biotechnol J 2022;20:1618-31.
  30. Zhang Y, Liu D, Li F, et al. Identification of biomarkers for acute leukemia via machine learning-based stemness index. Gene 2021;804:145903.
  31. Xu L, Deng C, Pang B, et al. TIP: A Web Server for Resolving Tumor Immunophenotype Profiling. Cancer Res 2018;78:6575-80.
  32. Charoentong P, Finotello F, Angelova M, et al. Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Rep 2017;18:248-62.
  33. Di Blasio S, van Wigcheren GF, Becker A, et al. The tumour microenvironment shapes dendritic cell plasticity in a human organotypic melanoma culture. Nat Commun 2020;11:2749.
  34. Wei C, Xie W, Huang X, et al. Profiles of alternative splicing events in the diagnosis and prognosis of Gastric Cancer. J Cancer 2021;12:2982-92.
  35. Chen X, Wang D, Liu J, et al. Genomic alterations in biliary tract cancer predict prognosis and immunotherapy outcomes. J Immunother Cancer 2021;9:e003214.
  36. Kalaora S, Nagler A, Wargo JA, et al. Mechanisms of immune activation and regulation: lessons from melanoma. Nat Rev Cancer 2022;22:195-207.
  37. Jiang P, Gu S, Pan D, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 2018;24:1550-8.
  38. Song D, Yang Q, Li L, et al. Novel prognostic biomarker TBC1D1 is associated with immunotherapy resistance in gliomas. Front Immunol 2024;15:1372113.
  39. Liu X, Meltzer SJ. Gastric Cancer in the Era of Precision Medicine. Cell Mol Gastroenterol Hepatol 2017;3:348-58.
  40. Wang X, Li J, Bian X, et al. CircURI1 interacts with hnRNPM to inhibit metastasis by modulating alternative splicing in gastric cancer. Proc Natl Acad Sci U S A 2021;118:e2012881118.
  41. Oltean S, Bates DO. Hallmarks of alternative splicing in cancer. Oncogene 2014;33:5311-8.
  42. Cheng R, Xu Z, Luo M, et al. Identification of alternative splicing-derived cancer neoantigens for mRNA vaccine development. Brief Bioinform 2022;23:bbab553.
  43. Blencowe BJ. Alternative splicing: new insights from global analyses. Cell 2006;126:37-47.
  44. Qu Y, Gao N, Wu T. Expression and clinical significance of SYNE1 and MAGI2 gene promoter methylation in gastric cancer. Medicine (Baltimore) 2021;100:e23788. [Crossref] [PubMed]
  45. You W, Ouyang J, Cai Z, et al. Comprehensive Analyses of Immune Subtypes of Stomach Adenocarcinoma for mRNA Vaccination. Front Immunol 2022;13:827506. [Crossref] [PubMed]
  46. Joshi SS, Badgwell BD. Current treatment and recent progress in gastric cancer. CA Cancer J Clin 2021;71:264-79. [Crossref] [PubMed]
  47. Chen Y, Li ZY, Zhou GQ, et al. An Immune-Related Gene Prognostic Index for Head and Neck Squamous Cell Carcinoma. Clin Cancer Res 2021;27:330-41. [Crossref] [PubMed]
  48. Garcia ADR, Han YG, Triplett JW, et al. The Elegance of Sonic Hedgehog: Emerging Novel Functions for a Classic Morphogen. J Neurosci 2018;38:9338-45. [Crossref] [PubMed]
  49. Kang JS, Mulieri PJ, Hu Y, et al. BOC, an Ig superfamily member, associates with CDO to positively regulate myogenic differentiation. EMBO J 2002;21:114-24. [Crossref] [PubMed]
  50. Fattahi S, Nikbakhsh N, Ranaei M, et al. Association of sonic hedgehog signaling pathway genes IHH, BOC, RAB23a and MIR195-5p, MIR509-3-5p, MIR6738-3p with gastric cancer stage. Sci Rep 2021;11:7471. [Crossref] [PubMed]
  51. Nagaoka K, Fujii K, Zhang H, et al. CPEB1 mediates epithelial-to-mesenchyme transition and breast cancer metastasis. Oncogene 2016;35:2893-901. [Crossref] [PubMed]
  52. Xiaoping L, Zhibin Y, Wenjuan L, et al. CPEB1, a histone-modified hypomethylated gene, is regulated by miR-101 and involved in cell senescence in glioma. Cell Death Dis 2013;4:e675. [Crossref] [PubMed]
  53. Shoshan E, Mobley AK, Braeuer RR, et al. Reduced adenosine-to-inosine miR-455-5p editing promotes melanoma growth and metastasis. Nat Cell Biol 2015;17:311-21. [Crossref] [PubMed]
  54. Caldeira J, Simões-Correia J, Paredes J, et al. CPEB1, a novel gene silenced in gastric cancer: a Drosophila approach. Gut 2012;61:1115-23. [Crossref] [PubMed]
  55. Meng Y, Zhou B, Pei Z, et al. The nuclear-localized GHR is involved in the cell proliferation of gastric cancer, and pegvisomant may be an important potential drug to inhibit the proliferation of gastric cancer cells. Biochem Cell Biol 2022;100:125-35. [Crossref] [PubMed]
  56. Liu Z, Zhang J, Wu L, et al. Overexpression of GNAO1 correlates with poor prognosis in patients with gastric cancer and plays a role in gastric cancer cell proliferation and apoptosis. Int J Mol Med 2014;33:589-96. [Crossref] [PubMed]
  57. Peng JX, Liang SY, Li L. sFRP1 exerts effects on gastric cancer cells through GSK3β/Rac1-mediated restraint of TGFβ/Smad3 signaling. Oncol Rep 2019;41:224-34. [Crossref] [PubMed]
  58. Zhang T, Wu Y, Fang Z, et al. Low expression of RBMS3 and SFRP1 are associated with poor prognosis in patients with gastric cancer. Am J Cancer Res 2016;6:2679-89.
  59. Kim EK, Yoon SO, Jung WY, et al. Implications of NOVA1 suppression within the microenvironment of gastric cancer: association with immune cell dysregulation. Gastric Cancer 2017;20:438-47. [Crossref] [PubMed]
  60. Manser E, Fernandez D, Loo L, et al. Human carboxypeptidase E. Isolation and characterization of the cDNA, sequence conservation, expression and processing in vitro. Biochem J 1990;267:517-25. [Crossref] [PubMed]
  61. Yang X, Lou H, Chen YT, et al. A novel 40kDa N-terminal truncated carboxypeptidase E splice variant: cloning, cDNA sequence analysis and role in regulation of metastatic genes in human cancers. Genes Cancer 2019;10:160-70. [Crossref] [PubMed]
  62. Gomez P, Hallberg L, Greeley GH Jr. Carboxypeptidase E (CPE) deficiency in mice with the fat mutation have reduced stomach function. Proc Soc Exp Biol Med 1999;220:52-3. [Crossref] [PubMed]
  63. Samidurai A, Xi L, Das A, et al. Role of phosphodiesterase 1 in the pathophysiology of diseases and potential therapeutic opportunities. Pharmacol Ther 2021;226:107858. [Crossref] [PubMed]
Cite this article as: Zhou S, Tian C, Zhu T, Chen H, Chen C, Jiang Q, Liu F. Stemness-based gastric cancer classification by machine learning for precision diagnosis and treatment of gastric cancer. J Gastrointest Oncol 2025;16(5):1902-1923. doi: 10.21037/jgo-24-665
