A single-cell and machine learning framework identifies CAFs-associated signatures linking stromal heterogeneity to immune regulation in pancreatic cancer
Highlight box
Key findings
• A cancer-associated fibroblasts (CAFs)-associated signature (CAFAS) was developed and showed robust prognostic performance across multiple independent pancreatic ductal adenocarcinoma (PDAC) cohorts.
What is known and what is new?
• CAFs are major components of the tumor microenvironment in PDAC and contribute to tumor progression.
• This study provides a single-cell-based CAFs classification and a machine learning-derived CAFAS model linking CAFs heterogeneity to prognosis and immune status.
What is the implication, and what should change now?
• CAFAS may support precision prognostic assessment and immune stratification in PDAC.
Introduction
The occurrence and progression of pancreatic cancer depend not only on the intrinsic characteristics of the tumor cells, but also on the dynamic interactions and co-evolution between the tumor and tumor microenvironment (TME) (1). Pancreatic cancer is distinguished by an abundant and dense extracellular matrix, within which cancer-associated fibroblasts (CAFs) constitute the predominant stromal cell population and play indispensable roles at various stages of tumor development (2). Besides directly interacting with other cell types, CAFs secrete a wide range of cytokines, metabolites and extracellular matrix proteins that collectively regulate diverse malignant phenotypes of cancer cells, including proliferation, invasion, metabolic reprogramming, immune evasion, and chemoresistance (3). Consequently, increasing evidence has highlighted the potential value of CAFs as prognostic biomarkers and therapeutic targets in pancreatic cancer. However, CAFs display substantial heterogeneity in their origins, phenotypes, functions, and abundance. Distinct CAFs subtypes may exert entirely opposite effects on pancreatic cancer progression, complicating both prognostic prediction and the development of effective CAFs-targeted therapies (4).
CAFs contribute to the immunosuppressive TME of pancreatic cancer by forming physical barriers, recruiting and reprogramming immune cells, and suppressing T-cell activity (5,6). Based on molecular characteristics, CAFs have been broadly categorized into three major subtypes: inflammatory CAFs (iCAFs), myofibroblastic CAFs (myCAFs), and antigen-presenting CAFs (apCAFs) (7). Preliminary studies have demonstrated that iCAFs, myCAFs, and apCAFs can promote tumor progression and mediate immunosuppression (8-10). Nevertheless, recent reports indicate that a subset of CAFs sharing a mesothelial origin and exhibiting transcriptional overlap with apCAFs may exert antitumor immune functions (11). These findings underscore the complexity of CAFs heterogeneity in shaping tumor immunoregulation and influencing responses to immunotherapy in pancreatic cancer.
Given the remarkable heterogeneity of CAFs and their pivotal roles in pancreatic cancer progression and immune modulation, exploring CAFs-associated molecular features at higher resolution through single-cell analysis is essential for understanding tumor biology and therapeutic response. In this study, we established a robust CAFs-associated signature (CAFAS) by systematically benchmarking seven survival algorithms. CAFAS demonstrated superior prognostic performance across multiple independent cohorts. Moreover, CAFAS effectively delineated the immune landscape of pancreatic cancer, exhibiting strong discriminative power in predicting immune status and potential responses to immunotherapy. Collectively, these findings highlight the significance of CAFs heterogeneity in shaping tumor immunity and underscore the utility of CAFAS as a novel tool for prognosis and immunotherapy stratification in pancreatic cancer. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0297/rc).
Methods
Data source and processing
The single-cell datasets GSE212966 and CRA001160 for pancreatic ductal adenocarcinoma (PDAC) were retrieved, comprising a total of 44 samples. Totally five PDAC patient cohorts, including TCGA-PDAC, GSE57495, GSE71729, GSE28735, Pancreatic Cancer-Australia cohort (PACAAU, https://pancreasexpression.org/analytics/icgc/), which contained overall survival (OS) information and expression data were collected. Due to the presence of batch effects across cohorts, we used the ‘combat’ function from the ’sva’ R package to correct for these effects and merge the datasets into a single meta-cohort.
During the processing of single-cell RNA sequencing (scRNA-seq) data, we retained high-quality cells with mitochondrial gene content below 20% and 200–7,000 detected genes per cell. Genes detected in at least three cells were retained for further analysis. This filtering resulted in 107,791 eligible cells for further analysis. Integration was performed using the Seurat pipeline (12). The remaining cells were scaled and normalized using a linear regression model with the “Log-normalization” method, identifying the top 3,000 highly variable genes via the “FindVariableFeatures” function. Dimensionality reduction was achieved through principal component analysis (PCA). To eliminate batch effects among the samples, soft k-means clustering was applied using the “Harmony” package (13). The “FindClusters” function was used for cell clustering, which was visualized using UMAP. Cell types were annotated based on previous studies and manually verified. This process involved identifying genes with higher expression levels, distinct expression patterns, and recognized canonical markers. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Identify and cluster CAFs
Fibroblasts annotated by canonical marker genes were extracted and subjected to secondary clustering to resolve CAF heterogeneity. The FindNeighbors and FindClusters were applied, followed by visualization with UMAP. Differentially expressed marker genes for each CAF cluster were identified using FindAllMarkers, with each cluster compared against all other clusters.
Machine learning benchmark for construction of CAFAS
First, the prognosis-related CAFs clusters were selected, and the differential expressed genes of these CAFs clusters were subjected to univariate Cox regression analysis. Subsequently, the LASSO regression method was employed to narrow down the pool of candidate genes.
As previous research has described (14), To develop an accurate and robust CAFAS, we benchmarked seven machine learning algorithms including lasso, elastic network (Enet), ridge, partial least squares regression for Cox (plsRcox), CoxBoost, eXtreme Gradient Boosting survival (XGBoost), and supervised principal components (SuperPC) using nested cross-validation (CV) in the TCGA-PDAC cohort. In this approach, we optimized each algorithm’s hyperparameters within an inner fivefold CV and assessed the performance of the best-tuned models via an outer tenfold CV. We then conducted a comprehensive evaluation by averaging Harrell’s concordance index (C-index) and integrated Brier score (IBS) across the 10 testing folds. The model with the highest average performance metrics was selected as optimal.
Trajectory analysis and cell-cell communication
We utilized Monocle2 (15) to infer cell trajectories, deducing the differentiation pathway using standard parameters. Additionally, we conducted CytoTRACE analysis with default settings (16), an algorithm that predicts differentiation states from scRNA-seq data. This approach is based on the robust observation that transcriptional diversity decreases during differentiation, providing a complementary perspective to the trajectory analysis from Monocle2. To assess variations in theorized cell-cell communication modules, we combined gene expression data using CellChat package (17). We used the default CellChatDB as the ligand-receptor database, in accordance with the normal CellChat process. By first identifying overexpressed ligands or receptors within a cell group and then seeing enhanced ligand-receptor interactions when these ligands or receptors were overexpressed, cell type-specific interactions were deduced. The “SCENIC” program (18), which discovers networks based on the co-expression of gene regulons and DNA motifs, was also used to investigate gene regulatory networks. The primary objective of this study was to investigate CAFs-associated interactions within the TME, the ductal cells was retained as a tumor-sample-derived ductal population for CellChat analysis.
Gene set scoring
The “GSVA” package was used to conduct single-sample gene set enrichment analysis (ssGSEA) for gene set scoring in bulk RNA sequencing data (19). GSVA was also employed to evaluate previously established CAFs subtypes in scRNA-seq data (20).
Survival analyses
We used the “GSVA” package to calculate gene signature scores for CAFs subtypes across all PDAC cohorts. The association between CAFs-related signatures and patient prognosis was analyzed using the log-rank test and Cox proportional hazards regression. Kaplan-Meier curves were plotted, and optimal cutoff values for different CAFs signatures in the PDAC cohorts were determined using the “survminer” package.
Prediction of therapeutic targets and agents
The expression profile data of human cancer cell lines (CCLs) were obtained from the Broad Institute Cancer Cell Line Encyclopedia project (CCLE, https://portals.broadinstitute.org/ccle/) (21). The dependency map portal (DepMap, https://depmap.org/portal/) provided the CERES scores, which were based on genome-wide CRISPR knockdown screenings that included 18,333 genes in 739 cell lines. CERES scores indicate gene dependency in a specific CCLs, with lower scores reflecting a higher likelihood of the gene being essential for the CCL’s proliferation and survival. To estimate CAFAS-related transcriptional activity in pancreatic CCLs, the CAFAS score for each cell line was calculated by applying the same gene coefficients and formula derived from the Ridge survival model in the patient cohort to the normalized expression matrix of CCLE pancreatic CCLs. Given that many proteins lack binding sites or sufficient affinity for small molecules or antibodies, we identified 2,249 druggable targets from a prior publication (22). Potential candidate agents for PDAC patients with high CAFAS scores were identified from the Cancer Therapeutics Response Portal (CTRP, https://portals.broadinstitute.org/ctrp) and the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM, https://depmap.org/portal/prism) databases. Drug response was quantified using area under the dose–response curve (AUC) values, where lower AUC values indicate higher drug sensitivity. Additionally, using the Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/) database, chemotherapy responses for each sample were predicted with the “oncoPredict” package, which calculates drug sensitivity values measured with the half-maximal inhibitory concentration (IC50).
Immune-cell infiltration
Using the Immuno-Oncology Biological Research (IOBR) package, we gathered several previously published signatures pertaining to immunotherapy responses and TME cell types (23). We then used a unified method to calculate the enrichment score for each sample, thoroughly analyzing the immunological differences between patients with high and low CAFAS. To estimate the immunotherapy response, we employed the tracking tumor immunophenotype (TIP) algorithm, the subclass mapping, and the tumor immune dysfunction and exclusion (TIDE) method. The “estimate” R package was utilized to calculate immune, stromal, and ESTIMATE scores for patients with PDAC (24).
Statistical analysis
Data processing, statistical testing, and visualization were performed using R software (version 4.2.1). Spearman correlation analysis was used to assess associations between continuous variables, while chi-square tests were applied for categorical variables. Group comparisons of continuous variables were conducted using Wilcoxon rank-sum tests or Student’s t-tests, depending on data distribution. Parameters for R packages are detailed in the respective sections, with default settings applied unless otherwise noted. All tests were two-sided, with P<0.05 considered statistically significant.
Results
The single-cell RNA profiling of pancreatic cancer
The overall workflow of this study is shown in the schematic diagram (Figure 1). This study analyzed 44 samples (14 normal samples and 30 pancreatic cancer samples), each demonstrating a stable cell distribution, indicating minimal impact from batch effects. These samples provided a solid basis for further analysis (Figure 2A). Using the UMAP algorithm, the cells were classified into 33 distinct clusters (Figure 2B). Based on canonical marker gene expression, the integrated single-cell atlas was annotated into 12 major cell types, including T cells, ductal cells, endothelial cells, and fibroblasts (Figure 2C). Figure 2D illustrates the expression patterns of marker genes for each of the cell clusters through comprehensive bubble plots. The identification of cell types was guided by representative marker genes highlighted in Figure 2E. To evaluate the distribution of these cell types across the 44 samples, their proportions are shown in Figure 2F. Notably, fibroblast cells were found to be more prevalent in PDAC compared to normal tissues, which implies that CAFs exert a potentially significant role in the progression of PDAC.
To further investigate CAFs in PDAC, we categorized them into 5 distinct clusters (Figure 2G). Our analysis identified 5 unique subpopulations of CAFs, including CCN2+ATP5F1E+ CAFs, PLA2G2A+APOD+ CAFs, ATP5E+PRKCDBP+ CAFs, ADM+ALDOA+ CAFs, and DNAJB1+HSPA1A+ CAFs (Figure 2H, table available at https://cdn.amegroups.cn/static/public/jgo-2026-0297-1.xlsx). Subsequently, Figure 2I exhibits the differentially expressed genes (DEGs) for each CAFs cluster, providing insights into the distinct gene expression patterns within these clusters. GO analysis revealed that the DEGs of each CAFs subpopulation were significantly enriched in different biological functions (Figure 2J). For instance, DEGs of CCN2+ATP5F1E+ CAFs are enriched in connective tissue development and extracellular matrix organization, while DEGs of PLA2G2A+APOD+ CAFs are associated with negative regulation of lipid metabolic process and protein catabolic process. DEGs of ATP5E+PRKCDBP+ CAFs are enriched in regulation of serine−type endopeptidase activity and regulation of serine−type peptidase activity, DEGs of ADM+ALDOA+ CAFs are involved in glycolytic process and ADP metabolic process, and DEGs of DNAJB1+HSPA1A+ CAFs are enriched in response to unfolded protein and response to topologically incorrect protein.
Trajectory analysis and cellular communication
Given the presence of cells at various developmental stages within tumor tissue, we explored the transcriptional heterogeneity of CAFs using pseudo-time analysis. To predict the origins of CAFs, we employed CytoTRACE analysis in combination with Monocle2. Over the pseudo-time progression, there was a gradual decline in the prevalence of the DNAJB1+HSPA1A+ CAFs and CCN2+ATP5F1E+ CAFs subtypes, accompanied by an increase in the proportions of ADM+ALDOA+ CAFs, ATP5E+PRKCDBP+ CAFs and PLA2G2A+APOD+ CAFs subtypes (Figure 3A,3B). Figure 3C illustrates the dynamic expression patterns of five representative genes, including CCN2, DNAJB1, ADM, PLA2G2A, and ATP5E, along pseudo-time. Figure 3D illustrates the number and intensity of cellular communication between the 5 CAFs subgroups and other cell types within PDAC tissues. This visualization highlights the complex network of intercellular interactions. Furthermore, Figure 3E,3F explore the ligand-receptor interactions between the five CAFs subgroups and other cell types within PDAC tissues. Notably, we found that these CAFs primarily interacted with other cell types through the MIF-(CD74 + CXCR4) and MIF-(CD74 + CD44) receptor-ligand pairs.
GSVA was applied to score established classical CAFs phenotype markers (24,25), aiming to uncover the potential phenotypes and functions of our CAFs subtypes (table available at https://cdn.amegroups.cn/static/public/jgo-2026-0297-2.xlsx). DNAJB1+HSPA1A+ CAFs exhibited the highest scores for pan-iCAFs-2, ATP5E+PRKCDBP+ CAFs showed the most prominent scores for pan-myCAFs, pan-dCAFs, and pan-pCAFs. PLA2G2A+APOD+ CAFs had strong scores for pan-iCAFs, pan-myCAFs, and pan-iCAFs-2, and CCN2+ATP5F1E+ CAFs showed prominent scores for pan-dCAFs and pan-pCAFs. In contrast, ADM+ALDOA+ CAFs had the lowest scores among all classical CAFs subtypes (Figure 3G). We further analyzed the expression of CAFs phenotype markers, including pro-inflammatory, neo-angiogenic, contractile, ECM-related, TGF-β, RAS, and MMP genes, in our CAFs subgroups. The pathway heatmap (Figure 3H) revealed that DNAJB1+HSPA1A+ and PLA2G2A+APOD+ CAFs had significantly higher expression of pro-inflammatory pathway genes such as CXCL3 and CCL21, while ATP5E+PRKCDBP+ CAFs exhibited elevated expression of MMPs.
Screening the prognostic CAFs subtypes
First, we applied the ssGSEA algorithm to calculate the enrichment scores for each CAFs subtype based on their respective DEGs. We then assessed their prognostic significance in PDAC patients across multiple cohorts. Using univariate Cox regression analysis, we determined the hazard ratios for each CAFs subtype in TCGA, PACAAU, GSE71729, GSE57495, and GSE28735. The results indicated that ADM+ALDOA+ CAFs are poor prognostic factors (Figure 4A). Additionally, ADM+ALDOA+ CAFs, DNAJB1+HSPA1A+ CAFs, and PLA2G2A+APOD+ CAFs were effective in distinguishing OS among the TCGA-PDAC cohorts (Figure 4B-4F). We then extracted gene lists from these 3 prognostic CAFs subgroups to establish the model. Prior to this, GO and disease ontology enrichment (DO) analysis were performed on these genes. GO analysis showed that CAFs prognostic genes were mainly enriched in response to cellular stress, RNA catabolic process, collagen containing extracellular matrix (Figure 4G). DO analysis further confirmed that CAFs genes were significantly enriched in cancer, including pancreatic cancer (Figure 4H, table available at https://cdn.amegroups.cn/static/public/jgo-2026-0297-3.xls).
Construction and evaluation of CAFAS
We derived the CAFAS signature from the TCGA PDAC cohort (n=145) and validated it in four independent cohorts (GSE28735, GSE57495, GSE71729, PACAAU; n=311), utilizing transcriptomic and survival data from 456 patients in total.
Figure 5A,5B present the PCA plots before and after the removal of batch effects, respectively. The results indicate that the batch effect is reduced after using the combat algorithm. Afterwards, 38 genes were identified from the above prognostic CAFs genes using univariate cox regression analysis (Figure 5C,5D, table available at https://cdn.amegroups.cn/static/public/jgo-2026-0297-4.xls). The LASSO analyses further narrowed down the 38 genes to 16 prognostic genes (Figure 5E,5F). We evaluated seven machine learning algorithms linked to survival (Enet, lasso, Ridge, XGBoost, plsRcox, SuperPC, and CoxBoost) using 16 gene features selected by the LASSO algorithm to identify the hyperparameter-tuned model with optimal accuracy and minimal overfitting risk. We employed nested CV with 10 outer folds for validation and 5 inner folds for hyperparameter tuning on the TCGA-PDAC dataset. As illustrated in Figure 5G,5H, the Ridge survival model outperformed the others, demonstrating the highest mean C-index and lowest mean IBS. After tuning the hyperparameters, we applied the Ridge model to the full TCGA-PDAC dataset, referring to the final model as CAFAS.
To evaluate the prognostic performance of the CAFAS, we employed the model to five PDAC patient cohorts. We divided all patients into high- and low-risk groups according to the median values across distinct cohorts. As shown by the Kaplan-Meier analyses, CAFAS demonstrated robust prognostic value in distinguishing the survival status. Although there was no statistical difference in GSE57495 and GSE71729 due to sample size, the survival trends were consistent with those observed in the training cohort (Figure 5I-5N). After that, we used 1-, 3-, and 5-year AUC to examine the prognostic values of CAFAS in the TCGA-PDAC training cohort and four external validating cohorts. The results of the 1-, 3-, and 5-year AUC analyses were 0.72, 0.7, 0.73 for TCGA-PDAC, and 0.73, 0.71, 0.67 for PACAAU, and 0.68, 0.71, 0.7 for GSE57495, and 0.52, 0.76, 0.65 for GSE28735, and 0.65, 0.6, 0.56 for GSE71729, and 0.67, 0.68, 0.67 for Meta (Figure 5O-5T). CAFAS demonstrated strong predictive power across validating datasets.
Identification of targets and drugs for high CAFAS score patients
To identify potential targets for patients with high CAFAS scores, we conducted a two-step analysis to find promising candidates (26). First, we performed a Spearman’s correlation analysis to examine the relationship between CAFAS scores and the expression levels of druggable genes in the TCGA-PDAC cohort. This analysis revealed a set of genes positively correlated with the CAFAS score, marking them as relevant targets (Figure 6A-6E, r>0.35, P<0.05). Next, we analyzed the correlation between CERES and CAFAS scores in pancreatic CCLs using Spearman’s correlation analysis to identify targets associated with cell growth (Figure 6F-6J, r<−0.35, P<0.05). Notably, four genes—PLCB3, RELA, PSEN1, and PSMA5—were highlighted in both patient-cohort expression correlation analysis and cell-line dependency analysis.
Afterwards, we identified potential therapeutic agents for PDAC patients with high CAFAS scores by analyzing sensitivity data from two key datasets: the CTRP and the PRISM dataset. Using the established protocol (22), we selected six agents from each dataset. The CTRP-derived agents included paclitaxel, afatinib, PD318088, selumetinib, canertinib, dasatinib, while the PRISM-derived agents comprised bosutinib, flumethasone, Ro-4987655, PD-0325901, AZD8330, halobetasol-propionate. The estimated AUC values for these agents showed a statistically significant negative correlation with CAFAS scores and were notably lower in the high CAFAS group (Figure 6K-6N).
In addition, the IC50 values of therapeutic drugs such as AKT inhibitor VIII (Figure 6O), lapatinib (Figure 6P), paclitaxel (Figure 6Q), and sorafenib (Figure 6R) were lower in the high CAFAS score group, suggesting that patients with high CAFAS scores are more likely to benefit from these drugs.
Immune landscapes classified by CAFAS
We performed a thorough analysis of the TME of PDAC using the “IOBR” package. We found that immune cell infiltration levels, specifically T cells and dendritic cells, were significantly higher in patients with low CAFAS than in those with high CAFAS, indicating that PDAC with low CAFAS levels are more likely to be categorized as immune-hot tumors (Figure 7A,7B). In parallel, patients with the low CAFAS score group had higher stromal, immune, and ESTIMATE scores compared to the high CAFAS score group (Figure 7C). The histopathological analysis of the TCGA dataset confirmed that the low CAFAS score group showed increased immune cell infiltration, whereas the high CAFAS score group exhibited reduced immune infiltration (Figure 7D). Furthermore, the tumor-immune cycle dynamics were analyzed between the high-risk and low-risk groups. The low CAFAS score group demonstrated significantly higher activity in most stages of the cycle, including the recruitment of T cells, dendritic cells, macrophages, NK cells, B cells, and infiltration of immune cell into tumors (Figure 7E).
The predictive value of CAFAS for immunotherapy response
Immunomodulators (IMs) are crucial for cancer immunotherapy, and multiple IMs agonists and antagonists are under evaluation in clinical oncology. We further analyzed IMs gene expression and their regulation through epigenetic and miRNA mechanisms. The expression of a series of IMs differed across the CAFAS score groups (Figure 7F), suggesting the heterogeneity of the immune microenvironment in the high-risk and low-risk groups, which also means that there is heterogeneity in immunotherapy response with different CAFAS scores, and patients with different CAFAS scores will flow into different modes of anti-cancer therapies. As mentioned above, the phenomenon of more immune cell infiltration and active tumor immune cycle in the low CAFAS score group may indicate that its microenvironment is relatively immune-supportive and more responsive to immunotherapy. Thus, we next explored the performance of CAFAS in predicting the efficacy of immunotherapy in patients with PDAC. We evaluated the expression of immune checkpoints (ICs) between different CAFAS score groups. Notably, nearly all ICs, particularly PD-1, PD-L1, and CTLA-4, showed higher expression levels in the low CAFAS score group (Figure 7G). Likewise, previously reported signatures linked to favorable immunotherapy biomarkers were significantly enriched in the low CAFAS score group (Figure 7H,7I). Furthermore, the TIDE algorithm evaluated the patient response to immune checkpoint inhibitors (ICIs), demonstrating that the low CAFAS group had greater response (Figure 7J). The subclass mapping algorithm also showed that low CAFAS group indicated an improved response to PD-1 therapy (Figure 7K). Overall, these findings suggest that PDAC patients with low CAFAS scores may be more likely to benefit from immunotherapy, potentially owing to a more immune-inflamed TME.
Association of CAFAS with biological processes
We further explored the biological functions associated with CAFAS by examining its impact on hallmark pathways. As shown in Figure 8A, the CAFAS-high group was positively associated with pathways related to DNA repair, mitotic spindle, G2M checkpoint, glycolysis, indicating enhanced proliferative and metabolic activity. In comparison, the CAFAS-low group was significantly associated with adipogenesis, myogenesis, complement activation, and fatty acid metabolism. Moreover, we investigated the impact of CAFAS on biological pathways using GSVA. Consistent with the hallmark pathway results, the CAFAS-high group significantly exhibited enrichment in cell-cycle related and DNA replication-related pathways, whereas the CAFAS-low group showed enrichment in T cell receptor signaling pathway and fatty acid metabolism (Figure 8B). These findings collectively indicate that high CAFAS expression is linked to increased proliferative potential and metabolic reprogramming, while low CAFAS expression is associated with enhanced immune activity and lipid metabolism.
Discussion
Tumor heterogeneity remains a central challenge in oncology research. Even within the same cancer type, substantial heterogeneity in tumor biology and TME lead to significant differences in patient prognosis and therapeutic response. This intrinsic diversity complicates the accurate prediction of clinical outcomes and treatment efficacy. Consequently, extensive efforts have been devoted to developing predictive models that can capture this heterogeneity. In pancreatic cancer, the extracellular matrix is predominantly composed of CAFs, which exert crucial roles in tumor growth, invasion, metastasis, and therapeutic resistance through multiple signaling pathways (27). Studies have demonstrated that elevated levels of CAFs markers correlate with unfavorable clinical outcomes in various tumors (28-31). Identifying molecular signatures related to CAFs may therefore enable more precise and individualized prognostic assessments, facilitating the prediction of patient survival, recurrence risk, and treatment responsiveness.
Despite extensive evidence supporting the tumor-promoting phenotypes of CAFs, therapeutic strategies targeting CAFs have generally produced unsatisfactory outcomes in mouse models and clinical trials (4,32,33). From the perspective of TME heterogeneity, distinct CAFs subpopulations coexist within the pancreatic cancer microenvironment, exerting diverse and sometimes contradictory biological functions. For example, studies have found that a high proportion of the ENO1+ADM+ CAFs subset is associated with poor prognosis in pancreatic cancer (34). In another study, investigators identified two functionally distinct CAFs populations based on CD105 expression, where co-transplantation experiments revealed that CD105+ CAFs promoted tumor growth, in contrast to the CD105− CAFs (11). Moreover, FAP+ CAFs have been reported to exhibit tumor-restrictive phenotypes (35). In our study, we identified five CAFs subsets, among which the ADM+ALDOA+ CAFs was associated with adverse clinical outcomes. Notably, ENO1 was also expressed as a defining marker within this subgroup, consistent with other studies. An important observation in our study is that ADM+ALDOA+ CAFs exhibited the lowest scores across established CAFs subtype signatures, including myCAFs, iCAFs, and apCAFs. This suggests that ADM+ALDOA+ CAFs may represent a non-classical CAFs state that is not adequately captured by canonical CAFs classification framework. Classical CAFs categories are mainly defined by inflammatory or myofibroblastic features. In contrast, ADM+ALDOA+ CAFs were characterized by glycolytic and ADP metabolic processes. Thus, this subgroup may reflect a metabolically reprogrammed fibroblast phenotype with distinct tumor-promoting properties.
Our study also revealed the functional heterogeneity of CAFs in the TME of pancreatic cancer. DNAJB1+HSPA1A+ CAFs, and PLA2G2A+APOD+ CAFs was associated with better prognosis, suggesting the tumor-suppressive effect of these two CAFs. Recent single-cell and spatial transcriptomic analyses have demonstrated that neoadjuvant therapy (NAT) can alter CAFs–malignant cell interactions, enhance IL-6 family signaling, and activate complement-related programs (36,37). Particularly, post-NAT tumors have been reported to exhibit reduced myCAFs-like abundance and increased iCAFs features (38). These findings suggest that CAFs phenotypes in PDAC are dynamic and therapy-responsive. We hypothesize that NAT may further reshape the relative distribution and functional states of these CAFs subtypes in our classification. PLA2G2A+APOD+ CAFs and DNAJB1+HSPA1A+ CAFs, which are associated with inflammatory features (iCAFs), may correspond more closely to NAT-induced iCAFs-like states, a shift toward inflammatory or immune-supportive CAFs states may reflect more favorable remodeling. Such NAT-induced shifts in CAFs subtype composition could affect residual tumor biology, postoperative prognosis, immunotherapy responsiveness, and the rational selection of adjuvant or stromal-targeted treatments. Future studies using matched pre- and post-NAT single-cell or spatial transcriptomic cohorts are required to determine whether CAFAS can monitor treatment-induced stromal remodeling and guide post-NAT therapeutic strategies.
Subsequently, we benchmarked seven survival machine learning algorithms using nested CV with CAFs genes associated with prognosis. In recent years, several studies have adopted large-scale benchmarking strategies, such as evaluating up to 101 machine learning models, to identify the optimal algorithm for survival prediction (39-41). Although such approaches provide a comprehensive comparison across diverse algorithms, they often suffer from excessive computational complexity, algorithmic redundancy, and limited interpretability. Moreover, when hundreds of models are simultaneously tuned and compared, the risk of overfitting and selection bias increases, and the resulting models may exhibit poor generalizability to independent datasets. In contrast, our study employed a more rigorous yet efficient framework by integrating seven machine learning algorithms (Enet, LASSO, Ridge, XGBoost, plsRcox, SuperPC, and CoxBoost) within a nested CV design (5-fold inner and 10-fold outer folds). This approach offers several methodological advantages (14). First, the nested structure strictly separates model training, hyperparameter optimization, and performance evaluation, thereby minimizing information leakage and providing an unbiased estimate of model performance. Second, the selected algorithms encompass a wide spectrum of modeling paradigms—ranging from linear and regularized regression to nonlinear ensemble and dimension-reduction methods—ensuring methodological diversity while avoiding unnecessary redundancy. Third, by balancing computational efficiency with methodological rigor, this design yields stable, reproducible, and biologically interpretable results. Collectively, this strategy enhances the robustness and clinical applicability of the prognostic model compared with large-scale, exploratory modeling frameworks. Our developed CAFAS demonstrated superior predictive value in both the training and validation cohorts of pancreatic cancer.
Pancreatic cancer remains one of the few malignancies that have derived limited benefit from current immunotherapeutic strategies. A fundamental reason for this resistance lies in its profoundly immunosuppressive TME, where CAFs facilitate immune evasion by secreting cytokines, remodeling the extracellular matrix, and regulating immune cell infiltration and activation (42). The regulatory role of CAFs in shaping tumor immunity and mediating resistance to immunotherapy is therefore of critical importance. In this context, the CAFAS developed in our study effectively discriminated and predicted immune cell infiltration, immune cycle activity, expression of immunotherapy-related molecules, and the potential efficacy of immunotherapy across different patient subgroups. These findings further reinforce the central role of CAFs in orchestrating the immune microenvironment of pancreatic cancer. Notably, the extracellular matrix protein TGFBI, which is highly expressed in CAFs, has been shown to suppress tumor-specific CD8+ T-cell proliferation and activation through its interaction with CD61 on the T-cell surface (43), and to promote M2 macrophage polarization within the TME (44). In our study, TGFBI, one of the genes incorporated into CAFAS, was predominantly expressed in ADM+ALDOA+ CAFs (Figure 8C,8D). Furthermore, ligand-receptor interactions analysis revealed that ADM+ALDOA+ CAFs interact with T cells via the MIF–CD74 signaling axis (Figure 3E), providing additional evidence for the rationale and effectiveness of CAFAS in capturing the immune landscape of pancreatic cancer. These findings also provide a heterogeneous research perspective and potential therapeutic targets for studying CAFs in the immunosuppressive microenvironment. Similarly, PTGES, which is highly expressed in pancreatic cancer and associated with poor clinical outcomes, has been implicated in metabolic and immune reprogramming within the TME (45). We observed that PTGES was highly expressed in ADM+ALDOA+ CAFs (Figure 8E,8F), a subset associated with glycolytic activity (Figure 2J), suggesting that PTGES may exert its tumor-promoting effects in pancreatic cancer through this specific CAFs population.
However, there are some limitations in this study that need to be addressed. First, the ductal cell annotated in the single-cell atlas may include both malignant PDAC cells and non-malignant ductal cells. Although the cell-cell communication analysis was conducted using tumor tissue-derived cells, inferred CNV analysis or mutation-based classification was not performed to definitively distinguish malignant ductal cells from benign ductal cells. Therefore, interactions involving ductal cells should be interpreted as tumor-sample-derived ductal cell communication. Second, although CERES-based dependency analysis provided preliminary clues regarding potential therapeutic vulnerabilities associated with high CAFAS scores, these data were derived from pancreatic CCLs. The correlations between CAFAS and CERES scores should be interpreted cautiously as exploratory associations reflecting tumor-cell-intrinsic dependencies rather than direct evidence of CAFs-mediated therapeutic vulnerability within the TME. Further validation using CAFs-tumor cell co-culture systems, patient-derived organoids, spatial transcriptomics, and in vivo models is required to clarify the functional roles of these candidate targets. Third, additional experimental studies are needed to elucidate the mechanistic roles of CAFs subpopulations in pancreatic cancer immune escape and therapeutic resistance. These limitations underscore the necessity for future research to deepen our understanding of the role of CAFs in cancer biology.
Conclusions
Our study established a robust CAFAS with strong prognostic and immunological relevance in pancreatic cancer. CAFAS demonstrated accurate patient stratification and reliable predictive performance across prognosis evaluation, drug sensitivity screening, and immune status assessment. Moreover, CAFAS provides a valuable framework for elucidating CAFs heterogeneity and intercellular communication within the TME. The identification of key CAFs-associated genes, such as TGFBI and PTGES, further highlights potential molecular targets for modulating the immunosuppressive stroma and enhancing therapeutic efficacy. Collectively, our findings not only deepen the understanding of the biological complexity of CAFs but also offer novel insights into precision prognosis prediction and immunotherapy optimization for pancreatic cancer.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0297/rc
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0297/prf
Funding: This work was jointly supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0297/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Srinivasan D, Gout J, Kleger A, et al. Tumor microenvironment subtyping in pancreatic ductal adenocarcinoma: New avenues for personalized therapeutic strategies. Adv Drug Deliv Rev 2025;226:115697. [Crossref] [PubMed]
- Sun Q, Zhang B, Hu Q, et al. The impact of cancer-associated fibroblasts on major hallmarks of pancreatic cancer. Theranostics 2018;8:5072-87. [Crossref] [PubMed]
- Zhang Q, Cao Z, Yan S, et al. Metabolic and immune crosstalk between cancer-associated fibroblasts and pancreatic cancer cells. J Transl Med 2025;23:1118. [Crossref] [PubMed]
- Özdemir BC, Pentcheva-Hoang T, Carstens JL, et al. Depletion of carcinoma-associated fibroblasts and fibrosis induces immunosuppression and accelerates pancreas cancer with reduced survival. Cancer Cell 2014;25:719-34. [Crossref] [PubMed]
- Kumar V, Donthireddy L, Marvel D, et al. Cancer-Associated Fibroblasts Neutralize the Anti-tumor Effect of CSF1 Receptor Blockade by Inducing PMN-MDSC Infiltration of Tumors. Cancer Cell 2017;32:654-668.e5. [Crossref] [PubMed]
- Fearon DT. The carcinoma-associated fibroblast expressing fibroblast activation protein and escape from immune surveillance. Cancer Immunol Res 2014;2:187-93. [Crossref] [PubMed]
- Elyada E, Bolisetty M, Laise P, et al. Cross-Species Single-Cell Analysis of Pancreatic Ductal Adenocarcinoma Reveals Antigen-Presenting Cancer-Associated Fibroblasts. Cancer Discov 2019;9:1102-23. [Crossref] [PubMed]
- Öhlund D, Handly-Santana A, Biffi G, et al. Distinct populations of inflammatory fibroblasts and myofibroblasts in pancreatic cancer. J Exp Med 2017;214:579-96. [Crossref] [PubMed]
- Krishnamurty AT, Shyer JA, Thai M, et al. LRRC15(+) myofibroblasts dictate the stromal setpoint to suppress tumour immunity. Nature 2022;611:148-54. [Crossref] [PubMed]
- Huang H, Wang Z, Zhang Y, et al. Mesothelial cell-derived antigen-presenting cancer-associated fibroblasts induce expansion of regulatory T cells in pancreatic cancer. Cancer Cell 2022;40:656-673.e7. [Crossref] [PubMed]
- Hutton C, Heider F, Blanco-Gomez A, et al. Single-cell analysis defines a pancreatic fibroblast lineage that supports anti-tumor immunity. Cancer Cell 2021;39:1227-1244.e20. [Crossref] [PubMed]
- Butler A, Hoffman P, Smibert P, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 2018;36:411-20. [Crossref] [PubMed]
- Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 2019;16:1289-96. [Crossref] [PubMed]
- Huang RH, Hong YK, Du H, et al. A machine learning framework develops a DNA replication stress model for predicting clinical outcomes and therapeutic vulnerability in primary prostate cancer. J Transl Med 2023;21:20. [Crossref] [PubMed]
- Qiu X, Hill A, Packer J, et al. Single-cell mRNA quantification and differential analysis with Census. Nat Methods 2017;14:309-15. [Crossref] [PubMed]
- Cheng S, Li Z, Gao R, et al. A pan-cancer single-cell transcriptional atlas of tumor infiltrating myeloid cells. Cell 2021;184:792-809.e23. [Crossref] [PubMed]
- Jin S, Guerrero-Juarez CF, Zhang L, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12:1088. [Crossref] [PubMed]
- Aibar S, González-Blas CB, Moerman T, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 2017;14:1083-6. [Crossref] [PubMed]
- Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. [Crossref] [PubMed]
- Galbo PM Jr, Zang X, Zheng D. Molecular Features of Cancer-associated Fibroblast Subtypes and their Implication on Cancer Pathogenesis, Prognosis, and Immunotherapy Resistance. Clin Cancer Res 2021;27:2636-47. [Crossref] [PubMed]
- Ghandi M, Huang FW, Jané-Valbuena J, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 2019;569:503-8. [Crossref] [PubMed]
- Yang C, Huang X, Li Y, et al. Prognosis and personalized treatment prediction in TP53-mutant hepatocellular carcinoma: an in silico strategy towards precision oncology. Brief Bioinform 2021;22:bbaa164. [Crossref] [PubMed]
- Zeng D, Ye Z, Shen R, et al. IOBR: Multi-Omics Immuno-Oncology Biological Research to Decode Tumor Microenvironment and Signatures. Front Immunol 2021;12:687975. [Crossref] [PubMed]
- Sun X, Meng F, Nong M, et al. Single-cell dissection reveals the role of aggrephagy patterns in tumor microenvironment components aiding predicting prognosis and immunotherapy on lung adenocarcinoma. Aging (Albany NY) 2023;15:14333-71. [Crossref] [PubMed]
- Affo S, Nair A, Brundu F, et al. Promotion of cholangiocarcinoma growth by diverse cancer-associated fibroblast subpopulations. Cancer Cell 2021;39:866-882.e11. [Crossref] [PubMed]
- Sun X, Nong M, Meng F, et al. Architecting the metabolic reprogramming survival risk framework in LUAD through single-cell landscape analysis: three-stage ensemble learning with genetic algorithm optimization. J Transl Med 2024;22:353. [Crossref] [PubMed]
- Zhang T, Ren Y, Yang P, et al. Cancer-associated fibroblasts in pancreatic ductal adenocarcinoma. Cell Death Dis 2022;13:897. [Crossref] [PubMed]
- Min KW, Kim DH, Noh YK, et al. Cancer-associated fibroblasts are associated with poor prognosis in solid type of lung adenocarcinoma in a machine learning analysis. Sci Rep 2021;11:16779. [Crossref] [PubMed]
- Ha SY, Yeo SY, Xuan YH, et al. The prognostic significance of cancer-associated fibroblasts in esophageal squamous cell carcinoma. PLoS One 2014;9:e99955. [Crossref] [PubMed]
- Umakoshi M, Kudo-Asabe Y, Tsuchie H, et al. Prognostic Value of Cancer-Associated Fibroblast Marker Expression in the Intratumoral and Marginal Areas of Soft Tissue Sarcoma. Pathobiology 2025;92:1-17. [Crossref] [PubMed]
- Takagi K, Noma K, Nagai Y, et al. Impact of cancer-associated fibroblasts on survival of patients with ampullary carcinoma. Front Oncol 2023;13:1072106. [Crossref] [PubMed]
- Amakye D, Jagani Z, Dorsch M. Unraveling the therapeutic potential of the Hedgehog pathway in cancer. Nat Med 2013;19:1410-22. [Crossref] [PubMed]
- Rhim AD, Oberstein PE, Thomas DH, et al. Stromal elements act to restrain, rather than support, pancreatic ductal adenocarcinoma. Cancer Cell 2014;25:735-47. [Crossref] [PubMed]
- Li L, Shen L, Wu H, et al. An integrated analysis identifies six molecular subtypes of pancreatic ductal adenocarcinoma revealing cellular and molecular landscape. Carcinogenesis 2023;44:726-40. [Crossref] [PubMed]
- Cumming J, Maneshi P, Dongre M, et al. Dissecting FAP+ Cell Diversity in Pancreatic Cancer Uncovers an Interferon-Response Subtype of Cancer-Associated Fibroblasts with Tumor-Restraining Properties. Cancer Res 2025;85:2388-411. [Crossref] [PubMed]
- Shiau C, Cao J, Gong D, et al. Spatially resolved analysis of pancreatic cancer identifies therapy-associated remodeling of the tumor microenvironment. Nat Genet 2024;56:2466-78. [Crossref] [PubMed]
- Zhang X, Lan R, Liu Y, et al. Complement activation in tumor microenvironment after neoadjuvant therapy and its impact on pancreatic cancer outcomes. NPJ Precis Oncol 2025;9:58. [Crossref] [PubMed]
- Zhang X, Lan R, Li D, et al. Neoadjuvant Therapy-Induced Remodeling in Pancreatic Ductal Adenocarcinoma: Multimodal Spatial Analysis and Prognosis. Cancer Sci 2026;117:1167-77. [Crossref] [PubMed]
- Wang R, Qin GH, Jiang Y, et al. Integrated multiomics analysis identifies PHLDA1+ fibroblasts as prognostic biomarkers and mediators of biological functions in pancreatic cancer. Front Immunol 2025;16:1592416. [Crossref] [PubMed]
- Ye W, Yang H, Yi X, et al. TCIRG1 as a Novel Prognostic Biomarker Triggering Immune Infiltration in Renal Clear Cell Carcinoma: An Integrative Study of Single-Cell and Bulk Data. Hum Mutat 2025;2025:1839494. [Crossref] [PubMed]
- Yan L, Yu H, Xu X, et al. Integrated machine learning-based establishment of a prognostic model in multicenter cohorts for acute myeloid leukemia. Front Oncol 2025;15:1649594. [Crossref] [PubMed]
- Shakiba M, Tuveson DA. Macrophages and fibroblasts as regulators of the immune response in pancreatic cancer. Nat Immunol 2025;26:678-91. [Crossref] [PubMed]
- Goehrig D, Nigri J, Samain R, et al. Stromal protein βig-h3 reprogrammes tumour microenvironment in pancreatic cancer. Gut 2019;68:693-707. [Crossref] [PubMed]
- Zhou J, Lyu N, Wang Q, et al. A novel role of TGFBI in macrophage polarization and macrophage-induced pancreatic cancer growth and therapeutic resistance. Cancer Lett 2023;578:216457. [Crossref] [PubMed]
- Murthy D, Attri KS. PTGES Expression Is Associated with Metabolic and Immune Reprogramming in Pancreatic Ductal Adenocarcinoma. Int J Mol Sci 2023;24:7304. [Crossref] [PubMed]

