Prognostic nomogram for colorectal cancer liver metastasis treated with tumor resection and chemotherapy based on SEER database
Highlight box
Key findings
• A prognostic nomogram integrating race, age, tumor characteristics (site/grade), carcinoembryonic antigen (CEA) level, nodal status, and liver surgery was developed for colorectal cancer liver metastasis (CRCLM) patients treated with resection and chemotherapy, demonstrating superior discriminative accuracy (C-index: 0.657) compared to existing models [Fong clinical risk score, Basingstoke index, tumor-node-metastasis (TNM) staging; all P<0.001].
• The nomogram stratified patients into low- [<220], intermediate- [220–301], and high-risk [>301] groups, with robust performance [area under the curves (AUCs): 0.717–0.737 in validation] and clinical utility across risk thresholds [decision curve analysis (DCA)-confirmed].
• A web-based calculator (https://lxt134520.shinyapps.io/output/) was developed to facilitate individualized OS prediction at 1, 3, and 5 years.
What is known and what is new?
• Existing prognostic tools (e.g., Fong clinical risk score, Basingstoke index) rely on postoperative data or lack modern treatment factors (e.g., neoadjuvant therapy response), limiting preoperative utility.
• This study integrates clinicopathologic variables (e.g., tumor deposits, CEA) and treatment factors (liver surgery) into a Surveillance, Epidemiology, and End Results (SEER)-derived nomogram, addressing gaps in preoperative decision-making. Notably, it highlights socioeconomic confounders (race, marital status) as prognostic variables.
What is the implication, and what should change now?
• Clinicians should use this nomogram to identify high-risk CRCLM patients who may benefit from intensified surveillance or adjuvant therapy, while low-risk patients could avoid overtreatment.
• Limitations (lack of chemotherapy regimens/molecular markers, external validation) underscore the need for future studies to incorporate treatment-specific variables [e.g., bevacizumab use, rat sarcoma viral oncogene homolog (RAS) status] and validate the model in diverse cohorts.
• The web calculator enables real-time risk assessment, but its integration into clinical workflows requires validation in prospective settings.
Introduction
Colorectal cancer (CRC) ranks as the third most prevalent malignancy globally, accounting for approximately 10% of cancer diagnoses and 9.4% of cancer-related deaths. In China, CRC burden is particularly severe, representing 49.3% of new global cases and 58.3% of CRC-related deaths in 2020 (1). While the 5-year survival rate for localized CRC approaches 57%, metastatic CRC remains dismal at 11% (2). The liver serves as the primary site of hematogenous metastasis, with 15–25% of CRC patients presenting synchronous liver metastases and 18–25% developing metachronous metastases within 5 years of primary resection (3). Untreated liver metastases confer a median survival of 6.9 months, whereas complete resection extends survival to 35 months with 5-year survival rates of 30–57% (4).
Current management of colorectal cancer liver metastasis (CRCLM) emphasizes multidisciplinary approaches, including primary tumor resection and systemic chemotherapy. While surgery remains the only curative option, only 20% of patients achieve long-term remission, with recurrence rates exceeding 60% (5). Conversion therapy has expanded resectability for initially unresectable CRCLM, improving 5-year survival through tumor downsizing (6). However, prognostic heterogeneity persists among patients receiving similar treatments, underscoring the need for personalized risk stratification.
The American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system, while foundational, inadequately captures prognostic complexity in CRCLM. It omits critical factors such as age, socioeconomic determinants, and treatment-specific variables (7). The Surveillance, Epidemiology, and End Results (SEER) database, developed by the U.S. National Cancer Institute (NCI), encompasses 18 population-based cancer registries covering nearly 30% of the U.S. population. It collects comprehensive data on thyroid, breast, gastric, colorectal, hepatic, pancreatic, lung, and bladder cancers. Beyond basic patient information, the database provides critical details on cancer classification, grade, TNM staging, lymph node metastasis, treatment modalities, chemotherapy/radiotherapy status, survival outcomes, and follow-up duration (8). Nomograms address the limitations of TNM staging by integrating diverse predictors into visual prognostic tools, and they have outperformed TNM staging in multiple cancers (9). This study analyzed CRCLM cases from the SEER database (2010–2015) treated with primary tumor resection and chemotherapy. We aimed to identify independent prognostic factors, develop a visual nomogram integrating demographic, pathological, and treatment variables, and validate its predictive utility. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-415/rc).
Methods
Patient selection and data source
Data were obtained from the SEER 8-registries database (SEER*Stat 8.4.3), covering patients diagnosed with CRCLM between 2010 and 2015. The study setting was based on the general population in the United States, utilizing data from SEER, which covers approximately 28% of the U.S. population across multiple geographic regions and centers (10). Inclusion criteria were:
- Pathologically confirmed adenocarcinoma (ICD-O-3 code: 8140) originating in the colon or rectum.
- Synchronous or metachronous liver metastases.
- Treatment with primary tumor resection combined with chemotherapy.
- Complete survival data (follow-up duration ≤143 months).
Exclusion criteria included:
- Non-colorectal primary tumors, non-adenocarcinoma histology, or unknown tumor grade.
- Metastases to other organs (e.g., bone, brain, lung).
- Missing survival or treatment details.
All analyses were conducted on complete cases. Patients with missing values in essential variables, such as survival time, treatment details, or tumor characteristics, were excluded during initial data screening; therefore, no imputation methods were applied. A total of 3,252 patients met the eligibility criteria and were randomly assigned to a training cohort (70%, n=2,276) (11) and a validation cohort (30%, n=976) (12) (Figure 1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and did not require ethical committee approval.
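The 70/30 allocation can be illustrated with a short sketch. It is written in Python for illustration only (the study itself used R and SPSS), and the function name `split_cohort` and the seed are hypothetical, since the source does not report how the random split was seeded.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2024):
    """Shuffle patient IDs reproducibly and split them into two cohorts.

    The seed is a hypothetical value chosen for this example; the source
    does not report one.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of 70% of the cohort
    return ids[:n_train], ids[n_train:]


# With 3,252 eligible patients, the cohort sizes match those reported.
training, validation = split_cohort(range(3252))
print(len(training), len(validation))
```

Fixing the seed makes the split reproducible, which matters because the baseline comparison in Table 1 depends on the particular allocation.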
Variables and data processing
Eighteen variables were initially evaluated, including:
- Demographics: age, sex, race, marital status, median household income.
- Tumor characteristics: primary site (right colon, left colon, rectum), differentiation grade, AJCC T/N stage, carcinoembryonic antigen (CEA) level, tumor deposits, number of regional lymph nodes examined, number of positive lymph nodes, tumor size, number of primary tumors, and perineural invasion.
- Treatment factors: radiotherapy records and liver metastasis surgery.
TNM staging followed the 7th edition guidelines from the Union for International Cancer Control (UICC). Overall survival (OS) was defined as the time from diagnosis to death. Blinding of outcome assessment was not applicable, as OS data were obtained objectively from the SEER database, minimizing the risk of assessment bias.
Continuous variables (age, tumor size, nodes examined/positive, tumor deposits, household income) were categorized using X-tile software (v3.6.1, Yale University) (13) to determine optimal cutoffs via Kaplan-Meier analysis (Figure 2). Race was included as a demographic variable but interpreted cautiously, recognizing potential socioeconomic confounders (e.g., healthcare access) rather than biological causality. Blinding of predictor assessment was not applicable, as all predictors were extracted from the SEER database, which contains structured and standardized data recorded prior to outcome occurrence.
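X-tile is a stand-alone program, so its internals are not reproduced here. The general idea it is commonly described as using, scanning candidate cutoffs and keeping the one with the largest log-rank statistic, can be sketched as follows. The function names and the toy data are hypothetical, and the sketch applies none of the multiple-testing corrections that a cutoff scan requires in practice.

```python
def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-squared statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                         # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, in_group) if ti >= t and g)  # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)   # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, in_group)
                 if ti == t and e and g)                              # deaths in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(values, times, events):
    """Return (cutoff, chi2) for the split 'value <= cutoff' with the largest chi2."""
    best = None
    for cut in sorted(set(values))[:-1]:  # the largest value would leave one group empty
        chi2 = logrank_chi2(times, events, [v <= cut for v in values])
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best


# Toy data (hypothetical): larger values of the predictor go with shorter survival.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [28, 26, 24, 22, 20, 6, 5, 4, 3, 2]
events = [1] * 10

chi2_at_5 = logrank_chi2(times, events, [v <= 5 for v in values])
cutoff, chi2_best = best_cutpoint(values, times, events)
```

Because the cutoff is chosen to maximize the statistic, the associated P value is optimistic, which is one reason data-driven cutoffs are best confirmed in a separate validation cohort.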
Statistical analysis
Univariate Cox regression identified variables with P<0.05 for inclusion in multivariate Cox analysis. Independent prognostic factors (P<0.05) were used to construct a nomogram using the “survival”, “boot”, and “rms” packages in R (v4.3.2) (14). Model performance was assessed via:
- Discrimination: time-dependent receiver operating characteristic (ROC) curves with area under the curve (AUC) analysis.
- Calibration: Bootstrap-corrected calibration plots (using 1,000 resamples).
- C-index: the C-index ranges from 0.5 to 1.0, with higher values indicating better discriminative ability.
- Clinical utility: decision curve analysis (DCA) was performed to quantify net benefit across risk thresholds (15).
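As an illustration of the discrimination metric listed above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is a plain Python re-statement of the usual definition, not the R code used in the study, and the toy inputs are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable patient pairs in which the patient who died
    earlier also had the higher predicted risk (ties in risk count 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:  # i died while j was still under follow-up
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


# Toy data (hypothetical): survival in months, event indicator, model score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [4, 3, 2, 1])
reversed_order = harrell_c_index(times, events, [1, 2, 3, 4])
uninformative = harrell_c_index(times, events, [1, 1, 1, 1])
```

A value of 0.5 corresponds to a score that carries no ranking information, which is why the C-index is usually read on a 0.5 to 1.0 scale.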
During model development, we internally validated the Cox regression model using 1,000 bootstrap resampling iterations to assess potential overfitting and to correct for optimism. The validation cohort shared identical eligibility criteria and predictor definitions with the training set. Model performance remained consistent between cohorts; therefore, we deemed recalibration unnecessary, as it would not meaningfully improve clinical utility.
Risk stratification thresholds (low: <220; intermediate: 220–301; high: >301) were determined using X-tile. A web-based calculator (https://lxt134520.shinyapps.io/output/) was developed for clinical application. Data analysis was performed using R 4.3.2 and SPSS 26.0, with statistical significance defined as P<0.05.
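The stratification rule can be written down directly. In this minimal sketch the function name `risk_group` is hypothetical, and the boundary handling (220 and 301 both counted as intermediate) is read from the thresholds as stated above.

```python
def risk_group(total_points):
    """Map a total nomogram score to the risk group defined by the X-tile cutoffs."""
    if total_points < 220:
        return "low"
    if total_points <= 301:
        return "intermediate"
    return "high"


groups = [risk_group(score) for score in (150, 220, 301, 302)]
```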
Results
Patient characteristics
A total of 3,252 CRCLM patients treated with primary tumor resection and chemotherapy were included. The cohort comprised 56.8% males and 43.2% females, with a median age of 60 years (range: 17–90 years). Most patients were White (75.7%) and married (58.7%), and left-sided primary tumors were the most common (44.9%). Synchronous liver metastases were present in 79.0% of cases. Key clinicopathological features, including tumor grade, CEA levels, and nodal status, are summarized in Table 1. Patients were randomly divided into training (n=2,276) and validation (n=976) cohorts, with balanced baseline characteristics (P>0.05 for all variables).
Table 1
| Variables | Total cohort (n=3,252), n (%) | Training cohort (n=2,276), n (%) | Validation cohort (n=976), n (%) | Chi-squared value | P value |
|---|---|---|---|---|---|
| Race | | | | 1.370 | 0.50 |
| White | 2,461 (75.677) | 1,730 (76.011) | 731 (74.898) | | |
| Black | 468 (14.391) | 317 (13.928) | 151 (15.471) | | |
| Other | 323 (9.932) | 229 (10.062) | 94 (9.631) | | |
| Sex | | | | 0.320 | 0.57 |
| Female | 1,405 (43.204) | 976 (42.882) | 429 (43.955) | | |
| Male | 1,847 (56.796) | 1,300 (57.118) | 547 (56.045) | | |
| Age (years) | | | | 2.416 | 0.30 |
| 17–54 | 1,058 (32.534) | 748 (32.865) | 310 (31.762) | | |
| 55–66 | 1,152 (35.424) | 787 (34.578) | 365 (37.398) | | |
| ≥67 | 1,042 (32.042) | 741 (32.557) | 301 (30.840) | | |
| Marital status | | | | 0.328 | 0.57 |
| Married | 1,908 (58.672) | 1,328 (58.348) | 580 (59.426) | | |
| Other | 1,344 (41.328) | 948 (41.652) | 396 (40.574) | | |
| Median household income | | | | 1.670 | 0.43 |
| ≤$64,999 | 1,107 (34.041) | 771 (33.875) | 336 (34.426) | | |
| $65,000–$79,999 | 1,115 (34.287) | 769 (33.787) | 346 (35.451) | | |
| ≥$80,000 | 1,030 (31.673) | 736 (32.337) | 294 (30.123) | | |
| Site | | | | 0.006 | >0.99 |
| Right colon | 1,295 (39.822) | 906 (39.807) | 389 (39.857) | | |
| Left colon | 1,459 (44.865) | 1,022 (44.903) | 437 (44.775) | | |
| Rectum | 498 (15.314) | 348 (15.290) | 150 (15.369) | | |
| Grade | | | | 7.073 | 0.07 |
| I | 120 (3.690) | 92 (4.042) | 28 (2.869) | | |
| II | 2,385 (73.339) | 1,644 (72.232) | 741 (75.922) | | |
| III | 619 (19.034) | 453 (19.903) | 166 (17.008) | | |
| IV | 128 (3.936) | 87 (3.822) | 41 (4.201) | | |
| AJCC T stage | | | | 3.510 | 0.48 |
| T1 | 45 (1.384) | 30 (1.318) | 15 (1.537) | | |
| T2 | 118 (3.629) | 80 (3.515) | 38 (3.893) | | |
| T3 | 2,083 (64.053) | 1,450 (63.708) | 633 (64.857) | | |
| T4 | 974 (29.951) | 697 (30.624) | 277 (28.381) | | |
| Other | 32 (0.984) | 19 (0.835) | 13 (1.332) | | |
| AJCC N stage | | | | 2.382 | 0.50† |
| N0 | 622 (19.127) | 426 (18.717) | 196 (20.082) | | |
| N1 | 1,305 (40.129) | 932 (40.949) | 373 (38.217) | | |
| N2 | 1,317 (40.498) | 912 (40.070) | 405 (41.496) | | |
| Other | 8 (0.246) | 6 (0.264) | 2 (0.205) | | |
| CEA level | | | | 1.769 | 0.41 |
| Positive | 1,909 (58.702) | 1,319 (57.953) | 590 (60.451) | | |
| Negative | 512 (15.744) | 364 (15.993) | 148 (15.164) | | |
| Other | 831 (25.554) | 593 (26.054) | 238 (24.385) | | |
| Perineural invasion | | | | 0.184 | 0.91 |
| Yes | 836 (25.707) | 581 (25.527) | 255 (26.127) | | |
| No | 2,180 (67.036) | 1,531 (67.267) | 649 (66.496) | | |
| Other | 236 (7.257) | 164 (7.206) | 72 (7.377) | | |
| Tumor deposits | | | | 2.163 | 0.14 |
| 0 | 2,601 (79.982) | 1,805 (79.306) | 796 (81.557) | | |
| ≥1 | 651 (20.018) | 471 (20.694) | 180 (18.443) | | |
| Tumor size (mm) | | | | 2.262 | 0.32 |
| >49 | 1,651 (50.769) | 1,175 (51.626) | 476 (48.770) | | |
| ≤49 | 1,457 (44.803) | 1,001 (43.981) | 456 (46.721) | | |
| Other | 144 (4.428) | 100 (4.394) | 44 (4.508) | | |
| Regional nodes examined | | | | 3.346 | 0.07 |
| >16 | 1,649 (50.707) | 1,178 (51.757) | 471 (48.258) | | |
| ≤16 | 1,603 (49.293) | 1,098 (48.243) | 505 (51.742) | | |
| Regional nodes positive | | | | 0.225 | 0.63 |
| >2 | 1,702 (52.337) | 1,185 (52.065) | 517 (52.971) | | |
| ≤2 | 1,550 (47.663) | 1,091 (47.935) | 459 (47.029) | | |
| Liver metastases surgical | | | | 0.083 | 0.77 |
| No | 2,337 (71.863) | 1,639 (72.012) | 698 (71.516) | | |
| Yes | 915 (28.137) | 637 (27.988) | 278 (28.484) | | |
| Radiation recode | | | | 1.420 | 0.49† |
| No | 2,882 (88.622) | 2,027 (89.060) | 855 (87.602) | | |
| Yes | 361 (11.101) | 243 (10.677) | 118 (12.090) | | |
| Other | 9 (0.277) | 6 (0.264) | 3 (0.307) | | |
| No. of primary tumors | | | | 0.811 | 0.37 |
| 1 | 2,662 (81.857) | 1,854 (81.459) | 808 (82.787) | | |
| ≥2 | 590 (18.143) | 422 (18.541) | 168 (17.213) | | |
†, one cell had an expected count of less than 5, so the likelihood ratio Chi-squared value and the corresponding P value are reported. AJCC, American Joint Committee on Cancer; CEA, carcinoembryonic antigen; N, node; T, tumor.
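The balance tests in Table 1 can be checked by hand. The sketch below recomputes the Pearson Chi-squared statistic for the Sex row from the counts in the table, using only the Python standard library. The helper names are hypothetical, and the closed-form P value applies only to tests with one degree of freedom (2×2 tables).

```python
import math


def pearson_chi2(table):
    """Pearson Chi-squared statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a Chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Sex row of Table 1: [training, validation] counts for females and males.
sex_counts = [[976, 429], [1300, 547]]
chi2_sex = pearson_chi2(sex_counts)
p_sex = p_value_df1(chi2_sex)
print(round(chi2_sex, 3), round(p_sex, 2))
```

The result agrees with the 0.320 and 0.57 reported for Sex, which suggests the table uses the uncorrected Pearson statistic for 2×2 comparisons.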
Independent prognostic factors
Univariate Cox regression identified 13 variables associated with OS (P<0.05). Multivariate analysis retained 10 independent predictors (Table 2):
- Demographic: older age [hazard ratio (HR) =1.35 for ≥67 vs. 17–54 years; P<0.001], Black race (HR =1.21 for Black vs. White; P=0.005), and unmarried status (HR =1.24; P<0.001).
- Tumor-related: right colon primary (HR =1.34 vs. left colon; P<0.001), higher histologic grade (HR =1.82 for grade IV vs. I; P<0.001), CEA positivity (HR =1.42; P<0.001), tumor deposits (HR =1.33; P<0.001), fewer nodes examined (HR =1.34 for ≤16 nodes examined; P<0.001), and greater nodal burden (HR =1.27 for >2 regional nodes positive; P=0.002).
- Treatment-related: liver metastasis resection (HR =0.63; P<0.001).
Table 2
| Variables | Univariate HR (95% CI) | Univariate P value | Multivariate HR (95% CI) | Multivariate P value |
|---|---|---|---|---|
| Race | | | | |
| Black | Reference | | Reference | |
| White | 0.790 (0.695–0.898) | <0.001 | 0.829 (0.727–0.945) | 0.005 |
| Other | 0.747 (0.619–0.901) | 0.002 | 0.882 (0.729–1.068) | 0.20 |
| Sex | | | | |
| Female | Reference | | – | – |
| Male | 1.045 (0.953–1.145) | 0.35 | – | – |
| Age (years) | | | | |
| ≥67 | Reference | | Reference | |
| 17–54 | 0.669 (0.598–0.748) | <0.001 | 0.741 (0.660–0.833) | <0.001 |
| 55–66 | 0.765 (0.686–0.853) | <0.001 | 0.761 (0.681–0.850) | <0.001 |
| Marital status | | | | |
| Married | Reference | | Reference | |
| Other | 1.232 (1.125–1.350) | <0.001 | 1.243 (1.133–1.363) | <0.001 |
| Median household income | | | | |
| $65,000–$79,999 | Reference | | – | – |
| ≤$64,999 | 1.113 (0.998–1.240) | 0.054 | – | – |
| ≥$80,000 | 0.903 (0.807–1.011) | 0.08 | – | – |
| Site | | | | |
| Left colon | Reference | | Reference | |
| Right colon | 1.464 (1.328–1.614) | <0.001 | 1.336 (1.206–1.480) | <0.001 |
| Rectum | 0.878 (0.765–1.007) | 0.06 | 0.993 (0.841–1.174) | 0.94 |
| Grade | | | | |
| I | Reference | | Reference | |
| II | 1.090 (0.864–1.376) | 0.47 | 1.068 (0.846–1.349) | 0.58 |
| III | 1.622 (1.267–2.077) | <0.001 | 1.437 (1.120–1.844) | 0.004 |
| IV | 2.069 (1.508–2.838) | <0.001 | 1.824 (1.325–2.510) | <0.001 |
| AJCC T stage | | | | |
| Other | Reference | | – | – |
| T1 | 1.056 (0.554–2.013) | 0.87 | – | – |
| T2 | 1.070 (0.609–1.882) | 0.81 | – | – |
| T3 | 1.137 (0.683–1.893) | 0.62 | – | – |
| T4 | 1.673 (1.003–2.793) | 0.050 | – | – |
| AJCC N stage | | | | |
| N0 | Reference | | Reference | |
| N1 | 1.207 (1.059–1.375) | 0.005 | 1.052 (0.916–1.208) | 0.48 |
| N2 | 1.681 (1.477–1.913) | <0.001 | 1.178 (0.971–1.430) | 0.10 |
| Other | 1.982 (0.883–4.446) | 0.10 | 2.794 (1.233–6.333) | 0.71 |
| CEA level | | | | |
| Negative | Reference | | Reference | |
| Positive | 1.414 (1.237–1.616) | <0.001 | 1.424 (1.245–1.629) | <0.001 |
| Other | 1.393 (1.200–1.617) | <0.001 | 1.362 (1.173–1.583) | <0.001 |
| Perineural invasion | | | | |
| No | Reference | | Reference | |
| Yes | 1.230 (1.109–1.364) | <0.001 | 1.084 (0.975–1.206) | 0.14 |
| Other | 0.865 (0.722–1.035) | 0.11 | 0.865 (0.721–1.037) | 0.12 |
| Tumor deposits | | | | |
| ≥1 | Reference | | Reference | |
| 0 | 0.736 (0.659–0.820) | <0.001 | 0.750 (0.670–0.840) | <0.001 |
| Tumor size (mm) | | | | |
| >49 | Reference | | – | – |
| ≤49 | 0.957 (0.873–1.050) | 0.35 | – | – |
| Other | 0.819 (0.648–1.034) | 0.09 | – | – |
| Regional nodes examined | | | | |
| >16 | Reference | | Reference | |
| ≤16 | 1.218 (1.113–1.334) | <0.001 | 1.344 (1.225–1.475) | <0.001 |
| Regional nodes positive | | | | |
| >2 | Reference | | Reference | |
| ≤2 | 0.662 (0.604–0.725) | <0.001 | 0.786 (0.677–0.913) | 0.002 |
| Liver metastases surgical | | | | |
| No | Reference | | Reference | |
| Yes | 0.587 (0.528–0.652) | <0.001 | 0.627 (0.563–0.698) | <0.001 |
| Radiation recode | | | | |
| No | Reference | | Reference | |
| Yes | 0.687 (0.589–0.801) | <0.001 | 0.877 (0.725–1.061) | 0.18 |
| Other | 0.915 (0.380–2.201) | 0.84 | 0.951 (0.390–2.316) | 0.91 |
| No. of primary tumors | | | | |
| ≥2 | Reference | | – | – |
| 1 | 0.952 (0.849–1.067) | 0.40 | – | – |
AJCC, American Joint Committee on Cancer; CEA, carcinoembryonic antigen; CI, confidence interval; HR, hazard ratio; N, node; T, tumor.
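Several hazard ratios quoted in the text (for example, 1.35 for age ≥67 vs. 17–54 years) use the opposite reference category from Table 2 (0.741 for 17–54 vs. ≥67 years). Switching the reference category of a two-level comparison amounts to taking reciprocals and swapping the confidence limits. The helper name `invert_hr` in this sketch is hypothetical.

```python
def invert_hr(hr, lower, upper):
    """Re-express a hazard ratio and its 95% CI against the opposite reference level.

    The reciprocal of the upper limit becomes the new lower limit, and vice versa.
    """
    return 1 / hr, 1 / upper, 1 / lower


# Multivariate estimates from Table 2, as printed.
age_hr, age_lo, age_hi = invert_hr(0.741, 0.660, 0.833)    # >=67 vs. 17-54 years
race_hr, race_lo, race_hi = invert_hr(0.829, 0.727, 0.945)  # Black vs. White
node_hr, node_lo, node_hi = invert_hr(0.786, 0.677, 0.913)  # >2 vs. <=2 positive nodes

print(f"{age_hr:.2f} ({age_lo:.2f}-{age_hi:.2f})")
```

The inverted values round to the 1.35, 1.21, and 1.27 cited in the text.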
Kaplan-Meier survival curves (Figure S1A-S1J) revealed significant prognostic disparities. Older patients (≥67 years) had markedly poorer survival (Figure S1A). White patients exhibited lower mortality risk than Black patients (HR =0.829, 95% CI: 0.727–0.945; Figure S1B). Unmarried patients had a higher mortality risk than their married counterparts (HR =1.243, 95% CI: 1.133–1.363; Figure S1C). Rectal primaries showed the lowest mortality risk, while right colon primaries had the highest (Figure S1D). Grade IV tumors conferred the highest risk (HR =1.824, 95% CI: 1.325–2.510; Figure S1E). CEA positivity and ≥1 tumor deposit significantly increased mortality risk (Figure S1F,S1G). Patients with >16 lymph nodes examined and ≤2 positive nodes had a superior prognosis (Figure S1H,S1I). Liver metastasis resection was protective (HR =0.627, 95% CI: 0.563–0.698; Figure S1J).
Nomogram development and performance
The nomogram integrated these 10 variables to predict 1-, 3-, and 5-year OS (Figure 3). Discrimination was consistent across time points, with AUCs of 0.729 (95% CI: 0.699–0.758), 0.710 (95% CI: 0.689–0.731), and 0.714 (95% CI: 0.691–0.738) for 1-, 3-, and 5-year OS in the training cohort, respectively (Figure 4A-4C). The corresponding validation cohort AUCs were 0.717 (95% CI: 0.675–0.759), 0.736 (95% CI: 0.704–0.768), and 0.737 (95% CI: 0.702–0.772) (Figure 4D-4F). Calibration curves demonstrated close alignment between predicted and observed survival probabilities (Figure 5). Internal validation via bootstrap resampling showed minimal optimism in the C-index (training cohort: 0.657 vs. bootstrap-corrected: 0.657), indicating model stability. Our proposed model was compared with established prognostic models: the Fong Clinical Risk Score, the Basingstoke index, and the TNM staging system. The C-indices were as follows: our model: 0.657 (training cohort), 0.660 (validation cohort), 0.657 (bootstrap mean); Fong clinical risk score: 0.610 (training cohort), 0.658 (validation cohort), 0.609 (bootstrap mean); Basingstoke index: 0.558 (training cohort), 0.583 (validation cohort), 0.558 (bootstrap mean); TNM staging system: 0.629 (training cohort), 0.617 (validation cohort), 0.629 (bootstrap mean). These results suggest that our model discriminated better than the Fong clinical risk score, the Basingstoke index, and the TNM staging system (P<0.001, Table 3).
Table 3
| Model | Training cohort C-index | Validation cohort C-index | Bootstrap C-index (95% CI) | P value vs. Cox model |
|---|---|---|---|---|
| Cox proportional hazards model | 0.657 | 0.659 | 0.657 (0.643–0.670) | Reference |
| Fong clinical risk score | 0.610 | 0.658 | 0.609 (0.591–0.628) | <0.001 |
| Basingstoke index | 0.558 | 0.583 | 0.558 (0.540–0.576) | <0.001 |
| TNM staging system | 0.629 | 0.617 | 0.629 (0.604–0.654) | <0.001 |
CI, confidence interval; TNM, tumor-node-metastasis.
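The point scale of a Cox-based nomogram follows from the regression coefficients: each level of a predictor is scored by its log hazard ratio relative to that predictor's lowest-risk level, and the scores are rescaled so that the largest single contribution equals 100 points. The sketch below applies this convention to the multivariate HRs printed in Table 2. It is an approximation for illustration only: the point values in Figure 3 are not reported in the text, so the numbers produced here may differ from the published nomogram, and the dictionary keys and function names are hypothetical.

```python
import math

# Multivariate HRs from Table 2 (reference level = 1.0), for the 10 retained predictors.
HAZARD_RATIOS = {
    "age": {">=67": 1.0, "17-54": 0.741, "55-66": 0.761},
    "race": {"Black": 1.0, "White": 0.829, "Other": 0.882},
    "marital_status": {"Married": 1.0, "Other": 1.243},
    "site": {"Left colon": 1.0, "Right colon": 1.336, "Rectum": 0.993},
    "grade": {"I": 1.0, "II": 1.068, "III": 1.437, "IV": 1.824},
    "cea": {"Negative": 1.0, "Positive": 1.424, "Other": 1.362},
    "tumor_deposits": {">=1": 1.0, "0": 0.750},
    "nodes_examined": {">16": 1.0, "<=16": 1.344},
    "nodes_positive": {">2": 1.0, "<=2": 0.786},
    "liver_surgery": {"No": 1.0, "Yes": 0.627},
}


def build_point_table(hazard_ratios, max_points=100):
    """Convert HRs to nomogram-style points (largest single effect = max_points)."""
    spans = {}
    for variable, levels in hazard_ratios.items():
        betas = {level: math.log(hr) for level, hr in levels.items()}
        lowest = min(betas.values())  # lowest-risk level scores 0 points
        spans[variable] = {level: beta - lowest for level, beta in betas.items()}
    largest = max(max(levels.values()) for levels in spans.values())
    return {
        variable: {level: max_points * span / largest for level, span in levels.items()}
        for variable, levels in spans.items()
    }


def total_points(point_table, patient):
    """Sum the points for one patient, given a mapping of variable to level."""
    return sum(point_table[variable][level] for variable, level in patient.items())


points = build_point_table(HAZARD_RATIOS)

# Hypothetical worst-case profile: the highest-risk level of every predictor.
worst_case = {variable: max(levels, key=levels.get) for variable, levels in points.items()}
worst_total = total_points(points, worst_case)
```

On this illustrative scale, histologic grade spans the full 0 to 100 points and the worst-case profile sums to roughly 540 points, which is at least compatible with cutoffs of 220 and 301 falling inside the attainable range.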
Clinical utility and risk stratification
DCA confirmed net benefit across thresholds of 5–99% (Figure 6). Patients were stratified into low- [<220], intermediate- [220–301], and high-risk [>301] groups, with distinct survival curves (5-year OS: 48% vs. 28% vs. 12%; P<0.001; Figure 7). The web-based calculator (https://lxt134520.shinyapps.io/output/) enabled individualized risk prediction (Figure 8).
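Decision curve analysis compares strategies by net benefit, defined at a threshold probability p_t as the true-positive rate minus the false-positive rate weighted by p_t/(1 − p_t). The sketch below implements the binary-outcome form of that definition. It ignores censoring, which a survival-data DCA must handle, and the function names and toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and not y)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Net benefit of the 'All' strategy; the 'None' strategy is always 0."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): predicted 5-year mortality risk and observed death (1/0).
risk = [0.9, 0.8, 0.3, 0.2]
died = [1, 1, 0, 0]
nb_model = net_benefit(risk, died, threshold=0.5)
nb_all = net_benefit_treat_all(died, threshold=0.5)
```

A model is considered useful at a given threshold when its net benefit exceeds both the "All" and "None" strategies, which is the comparison plotted in a decision curve.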
Discussion
This study developed and validated a prognostic nomogram that incorporated ten prognostic factors identified by multivariate analysis to predict OS in CRCLM patients undergoing primary tumor resection with chemotherapy: demographic characteristics (age, race, marital status), tumor-related features (primary site, histological grade, CEA status, tumor deposits, regional lymph node metrics), and a treatment-related factor (liver metastasis resection). ROC analysis yielded AUC values >0.7 for 1-, 3-, and 5-year OS in both training and validation cohorts, indicating acceptable discrimination. Calibration curves closely aligned with ideal predictions, and DCA showed superior net benefit compared to “None” or “All” strategies, supporting the model’s clinical utility. Notably, the model outperformed traditional TNM staging (C-index: 0.657 vs. 0.629, P<0.001) through its incorporation of multifactorial predictors (16). Patients were stratified into low-risk [<220], intermediate-risk [220–301], and high-risk [>301] groups based on total nomogram scores. Kaplan-Meier curves demonstrated significantly higher survival rates in the low-risk group. An online calculator (https://lxt134520.shinyapps.io/output/) based on the nomogram offers clinicians a practical tool for individualized risk prediction and treatment planning, supporting its integration into routine clinical decision-making. For example, high-risk patients (score >301) with a 5-year OS of 12% could be prioritized for experimental therapies or palliative care integration. However, the model’s clinical utility must be contextualized within critical limitations inherent to the SEER database and methodological constraints.
Older age was associated with increased mortality, likely due to age-related comorbidities (e.g., COPD, diabetes, cardiovascular disease) and immunosenescence (17). Patients aged 17–54 years had an HR (95% CI) of 0.741 (0.660–0.833) compared to those aged ≥67 years, while those aged 55–66 years had an HR of 0.761 (0.681–0.850), supporting a negative correlation between age and OS. Marital status also significantly influenced survival. Married individuals constituted 58.672% of the cohort, with an OS HR of 1.243 (1.133–1.363) for unmarried patients, possibly reflecting the psychological and treatment adherence benefits of spousal support (18,19). Our nomogram analysis further confirmed lower mortality risk scores among married patients. Race was another key factor; Black patients exhibited a 21% higher mortality risk than White patients (HR =1.21), consistent with prior SEER-based studies (20). However, this association likely reflects systemic inequities—including disparities in healthcare access, treatment delays, and comorbidities—rather than biological determinants. The SEER database lacks granular socioeconomic data (e.g., insurance status, neighborhood deprivation), limiting our ability to disentangle these confounders (21,22). Future models should replace race with direct measures of socioeconomic disadvantage to avoid perpetuating biased clinical interpretations (23). Regarding tumor location, right-sided colon tumors had the highest mortality risk, consistent with findings by Brouwer et al. demonstrating poorer survival among patients with right-sided primary tumors, regardless of metastatic site (24). Histologic grade also influenced OS. Moderately differentiated tumors accounted for 73.339% of cases, and increasing tumor grade was associated with elevated mortality risk, consistent with previous studies (25). 
Tumor deposits, often indicative of aggressive disease, were significantly associated with poor outcomes and may warrant inclusion in future staging systems (26). Lymph node metrics also played a critical role. Both the number of positive lymph nodes and the number examined were independent prognostic factors (27). Higher positive node counts increased recurrence risk, while a greater number examined was protective—likely reflecting better surgical quality (28). CEA positivity, found in 58.702% of patients, was another independent risk factor for OS (HR =1.424; 95% CI: 1.245–1.629), aligning with prior research on its prognostic utility (29). Furthermore, liver metastasis resection significantly improved survival, with an HR of 0.627 (0.563–0.698), supporting surgical intervention as a key strategy (30).
The AJCC TNM staging system, while foundational, inadequately addresses the heterogeneity of CRCLM outcomes. Our nomogram complemented TNM staging by integrating tumor biology (e.g., grade, CEA levels), treatment factors (e.g., lymph node dissection extent, liver surgery), and socioeconomic proxies (e.g., marital status, race). Notably, right-sided tumors conferred a 34% higher mortality risk than left-sided lesions (HR =1.34) (24), aligning with prior reports of their aggressive biology and resistance to anti-EGFR therapies (31). Similarly, liver metastasis resection reduced mortality risk by 37% (HR =0.63), underscoring its curative potential even in metastatic settings (32). The Fong Clinical Risk Score was derived from seven independent prognostic factors (positive margin, extrahepatic disease, >1 tumor, CEA >200 ng/mL, size >5 cm, node-positive primary, and disease-free interval <12 months), the last five of which are scored to classify patients into six discrete groups (0–5 points). However, substantial prognostic heterogeneity persisted within the high-risk subgroup (≥3 points), as evidenced by the divergent 5-year survival rates (20% vs. 14% for 3 vs. 5 points, respectively) (33). In contrast, our model addressed this limitation by implementing continuous risk scoring, enabling finer stratification that specifically facilitates identification of ‘intermediate-risk’ patients who may benefit from neoadjuvant therapy. The Basingstoke Predictive Index, which included seven validated risk factors as independent predictors of poor survival (number of hepatic metastases >3, node-positive primary, poorly differentiated primary, extrahepatic disease, tumor diameter ≥5 cm, CEA level >60 ng/mL, and positive resection margin), required postoperative pathological data (e.g., resection margin status) to complete comprehensive scoring. This fundamental design limitation significantly restricted its utility in preoperative decision-making.
Furthermore, the index failed to incorporate modern treatment factors (e.g., response to neoadjuvant therapy) and maintained a relatively high CEA threshold (>60 ng/mL), potentially leading to underdetection of high-risk patients who might benefit from intensive therapies (34). In contrast, our model was based on contemporary multicenter SEER data and used a Cox proportional hazards model to characterize risk profiles more flexibly. It achieved individualized prediction using only preoperatively available indicators, significantly enhancing its clinical applicability for early-stage therapeutic planning. These findings highlighted the need for dynamic prognostic tools that reflect modern multidisciplinary management.
While this study developed a clinically applicable prognostic nomogram for CRCLM patients, several limitations must be acknowledged. The model was designed specifically for CRCLM patients undergoing primary tumor resection combined with chemotherapy; extending it to transplant or palliative cases would require additional research and represents an important direction for future studies. First, the SEER database lacked critical treatment and molecular details that significantly influence contemporary oncology outcomes: chemotherapy regimens [e.g., FOLFOX (fluorouracil, leucovorin, and oxaliplatin) or FOLFIRI (fluorouracil, leucovorin, and irinotecan)], targeted therapies (e.g., bevacizumab or cetuximab), treatment response (as assessed by Response Evaluation Criteria In Solid Tumors 1.1 guidelines), and molecular markers (KRAS/BRAF/MSI). This gap is particularly relevant given the known poor prognosis of BRAF-mutated CRCLM, which our model could not specifically address. Second, while race emerged as a prognostic variable, it likely served as a proxy for systemic healthcare disparities (access barriers, treatment delays, socioeconomic inequities) rather than biological factors, highlighting the need to incorporate direct socioeconomic measures (insurance status, income levels) in future models. Third, the exclusive use of U.S. SEER data from 2010–2015 limited generalizability to global populations and modern therapeutic eras (immunotherapy/novel targeted agents), necessitating external validation in multinational cohorts. Fourth, the static risk stratification (low/intermediate/high) might not adequately adapt to evolving treatment standards, suggesting the future integration of dynamic biomarkers like circulating tumor DNA (ctDNA).
Finally, despite statistical adjustments, residual confounding (surgeon expertise, postoperative complications) inherent in the retrospective design underscored the need for prospective studies with standardized protocols. Collectively, addressing these limitations—through incorporation of treatment-specific data, equity-centered variables, international validation, dynamic biomarkers, and prospective designs—would enhance the model’s precision and clinical utility.
Conclusions
The nomogram developed in this study demonstrates robust predictive accuracy for OS in CRCLM patients undergoing primary tumor resection and chemotherapy, offering clinicians a practical tool for risk stratification and individualized decision-making. By integrating multifactorial predictors—including tumor biology, treatment factors, and socioeconomic proxies—this model advances beyond traditional TNM staging to address the heterogeneity of metastatic CRC outcomes. However, its clinical utility is constrained by inherent limitations of the SEER database, notably the absence of specific chemotherapy regimens (e.g., FOLFOX vs. FOLFIRI), molecular markers (e.g., KRAS/BRAF), and granular socioeconomic data. Racial disparities observed in the model likely reflect systemic healthcare inequities rather than biological differences, necessitating cautious interpretation.
To translate this tool into a precision oncology resource, future efforts must prioritize: (I) external validation in global cohorts with detailed treatment and molecular profiling data; (II) integration of dynamic biomarkers (e.g., ctDNA) to enable real-time risk recalibration; and (III) replacement of demographic proxies like race with direct measures of socioeconomic disadvantage. With these refinements, the nomogram may evolve into an adaptive platform for guiding therapeutic strategies in the era of targeted therapies and immunotherapy, ultimately improving equity and outcomes in CRCLM management.
Acknowledgments
We would like to acknowledge the Surveillance, Epidemiology, and End Results (SEER) database for its support. We are grateful to all the contributors for making the SEER data available for research.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-415/rc
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-415/prf
Funding: This research was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-415/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
- Pei W, Li J, Lei S, et al. Burden of major cancers in China attributable to modifiable risk factors: Predictions from 2012 to 2035. Int J Cancer 2025;156:1369-79. [Crossref] [PubMed]
- Reboux N, Jooste V, Goungounga J, et al. Incidence and Survival in Synchronous and Metachronous Liver Metastases From Colorectal Cancer. JAMA Netw Open 2022;5:e2236666. [Crossref] [PubMed]
- Sone T, Murai T, Itaya K, et al. Influence of liver metastasis locations on overall survival in patients with colorectal cancer. Ann Oncol 2019;30:ix37-8.
- Patel RK, Rahman S, Schwantes IR, et al. Updated Management of Colorectal Cancer Liver Metastases: Scientific Advances Driving Modern Therapeutic Innovations. Cell Mol Gastroenterol Hepatol 2023;16:881-94. [Crossref] [PubMed]
- Kambakamba P, Hoti E, Cremen S, et al. The evolution of surgery for colorectal liver metastases: A persistent challenge to improve survival. Surgery 2021;170:1732-40. [Crossref] [PubMed]
- Zhang Y, Li J, Guan X, et al. Impact of lymph node metastasis on prognosis in colorectal cancer patients with liver metastasis and staging systems Refinement: An international multicenter retrospective cohort study. Eur J Surg Oncol 2025;51:110124. [Crossref] [PubMed]
- Lun Y, Yuan H, Ma P, et al. A prediction model based on random survival forest analysis of the overall survival of elderly female papillary thyroid carcinoma patients: a SEER-based study. Endocrine 2024;85:1252-60. [Crossref] [PubMed]
- Li YX, Mu BX, Zhou HJ, et al. Development and validation of nomograms for predicting overall survival and cancer-specific survival in unresected colorectal cancer patients undergoing chemotherapy. Sci Rep 2025;15:12477. [Crossref] [PubMed]
- Kong L, Yan C, Nie S, et al. Comparison of proximal and distal gastric neuroendocrine carcinoma based on SEER database. Sci Rep 2024;14:25956. [Crossref] [PubMed]
- Zhang Q. Baseline characteristics of the training cohort. figshare; 2024. Available online: https://doi.org/10.6084/m9.figshare.26510554.v3
- Zhang Q. Baseline characteristics of the validation cohort. figshare; 2024. Available online: https://doi.org/10.6084/m9.figshare.26520409.v2
- Wang SL, Chen CB, Huang YS, et al. Muscle-specific strength is an alternative to muscle mass and grip strength for predicting outcomes in patients with gastric cancer. Eur J Surg Oncol 2025;51:110229. [Crossref] [PubMed]
- Sonabend R, Király FJ, Bender A, et al. mlr3proba: an R package for machine learning in survival analysis. Bioinformatics 2021;37:2789-91. [Crossref] [PubMed]
- Van Calster B, Wynants L, Verbeek JFM, et al. Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators. Eur Urol 2018;74:796-804. [Crossref] [PubMed]
- Pemaj X, Sina M, Prifti S, et al. P-234 Association between tumor differentiation grade and TNM staging in colorectal cancer. Ann Oncol 2022;33:S332.
- Xu J, Yin F, Ren L, et al. The risk factors of lymph node metastasis in early colorectal cancer: a predictive nomogram and risk assessment. Int J Colorectal Dis 2024;39:191. [Crossref] [PubMed]
- Cavalli-Björkman N, Qvortrup C, Sebjørnsen S, et al. Lower treatment intensity and poorer survival in metastatic colorectal cancer patients who live alone. Br J Cancer 2012;107:189-94. [Crossref] [PubMed]
- Daniels B, Luckett T, Liauw W, et al. Trajectories of Opioid Use Before and After Cancer Diagnosis: A Population-Based Cohort Study. J Pain Symptom Manage 2024;68:282-291.e11. [Crossref] [PubMed]
- Lee J, Jensen C, Kang J, et al. Survival by race and ethnicity among insured patients with early-onset colorectal cancer. J Clin Oncol 2024;42:18.
- Zhao J, Han X, Nogueira L, et al. Health insurance status and cancer stage at diagnosis and survival in the United States. CA Cancer J Clin 2022;72:542-60. [Crossref] [PubMed]
- Jansen L, Erb C, Nennecke A, et al. Socioeconomic deprivation and cancer survival in a metropolitan area: An analysis of cancer registry data from Hamburg, Germany. Lancet Reg Health Eur 2021;4:100063. [Crossref] [PubMed]
- Hadad MJ, Rullán-Oliver P, Grits D, et al. Racial Disparities in Outcomes After THA and TKA Are Substantially Mediated by Socioeconomic Disadvantage Both in Black and White Patients. Clin Orthop Relat Res 2023;481:254-64. [Crossref] [PubMed]
- Brouwer NPM, van der Kruijssen DEW, Hugen N, et al. The Impact of Primary Tumor Location in Synchronous Metastatic Colorectal Cancer: Differences in Metastatic Sites and Survival. Ann Surg Oncol 2020;27:1580-8. [Crossref] [PubMed]
- Nie P, Zhao X, Ma J, et al. Can the preoperative CT-based deep learning radiomics model predict histologic grade and prognosis of chondrosarcoma? Eur J Radiol 2024;181:111719. [Crossref] [PubMed]
- Abbas A, Chu DI. Tumor Deposits-A Blind Spot in Colon Cancer Staging. JAMA Surg 2025;160:414. [Crossref] [PubMed]
- Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49. [Crossref] [PubMed]
- Mabeta P, Hull R, Dlamini Z. LncRNAs and the Angiogenic Switch in Cancer: Clinical Significance and Therapeutic Opportunities. Genes (Basel) 2022;13:152. [Crossref] [PubMed]
- Rao H, Wu H, Huang Q, et al. Clinical Value of Serum CEA, CA24-2 and CA19-9 in Patients with Colorectal Cancer. Clin Lab 2021;
- Emile SH, Horesh N, Garoufalia Z, et al. Resection of primary colon cancer with and without resection of liver metastases: Propensity-score matched analysis. J Clin Oncol 2023;41:e15580.
- Liu Y, Zhou S, Chen X, et al. 262MO Multi-omics signature for identification of RAS wild-type colorectal cancer liver metastases sensitive to anti-EGFR therapy. Ann Oncol 2022;33:S1535.
- Milazzo M, Todeschini L, Caimano M, et al. Surgical Resection in Colorectal Liver Metastasis: An Umbrella Review. Cancers (Basel) 2024;16:1849. [Crossref] [PubMed]
- Fong Y, Fortner J, Sun RL, et al. Clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer: analysis of 1001 consecutive cases. Ann Surg 1999;230:309-18; discussion 318-21. [Crossref] [PubMed]
- Rees M, Tekkis PP, Welsh FK, et al. Evaluation of long-term survival after hepatic resection for metastatic colorectal cancer: a multifactorial model of 929 patients. Ann Surg 2008;247:125-35. [Crossref] [PubMed]



