Multi-scale deep learning models based on MRI for predicting pathological differentiation and evaluating its association with recurrence-free survival in hepatocellular carcinoma: an explainable machine learning study

Xue-Yong Zuo; Hai-Feng Liu

doi:10.21037/jgo-2025-aw-928

Original Article

Xue-Yong Zuo¹, Hai-Feng Liu²

¹Department of Gastroenterology, The Third Affiliated Hospital of Soochow University, Changzhou, China; ²Department of Radiology, The Third Affiliated Hospital of Soochow University, Changzhou, China

Contributions: (I) Conception and design: Both authors; (II) Administrative support: XY Zuo; (III) Provision of study materials or patients: Both authors; (IV) Collection and assembly of data: Both authors; (V) Data analysis and interpretation: Both authors; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Xue-Yong Zuo, MM. Department of Gastroenterology, The Third Affiliated Hospital of Soochow University, No. 185 Juqian Street, Changzhou 213000, China. Email: zuoxueyong@outlook.com.

Background: Pathological differentiation is a critical prognostic indicator of biological behavior in hepatocellular carcinoma (HCC). The aim of the study was to develop multi-scale deep learning (DL) models based on magnetic resonance imaging (MRI) for predicting pathological differentiation and its association with recurrence-free survival (RFS) in HCC.

Methods: A cohort of 292 patients with HCC was included and randomly assigned to a training set (TS) (n=204) and a validation set (VS) (n=88). DL models, including 2-dimensional (2D), 2.5-dimensional (2.5D), and 3-dimensional (3D), were trained by the ResNet50 network and developed using the eXtreme Gradient Boosting (XGBoost) classifier. The performance of these multi-scale DL models in predicting poorly-differentiated HCC (pdHCC) was evaluated using area under the curve (AUC). The SHapley Additive exPlanations (SHAP) method was applied to interpret the optimal DL models.

Results: The MRI-based 2.5D model achieved the highest AUC values for the prediction of pdHCC, 0.91 [95% confidence interval (CI): 0.87–0.95] in the TS and 0.86 (95% CI: 0.58–1.00) in the VS, outperforming both the 2D (AUC =0.88 and 0.84) and 3D (AUC =0.83 and 0.64) models. Additionally, cases predicted as pdHCC by the MRI2.5D model had a significantly shorter median RFS than cases predicted as non-pdHCC (25 vs. 50 months, P=0.006). The SHAP approach ranked the weighted importance of the DL features, providing an interpretation of how the MRI2.5D model predicts pdHCC.

Conclusions: The MRI2.5D model demonstrated superior capability for predicting pathological differentiation and its association with RFS in HCC, serving as a valuable tool for treatment decision-making in patients with HCC.

Keywords: Deep learning (DL); hepatocellular carcinoma (HCC); magnetic resonance imaging (MRI); pathological differentiation; SHapley Additive exPlanations (SHAP)

Submitted Nov 12, 2025. Accepted for publication Jan 20, 2026. Published online Feb 26, 2026.

Highlight box

Key findings

• The multi-scale deep learning (DL) models based on magnetic resonance imaging (MRI) performed well in predicting differentiation and its association with recurrence-free survival (RFS) in patients with hepatocellular carcinoma (HCC), with the MRI2.5D model performing best. The arterial phase-based DL model outperformed the T2-weighted imaging-based model for predicting HCC differentiation. SHapley Additive exPlanations (SHAP) analysis provided a visual interpretation of how the DL model predicts poorly-differentiated HCC.

What is known, and what is new?

• Previous studies have explored the performance of 2-dimensional DL models based on MRI or other imaging modalities for predicting HCC differentiation. However, this approach inadequately captures the heterogeneity of the entire HCC, and the underlying processes of DL models in the prediction of HCC differentiation remain a black box. Moreover, the association between DL model predictions and RFS in patients with different grades of pathological differentiation has not been documented.

• This study is the first to develop and validate multi-scale DL models that use MRI maps to predict pathological differentiation and RFS in patients with HCC, and SHAP analysis was employed to provide an interpretable analysis of the model’s predictions.

What is the implication, and what should change now?

• The MRI2.5D DL model serves as an effective tool for predicting pathological differentiation and its association with RFS in patients with HCC. Prospective and multi-center studies are necessary to validate the efficacy of our developed multi-scale DL model in predicting differentiation and RFS in patients with HCC.

Introduction

Hepatocellular carcinoma (HCC) represents the most prevalent subtype of primary liver cancer and is among the leading causes of cancer-related mortality globally (1,2). Despite advancements in preoperative detection and curative therapies, the prognosis for HCC remains unfavorable, with a 5-year recurrence rate ranging from 50% to 70% (3-5). Pathological differentiation is a critical prognostic indicator of HCC, as poorly-differentiated HCC (pdHCC) is closely associated with rapid disease progression and early vascular invasion, resulting in higher recurrence rates and poorer survival outcomes compared to well-differentiated HCC (wHCC) and moderately-differentiated HCC (mHCC) (6). For HCC exhibiting aggressive features, extensive resection margin and adjuvant therapies are recommended (7,8). Therefore, the accurate prediction of HCC differentiation is essential for optimal therapy stratification and improving prognosis. Liver biopsy remains the gold standard for diagnosing HCC differentiation. However, its invasive nature, potential risks of sampling error, and seeding metastasis inevitably limit its clinical utility (9). Moreover, contemporary guidelines allow the diagnosis of HCC without the necessity of a biopsy (10), thereby underscoring the demand for non-invasive methods of biological characterization.

The deep learning (DL) approach is increasingly utilized for the preoperative detection and prognosis prediction of HCC by extracting numerous signatures (11-13), offering promising applications for noninvasively evaluating tumor heterogeneity and aggressive characteristics. Several previous studies have explored the performance of 2-dimensional (2D) DL models based on magnetic resonance imaging (MRI) or other imaging modalities for predicting HCC differentiation (14-17). However, these studies only developed 2D DL models based on the maximum cross-section. This approach inadequately captures the heterogeneity of the entire HCC, potentially reducing the generalizability and reliability of DL models. Moreover, the underlying processes of DL models in the prediction of HCC differentiation remain a black box, and this lack of interpretability undermines their credibility and prevents clinicians from validating the DL models. Finally, previous research focused primarily on the efficacy of DL models in predicting HCC differentiation; the association between DL model predictions and recurrence-free survival (RFS) across grades of pathological differentiation has not been documented.

Recent studies demonstrated that 2.5-dimensional (2.5D) and 3-dimensional (3D) DL models enhanced predictive performance (18,19), by not only offering comprehensive heterogeneity characteristics but also providing multidimensional DL features. However, to date, there have been no reports on the use of multi-scale DL models based on MRI for predicting pathological differentiation and its association with RFS in patients with HCC. SHapley Additive exPlanations (SHAP), a game-theoretic approach, effectively weights the ranking importance of enrolled features and interprets the prediction outcomes, thus transforming the DL model from an opaque black box into a transparent decision-support tool (20,21).

The purpose of this study was to establish multi-scale DL models based on MRI to concurrently predict the pathological differentiation and its association with RFS in patients with HCC. Additionally, SHAP analysis was employed to provide an interpretable analysis of the model’s predictions. Ultimately, this research endeavors to establish a noninvasive, transparent multi-scale DL model to inform personalized therapeutic strategies for patients with HCC. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-928/rc).

Methods

Study design and participant selection

This retrospective study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Ethics Committee of The Third Affiliated Hospital of Soochow University (No. 2022-CL027-01), and written informed consent for this retrospective analysis was waived. Initially, the single-center study included 325 patients from The Third Affiliated Hospital of Soochow University, who underwent preoperative liver MRI examination and were subsequently diagnosed with HCC by histopathological examination following hepatectomy, between January 2017 and May 2025. Exclusion criteria included the following: (I) recurrent HCC (n=14); (II) an interval exceeding two months between MRI scanning and hepatectomy (n=2); (III) anti-HCC treatment prior to MRI examination (n=4); (IV) incomplete pathological reports for HCC differentiation (n=3); (V) severe MRI artifacts (n=3); and (VI) missing follow-up data (n=7). Ultimately, 292 consecutive patients were enrolled in this study, randomly allocated to either a training set (TS) or a validation set (VS) at a 7:3 ratio.
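The cohort flow described above can be checked with simple arithmetic. The following minimal sketch uses only the counts reported in the text; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Exclusion counts as reported for criteria (I) to (VI).
exclusions = {
    "recurrent HCC": 14,
    "MRI-to-hepatectomy interval > 2 months": 2,
    "anti-HCC treatment before MRI": 4,
    "incomplete pathology report": 3,
    "severe MRI artifacts": 3,
    "missing follow-up data": 7,
}
initial = 325
enrolled = initial - sum(exclusions.values())

# Reported set sizes; the 7:3 ratio is nominal, so the fraction is only close to 0.7.
n_train, n_val = 204, 88
train_fraction = n_train / enrolled

print(enrolled, round(train_fraction, 3))
```

The six exclusion counts sum to 33, which reconciles the 325 initially included patients with the 292 enrolled, and the reported 204/88 split corresponds to roughly 70% of patients in the TS.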

MRI protocol and pathological differentiation

MRI examinations were conducted using a 3.0-Tesla MRI scanner (Verio/VIDA, Siemens, Germany; Ingenia, Philips, The Netherlands) equipped with a phased-array abdominal coil. The MRI protocol included T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and contrast-enhanced MRI (CEMRI). For CEMRI, a dose of 0.1 mmol/kg of gadopentetic acid (Gd-DTPA) (Magnevist, Beilu, China) was administered intravenously at a rate of 1 mL/s. Detailed MRI protocols are provided in Table S1. To ensure consistency across images, which may have varied due to the different MRI scanners and parameters, the images underwent processing. This included Z-score normalization, N4-bias field correction, and resampling to a uniform voxel size of 1.0 mm × 1.0 mm × 1.0 mm.
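As an illustration of two of the harmonization steps named above, the sketch below implements Z-score normalization and resampling to 1.0 mm isotropic voxels with NumPy. It is not the authors' pipeline: the function names are hypothetical, nearest-neighbour lookup is an assumption (the interpolation used for resampling is not stated), and N4-bias field correction is omitted because it requires a dedicated imaging library.

```python
import numpy as np

def zscore_normalize(volume):
    """Rescale intensities to zero mean and unit standard deviation."""
    volume = volume.astype(np.float64)
    std = volume.std()
    if std == 0:
        return np.zeros_like(volume)
    return (volume - volume.mean()) / std

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D volume from `spacing` to `new_spacing` (both in mm)."""
    new_shape = [max(1, int(round(n * s / t)))
                 for n, s, t in zip(volume.shape, spacing, new_spacing)]
    # For every output index, look up the closest source index on each axis.
    idx = [np.minimum((np.arange(m) * t / s).round().astype(int), n - 1)
           for m, n, s, t in zip(new_shape, volume.shape, spacing, new_spacing)]
    return volume[np.ix_(*idx)]

# Hypothetical volume: 4 slices of 10x10 pixels, 3 mm thick, 1 mm in-plane.
vol = np.arange(400, dtype=float).reshape(4, 10, 10)
iso = resample_nearest(vol, spacing=(3.0, 1.0, 1.0))
norm = zscore_normalize(iso)
```

In this toy case the four 3 mm slices become twelve 1 mm slices, and the normalized volume has zero mean and unit variance regardless of the scanner's original intensity scale.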

Resected specimens were processed for routine hematoxylin and eosin staining to evaluate pathological differentiation. The evaluation was performed by a well-trained pathologist with over 15 years of expertise, who was unaware of the preoperative data. According to the World Health Organization (WHO) diagnostic criteria, HCC differentiation was classified as wHCC, mHCC, or pdHCC based on the morphological and biological characteristics of cancer cells. In cases where multiple differentiation results were observed, the predominant differentiation served as the definitive diagnosis of HCC.

Tumor segmentation and multi-scale DL feature extraction

A radiologist (H.F.L.) with 9 years of experience and a gastroenterologist (X.Y.Z.) with 12 years of experience, both unaware of the clinical and pathological data, manually delineated each layer of the HCC along its boundaries on T2WI and arterial phase (AP) maps using ITK-SNAP software (version 3.6.0). This process created a 3D region of interest (ROI) that encompassed the entire HCC. Any discrepancies were resolved through consensus discussion. In cases with multiple HCCs, only the largest tumor was delineated.

The maximum tumor cross-section cropped from the ROI was utilized as the input image for the 2D DL model. For the 2.5D DL model, the input comprised the largest HCC cross-section and its adjacent upper and lower slices (±1 slice; 3 slices in total). In contrast, the 3D DL model utilized the bounding cube of the ROI as its input image. These cropped images were resized to a standardized size of 224×224 pixels using nearest neighbor interpolation. A 2D convolutional neural network (CNN), specifically 2D ResNet50, was applied to train the 2D and 2.5D models, while 3D ResNet50 was utilized for the 3D model. The multi-scale DL models were pre-trained on the ImageNet dataset (22,23), followed by transfer learning on the 204 HCC MRI images. After completion of the training process, the ResNet50 was used to extract 2,048 DL features of each patch from the penultimate average pooling layer.
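The three input scales can be sketched from a volume and its binary ROI mask as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, axis 0 is assumed to be the slice axis, the in-plane crop is assumed to use the bounding box of the whole ROI, and the subsequent resizing to 224×224 is left out.

```python
import numpy as np

def multiscale_inputs(volume, mask):
    """Return the 2D, 2.5D and 3D crops for a volume and a binary ROI mask."""
    zs, ys, xs = np.nonzero(mask)
    z0, z1 = zs.min(), zs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1

    # Index of the slice with the largest tumor area (maximum cross-section).
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)
    k = int(areas.argmax())

    # 2D: the maximum cross-section only.
    crop_2d = volume[k, y0:y1, x0:x1]

    # 2.5D: maximum cross-section plus one slice above and one below,
    # clamped at the volume edges so that three slices are always returned.
    neighbors = np.clip([k - 1, k, k + 1], 0, volume.shape[0] - 1)
    crop_25d = volume[neighbors, y0:y1, x0:x1]

    # 3D: bounding cube of the entire ROI.
    crop_3d = volume[z0:z1, y0:y1, x0:x1]
    return crop_2d, crop_25d, crop_3d

# Hypothetical 5-slice volume with a tumor spanning slices 1 to 3.
volume = np.arange(5 * 6 * 6).reshape(5, 6, 6)
mask = np.zeros_like(volume)
mask[1, 2:4, 2:4] = 1
mask[2, 1:5, 1:5] = 1   # largest cross-section
mask[3, 2:3, 2:3] = 1
c2d, c25d, c3d = multiscale_inputs(volume, mask)
```

The 2.5D crop is a three-slice stack, which is what allows a 2D network expecting three-channel input to be reused, whereas the 3D crop grows with the extent of the tumor along the slice axis.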

Multi-scale DL model development and assessment

A four-step methodology was sequentially implemented to identify optimal features and establish a multi-scale DL model based on a single AP or T2WI sequence. Initially, the Z-score method was employed to normalize the extracted DL features. Subsequently, Spearman correlation analysis was utilized to remove redundant features, defined as those with an absolute correlation coefficient above 0.9 with another feature. The least absolute shrinkage and selection operator (LASSO) regression with a 10-fold cross-validation approach was then employed to determine the most robust DL features with non-zero coefficients. Finally, the eXtreme Gradient Boosting (XGBoost) classifier was used to establish multi-scale DL models for the prediction of pdHCC. Furthermore, the identified features extracted from T2WI and AP were integrated to establish an MRI-based DL model.
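The first two steps (Z-score normalization and the Spearman redundancy filter) can be sketched with NumPy alone. The sketch assumes the common convention that, within each pair of features whose absolute Spearman coefficient exceeds the threshold, only the earlier feature is kept; the function names are hypothetical, ties are ranked by order of appearance rather than averaged, and constant features are assumed absent. The LASSO and XGBoost steps are not shown.

```python
import numpy as np

def rank_columns(X):
    """Rank each column (0 = smallest); ties are broken by order of appearance."""
    order = X.argsort(axis=0, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        ranks[order[:, j], j] = rows
    return ranks.astype(float)

def drop_redundant(X, threshold=0.9):
    """Greedy filter returning the indices of the features that are kept."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)            # Z-score each feature
    rho = np.corrcoef(rank_columns(Z), rowvar=False)    # Spearman = Pearson on ranks
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not highly correlated with a kept feature.
        if all(abs(rho[j, i]) <= threshold for i in kept):
            kept.append(j)
    return kept

# Hypothetical features: column 1 is a monotonic transform of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, a ** 3, b])
kept = drop_redundant(X)
```

Because Spearman correlation depends only on ranks, the cubed copy of the first feature is detected as fully redundant even though its relationship to the original is nonlinear.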

The performance of DL models was evaluated using the area under the curve (AUC), along with metrics such as accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The diagnostic difference among various predictive models was compared using the DeLong test. In addition, the consistency and clinical utility of models were assessed using a calibration curve and decision curve analysis (DCA), respectively.
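For reference, the discrimination metrics listed above can be computed as in the following pure-Python sketch. The labels, scores and the 0.5 cut-off are hypothetical; the AUC is computed from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and the metrics assume each denominator is non-zero.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, scores, threshold):
    """Accuracy, sensitivity, specificity, PPV and NPV at a score threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical labels (1 = pdHCC) and predicted scores.
y = [1, 1, 1, 0, 0, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
auc = auc_score(y, s)
m = confusion_metrics(y, s, threshold=0.5)
```

Unlike the AUC, the threshold-dependent metrics change with the chosen cut-off, which is why the AUC is typically reported alongside them.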

Follow-up and RFS assessment

Post-hepatectomy surveillance for HCC recurrence was conducted at regular intervals of 3 to 6 months using MRI or computed tomography combined with serological results. Recurrent HCC was defined as the emergence of new intrahepatic lesions. The diagnosis was confirmed by representative radiological findings or pathological results obtained from a second surgical intervention. Patients without recurrence were censored at the final follow-up on July 31, 2025. RFS was defined as the interval from the initial hepatectomy to either the occurrence of HCC recurrence or the last follow-up.

SHAP assessment

The SHAP method was utilized to interpret the optimal DL model and to quantify the contribution of each identified DL feature to the prediction of pdHCC. The bee swarm plot illustrates the global distribution of SHAP values for all features. This visualization helps to determine the ranking importance of features according to weighted predictions. In the SHAP bee swarm plot, a higher SHAP value indicates an increased likelihood of pdHCC. The overall workflow from multi-scale ROI delineation to SHAP analysis is illustrated in Figure 1.
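To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings. The toy model, the instance and the zero baseline are hypothetical, and this brute-force enumeration is for illustration only; it is not the algorithm the SHAP software uses for tree models such as XGBoost.

```python
from itertools import permutations

def shapley_values(value_fn, n_features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        previous = value_fn(frozenset(present))
        for j in order:
            present.add(j)
            current = value_fn(frozenset(present))
            phi[j] += current - previous   # contribution of j given earlier features
            previous = current
    return [p / len(orderings) for p in phi]

# Toy model: score = 2*x0 - x1 + x0*x2, explained at instance x.
# Features absent from a subset are replaced by a baseline of 0.
x = [1.0, 2.0, 3.0]

def value_fn(subset):
    v = [x[i] if i in subset else 0.0 for i in range(3)]
    return 2 * v[0] - v[1] + v[0] * v[2]

phi = shapley_values(value_fn, 3)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term x0*x2 is shared equally between the two features involved. A positive value pushes the prediction up and a negative value pushes it down, which is how the signs in a bee swarm plot are read.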

Figure 1 Detailed process from multi-scale ROI acquisition to RFS assessment. 2D, 2-dimensional; 2.5D, 2.5-dimensional; 3D, 3-dimensional; AP, arterial phase; Conv, convolutional; DL, deep learning; FC, fully-connected; HCC, hepatocellular carcinoma; LASSO, least absolute shrinkage and selection operator; mHCC, moderately-differentiated HCC; pdHCC, poorly-differentiated HCC; RFS, recurrence-free survival; ROC, receiver operating characteristic; ROI, region of interest; SHAP, SHapley Additive exPlanations; T2WI, T2-weighted imaging; wHCC, well-differentiated HCC; XGBoost, eXtreme Gradient Boosting.

Statistical analysis

Statistical analysis was performed using R software (version 4.0). Differences in variables between TS and VS were compared using the Chi-squared test or the Mann-Whitney U test. The Kaplan-Meier method was used to construct RFS curves, and the log-rank test was applied to evaluate differences in RFS values. A P value <0.05 was considered to indicate statistical significance.
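Although the analysis itself was performed in R, the Kaplan-Meier estimate behind the RFS curves can be illustrated with a short pure-Python sketch. The follow-up times and event indicators below are hypothetical, and the function names are introduced here for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct recurrence time.

    times  : follow-up in months
    events : 1 = recurrence observed, 0 = censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e == 1)  # recurrences at t
        n_t = sum(1 for tt, _ in data if tt == t)           # patients leaving the risk set at t
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= n_t
        i += n_t
    return curve

def median_survival(curve):
    """First time at which S(t) falls to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up data for eight patients.
times = [5, 8, 12, 12, 20, 25, 30, 40]
events = [1, 0, 1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

Censored patients contribute to the risk set until their last follow-up but never lower the curve, so the median RFS is read from the first recurrence time at which the estimated recurrence-free probability reaches 50%.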

Results

Baseline characteristics

A total of 292 eligible patients were included and randomly divided into the VS (n=88) and the TS (n=204). The patient cohort consisted of 244 males and 48 females, with an age range of 29 to 85 years (mean age, 63.05±10.17 years). Among these patients, 15 (5.14%) were diagnosed with wHCC, 194 (66.44%) were confirmed as mHCC, and 83 (28.42%) were confirmed as pdHCC. No significant differences were shown between TS and VS regarding demographic information, serological results, clinical factors, or pathological results, as detailed in Table 1.

Table 1

Baseline information of included patients diagnosed with HCC

Variables	Total patients (n=292)	VS (n=88)	TS (n=204)	P value
Age (years)	63.05±10.17	63.35±10.18	62.92±10.19	0.66
ALT (U/L)	37.09±33.00	33.59±25.18	38.60±35.80	0.28
AST (U/L)	38.87±23.92	38.33±20.39	39.10±25.33	0.86
TB (μmol/L)	16.39±11.33	15.85±7.91	16.62±12.53	0.98
Tumor size (cm)	4.90±2.98	4.67±2.92	5.01±3.00	0.38
Gender				>0.99
Female	48 (16.44)	15 (17.05)	33 (16.18)
Male	244 (83.56)	73 (82.95)	171 (83.82)
Cirrhosis				0.89
No	209 (71.58)	62 (70.45)	147 (72.06)
Yes	83 (28.42)	26 (29.55)	57 (27.94)
AFP (ng/mL)				0.43
≤20	163 (55.82)	49 (55.68)	114 (55.88)
>20	129 (44.18)	39 (44.32)	90 (44.12)
Etiology				0.19
None	80 (27.40)	19 (21.59)	61 (29.90)
HBV/HCV	212 (72.60)	69 (78.41)	143 (70.10)
BCLC stage				0.41
0–A	260 (89.04)	76 (86.36)	184 (90.20)
B–C	32 (10.96)	12 (13.64)	20 (9.80)
Child-Pugh grade				>0.99
A	272 (93.15)	82 (93.18)	190 (93.14)
B	20 (6.85)	6 (6.82)	14 (6.86)
PVTT				>0.99
No	278 (95.21)	84 (95.45)	194 (95.10)
Yes	14 (4.79)	4 (4.55)	10 (4.90)
HCC number				0.67
Solitary	270 (92.47)	80 (90.91)	190 (93.14)
Two	22 (7.53)	8 (9.09)	14 (6.86)
HCC differentiation				0.93
wHCC	15 (5.14)	4 (4.55)	11 (5.39)
mHCC	194 (66.44)	58 (65.91)	136 (66.67)
pdHCC	83 (28.42)	26 (29.54)	57 (27.94)

Data are presented as mean ± SD or n (%). AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BCLC, Barcelona Clinic Liver Cancer; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; mHCC, moderately-differentiated HCC; pdHCC, poorly-differentiated HCC; PVTT, portal vein tumor thrombus; SD, standard deviation; TB, total bilirubin; TS, training set; VS, validation set; wHCC, well-differentiated HCC.

Multi-scale DL model development and performance

Following the sequential application of Spearman correlation analysis and LASSO regression, 14 DL features were identified for the establishment of the AP2D model, 14 for the T2WI2D model, and 12 for the MRI2D model. The MRI2D model demonstrated the highest AUC values of 0.88 [95% confidence interval (CI): 0.84–0.93] in the TS and 0.84 (95% CI: 0.69–0.99) in the VS for diagnosing pdHCC. Specifically, the AP2D model yielded AUCs of 0.82 and 0.85, while the T2WI2D model produced AUCs of 0.78 and 0.74, as detailed in Table 2 and illustrated in Figure 2A,2B.

Table 2

Value of multi-scale DL models for predicting HCC differentiation

DL scale	MRI sequence	No. of included features	AUC (95% CI)	Accuracy	Sensitivity	Specificity	PPV	NPV	Youden index
2D	AP (TS/VS)	14	0.82 (0.75–0.88)/0.85 (0.71–0.99)	0.76/0.82	0.74/1.00	0.76/0.71	0.58/0.69	0.87/1.00	0.34/0.33
	T2WI (TS/VS)	14	0.78 (0.71–0.84)/0.74 (0.55–0.93)	0.62/0.71	0.94/0.82	0.49/0.65	0.44/0.60	0.95/0.85	0.22/0.20
	MRI (TS/VS)	12	0.88 (0.84–0.93)/0.84 (0.69–0.99)	0.78/0.79	0.86/0.91	0.75/0.71	0.60/0.67	0.92/0.92	0.28/0.25
2.5D	AP (TS/VS)	8	0.85 (0.79–0.90)/0.86 (0.64–1.00)	0.75/0.83	0.94/0.83	0.66/0.83	0.55/0.83	0.96/0.83	0.29/0.30
	T2WI (TS/VS)	10	0.83 (0.77–0.89)/0.81 (0.54–1.00)	0.77/0.75	0.77/0.50	0.77/1.00	0.59/1.00	0.89/0.67	0.26/0.27
	MRI (TS/VS)	10	0.91 (0.87–0.95)/0.86 (0.58–1.00)	0.84/0.92	0.94/0.83	0.80/1.00	0.67/1.00	0.97/0.86	0.28/0.29
3D	AP (TS/VS)	6	0.74 (0.64–0.83)/0.67 (0.47–0.87)	0.63/0.66	0.83/1.00	0.57/0.41	0.38/0.55	0.91/1.00	0.32/0.34
	T2WI (TS/VS)	12	0.71 (0.60–0.82)/0.63 (0.41–0.84)	0.79/0.63	0.57/0.67	0.86/0.64	0.57/0.57	0.86/0.73	0.35/0.36
	MRI (TS/VS)	13	0.83 (0.75–0.91)/0.64 (0.44–0.85)	0.70/0.66	0.91/0.75	0.63/0.59	0.44/0.56	0.96/0.77	0.34/0.35

2D, 2-dimensional; 2.5D, 2.5-dimensional; 3D, 3-dimensional; AP, arterial phase; AUC, area under the curve; CI, confidence interval; DL, deep learning; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging; NPV, negative predictive value; PPV, positive predictive value; T2WI, T2-weighted imaging; TS, training set; VS, validation set.

Figure 2 The predictive performance of multi-scale DL models for predicting pdHCC in the TS and VS. (A) ROC curve of 2D model in the TS; (B) ROC curve of 2D model in the VS; (C) ROC curve of 2.5D model in the TS; (D) ROC curve of 2.5D model in the VS; (E) ROC curve of 3D model in the TS; (F) ROC curve of 3D model in the VS. 2D, 2-dimensional; 2.5D, 2.5-dimensional; 3D, 3-dimensional; AP, arterial phase; AUC, area under the curve; CI, confidence interval; DL, deep learning; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging; pdHCC, poorly-differentiated HCC; ROC, receiver operating characteristic; T2WI, T2-weighted imaging; TS, training set; VS, validation set.

A total of 8, 10, and 10 features were identified for establishing AP2.5D, T2WI2.5D, and MRI2.5D models, respectively. The MRI2.5D model demonstrated superior predictive capability for pdHCC, with AUC values of 0.91 (95% CI: 0.87–0.95) and 0.86 (95% CI: 0.58–1.00), compared to the AP2.5D (AUC =0.85 and 0.86) and T2WI2.5D (AUC =0.83 and 0.81) models. These comparative results are illustrated in Figure 2C,2D.

For discriminating HCC differentiation, the MRI3D DL model demonstrated an AUC value of 0.83 in the TS. In contrast, it exhibited a relatively lower AUC value of 0.64 in the VS. The AP3D model yielded inferior AUCs (0.74 and 0.67) for the diagnosis of pdHCC, followed by the T2WI3D model (AUC =0.71 and 0.63), as illustrated in Figure 2E,2F. DeLong test results comparing AUC values across various multi-scale DL models are presented in Figures S1-S6.

Multi-scale DL model assessment

The calibration curve indicated that the MRI model exhibited superior consistency compared with T2WI and AP models across all multi-scale DL models. This superior performance was particularly evident in the MRI2.5D model, as depicted in Figure 3A-3C. Regarding DCA results, all DL models demonstrated higher net clinical benefits over a wide range of threshold probabilities. The 2.5D DL model revealed the most pronounced benefits, as illustrated in Figure 3D-3F.
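The net benefit plotted in a decision curve can be computed as in the following sketch, which uses the standard definition (true positives per patient minus false positives per patient weighted by the odds of the threshold probability). The labels, predicted probabilities and the 0.5 threshold are hypothetical, and the comparison against the treat-all and treat-none strategies reflects the usual DCA convention rather than a detail stated in the text.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = pdHCC) and predicted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05]
nb_model = net_benefit(y, p, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
nb_none = 0.0  # treating nobody yields zero net benefit by definition
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, as it does in this toy example.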

Figure 3 The calibration curve and DCA of multi-scale DL models for predicting pathological differentiation of HCC in the TS. (A) The calibration curve of 2D model; (B) the calibration curve of 2.5D model; (C) the calibration curve of 3D model; (D) the DCA of 2D model; (E) the DCA of 2.5D model; (F) the DCA of 3D model. 2D, 2-dimensional; 2.5D, 2.5-dimensional; 3D, 3-dimensional; AP, arterial phase; DCA, decision curve analysis; DL, deep learning; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging; T2WI, T2-weighted imaging; TS, training set.

Follow-up results and RFS prediction

A total of 292 patients completed follow-up by July 2025. The follow-up durations ranged from 1 to 85 months with a median duration of 17 months. The median RFS was 46.00 months for patients with wHCC, 41.00 months for those with mHCC, and 35.00 months for those with pdHCC (log-rank P=0.04, Figure 4A). Additionally, patients who were predicted as pdHCC by our developed MRI2.5D model demonstrated a significantly lower median RFS (25 vs. 50 months, log-rank P=0.006, Figure 4B) compared to patients predicted as wHCC/mHCC.

Figure 4 RFS curves. (A) Pathological differentiation of HCC was significantly associated with RFS. (B) The MRI2.5D model could serve as an effective tool for predicting pathological differentiation and its association with RFS in patients with HCC. 2.5D, 2.5-dimensional; CI, confidence interval; HCC, hepatocellular carcinoma; HR, hazard ratio; L, poorly-differentiated HCC; M, moderately-differentiated HCC; MRI, magnetic resonance imaging; RFS, recurrence-free survival; W, well-differentiated HCC; WM, well-differentiated HCC/moderately-differentiated HCC.

SHAP analysis interpretation

The SHAP bee swarm plot revealed that ten specific DL features contributed to the prediction of pdHCC in the established 2.5D model. These comprised six features extracted from the AP sequence and four features from the T2WI sequence. Among these features, AD_DL_1507 emerged as the most important predictor, followed by AD_DL_306 and then the remaining eight DL features. Specifically, the features AD_DL_1507, AD_DL_823, T2WI_DL_219, and T2WI_DL_764 demonstrated a positive association with the prediction of pdHCC. Conversely, the remaining six features exhibited a negative association with the outcome, as illustrated in Figure 5.

Figure 5 The SHAP bee swarm plot illustrates the global distribution of SHAP values for each identified feature on the prediction of pdHCC using the XGBoost algorithm. The Y-axis ranks features by their importance from high to low. The X-axis represents the SHAP values, with colors ranging from blue to red indicating low to high SHAP values, corresponding to a negative or positive prediction of pdHCC. AD, arterial dynamic phase; DL, deep learning; HCC, hepatocellular carcinoma; pdHCC, poorly-differentiated HCC; SHAP, SHapley Additive exPlanations; T2WI, T2-weighted imaging; XGBoost, eXtreme Gradient Boosting.

Discussion

In this research, we developed and validated multi-scale DL models that utilize MRI maps to predict pathological differentiation and its association with RFS in patients with HCC. Our findings emphasize that the MRI2.5D model exhibits superior predictive capabilities for assessing HCC differentiation compared to 2D and 3D models. This superiority was observed irrespective of whether single-sequence or combined MRI data were used. Furthermore, AP-based models consistently outperformed T2WI-based models across various multi-scale DL models. Additionally, the SHAP approach provided interpretable explanations for the DL model’s predictions of HCC differentiation. Pathological differentiation was significantly associated with RFS, and the MRI2.5D model may serve as an effective alternative for predicting RFS in patients with HCC following hepatectomy.

Compared to the T2WI model, the AP model demonstrated superior capability in distinguishing HCC differentiation across various multi-scale DL models. This underscores the advantage of using the AP model for characterizing heterogeneity, given that the hepatic artery is the primary blood supply to HCC (24,25). Furthermore, the MRI model that integrates AP and T2WI features revealed improved performance, highlighting the complementary benefits of using biparametric MRI in predicting pdHCC (26). To develop the combined MRI model, only AP and T2WI sequences were utilized. This approach was chosen not only to establish a time-efficient and cost-effective tool for predicting HCC differentiation but also because the combination of T2WI and AP demonstrated performance comparable to more complex multi-modality MRI approaches in previous studies (27,28). In our study, the MRI2D model achieved AUC values of 0.88 and 0.84 for predicting pdHCC. These results align with the conclusions of Liu et al. (16), who developed a CEMRI-based 2D DL model.

The 3D model demonstrated comparatively lower AUC values than the 2D model in distinguishing HCC differentiation, regardless of whether single-sequence or combined MRI was utilized. Furthermore, the difference in AUCs between TS and VS was more pronounced for the 3D model, suggesting that the 3D DL model is ineffective and unstable. Similarly, Zhang et al. (29) and Liu et al. (30) reported that the 2D DL model exhibited superior performance and more reliable generalization in the prediction of early recurrence of HCC and visceral pleural invasion in T1 lung adenocarcinoma, respectively, when compared with the 3D DL model. To improve the generalization of DL models, several systematic techniques were employed, including N4-bias field correction, resampling algorithms, and pretraining on the ImageNet database. Nevertheless, the 3D model remains inadequate in predicting HCC differentiation. It is hypothesized that the cropped bounding cube contains excessive redundant spatial heterogeneity features unrelated to HCC. This subsequently reduces the predictive accuracy of the 3D model. Consequently, the development of a robust 3D model requires both an advanced 3D CNN training framework and improved methodology for enhancing generalization.

The 2.5D DL model has emerged as an innovative approach in clinical practice. It offers detailed insights into tumor heterogeneity while minimizing redundant features not associated with target lesions. Recent studies (18,31) indicated that the 2.5D DL model outperforms 2D and 3D DL models in predicting aggressive biological features and survival prognosis. In alignment with prior conclusions (18,19,31), our research emphasizes the enhanced performance and improved net clinical benefits of the 2.5D DL model in predicting the pathological differentiation of HCC, irrespective of whether single-sequence or combined MRI modalities are employed. Consequently, our findings suggest that the 2.5D DL model is a viable approach for predicting aggressive biological characteristics with higher accuracy.

A key innovation of this study was addressing the black box issue by interpreting the DL model using SHAP analysis. The SHAP bee swarm plot provides a clear, visual assessment for quantifying the ranked contribution of each identified DL feature in the global prediction of pdHCC, thereby providing a transparent decision-support tool for predicting HCC differentiation. Furthermore, SHAP analysis indicated that AP was more representative of HCC heterogeneity than T2WI, further supporting the superiority of AP over T2WI for predicting the aggressive features of HCC. Our findings also demonstrated a significant decrease in RFS values with the deterioration of HCC differentiation, offering straightforward evidence for RFS prediction in patients with HCC. Notably, the developed MRI2.5D model offers a noninvasive alternative for predicting RFS values in patients with HCC with varying grades of pathological differentiation. Consequently, our study provides valuable insights into the prediction of RFS in patients with HCC across grades of pathological differentiation.

Several significant limitations should be addressed. First, the clinical and conventional radiomics models for predicting HCC differentiation were not analyzed, as their diagnostic performance had been extensively reported. Second, this study was a retrospective and single-center investigation without external validation, which may decrease the generalizability of the DL models. Third, the manual delineation could introduce bias due to interpretative variability among researchers. Future investigations concerning automatic segmentation approaches should be explored. Finally, the critically low ratio of events per variable and the high dimensionality of the initial 2,048 DL feature space, combined with a relatively small sample size, contributed to a significant risk of overfitting. Consequently, the CI for the AUC value was exceptionally wide, particularly in the VS, despite the application of a sequential feature selection process and a 10-fold cross-validation approach to ensure robust model tuning. Therefore, multi-center studies with larger cohorts of enrolled patients are essential to validate the efficacy of our developed multi-scale DL models in predicting differentiation and its association with RFS in patients with HCC.

Conclusions

In conclusion, our study indicates that the developed MRI2.5D DL model serves as an effective tool for predicting pathological differentiation and its association with RFS in patients with HCC. Furthermore, the underlying mechanisms of the MRI2.5D model can be comprehensively explained using SHAP analysis.

Acknowledgments

We express our gratitude to the OnekeyAI platform for their valuable technical support throughout the course of this study. We would also like to thank Home-for-Researchers for their assistance in proofreading and enhancing the language of this paper.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-928/rc

Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-928/dss

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-928/prf

Funding: This work was supported by Clinical Trials from The Third Affiliated Hospital of Soochow University (No. 2024-14) and Changzhou Science and Technology Program (No. CJ20244017).

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-aw-928/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Ethics Committee of The Third Affiliated Hospital of Soochow University (No. 2022-CL027-01), and written informed consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Hwang SY, Danpanichkul P, Agopian V, et al. Hepatocellular carcinoma: updates on epidemiology, surveillance, diagnosis and treatment. Clin Mol Hepatol 2025;31:S228-54. [Crossref] [PubMed]
Mak LY, Liu K, Chirapongsathorn S, et al. Liver diseases and hepatocellular carcinoma in the Asia-Pacific region: burden, trends, challenges and future directions. Nat Rev Gastroenterol Hepatol 2024;21:834-51. [Crossref] [PubMed]
Fuster-Anglada C, Mauro E, Ferrer-Fàbrega J, et al. Histological predictors of aggressive recurrence of hepatocellular carcinoma after liver resection. J Hepatol 2024;81:995-1004. [Crossref] [PubMed]
Maithel SK, Wang R, Harton J, et al. Prognostic Significance of Recurrence and Timing of Recurrence on Survival Among Patients with Early-Stage Hepatocellular Carcinoma in U.S. Clinical Practice. Ann Surg Oncol 2025;32:1054-62. [Crossref] [PubMed]
Espírito Santo J, Ladeirinha A, Alarcão A, et al. Hepatocellular carcinoma: tumor heterogeneity and recurrence after preoperative locoregional therapy. Med Oncol 2023;40:340. [Crossref] [PubMed]
Shinkawa H, Tanaka S, Kabata D, et al. The Prognostic Impact of Tumor Differentiation on Recurrence and Survival after Resection of Hepatocellular Carcinoma Is Dependent on Tumor Size. Liver Cancer 2021;10:461-72. [Crossref] [PubMed]
Al Farai A, Sangiuolo F, Albaali D, et al. The Definition of the Best Margin Cutoff and Related Oncological Outcomes After Liver Resection for Hepatocellular Carcinoma: A Systematic Review. Cancers (Basel) 2025;17:1759. [Crossref] [PubMed]
Vogel A, Grant RC, Meyer T, et al. Adjuvant and neoadjuvant therapies for hepatocellular carcinoma. Hepatology 2025;82:777-93. [Crossref] [PubMed]
Alwahaibi N, Alwahaibi M. Liver biopsy in the modern era: from traditional techniques to artificial intelligence and multi-omics integration. Front Med (Lausanne) 2025;12:1678753. [Crossref] [PubMed]
EASL Clinical Practice Guidelines on the management of hepatocellular carcinoma. J Hepatol 2025;82:315-74. [Crossref] [PubMed]
Zhang J, Che Y, Liu R, et al. Deep learning-driven multi-omics analysis: enhancing cancer diagnostics and therapeutics. Brief Bioinform 2025;26:bbaf440. [Crossref] [PubMed]
Gao R, Mai S, Wang S, et al. Deep Learning for the Diagnosis and Treatment of Thyroid Cancer: A Review. Endocr Pract 2025;31:1608-14. [Crossref] [PubMed]
Patel AN, Srinivasan K. Deep learning paradigms in lung cancer diagnosis: A methodological review, open challenges, and future directions. Phys Med 2025;131:104914. [Crossref] [PubMed]
Xu L, Huang Y, Fu H, et al. Comparative analysis of deep learning and radiomics models in predicting hepatocellular carcinoma differentiation via ultrasound. Front Med (Lausanne) 2025;12:1685725. [Crossref] [PubMed]
Wu K, Zhu Z, Xu D, et al. Gd-EOB-DTPA-enhanced MRI radiomics and deep learning models for predicting the pathological differentiation degree in hepatocellular carcinoma. Eur J Radiol 2026;194:112487. [Crossref] [PubMed]
Liu HF, Wang M, Lu YJ, et al. CEMRI-Based Quantification of Intratumoral Heterogeneity for Predicting Aggressive Characteristics of Hepatocellular Carcinoma Using Habitat Analysis: Comparison and Combination of Deep Learning. Acad Radiol 2024;31:2346-55. [Crossref] [PubMed]
He X, Xu Y, Zhou C, et al. Prediction of microvascular invasion and pathological differentiation of hepatocellular carcinoma based on a deep learning model. Eur J Radiol 2024;172:111348. [Crossref] [PubMed]
He J, Xu J, Chen W, et al. Development of a deep learning model for T1N0 gastric cancer diagnosis using 2.5D radiomic data in preoperative CT images. NPJ Precis Oncol 2025;9:249. [Crossref] [PubMed]
Wang W, Liang H, Zhang Z, et al. Comparing three-dimensional and two-dimensional deep-learning, radiomics, and fusion models for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma based on CT imaging: a multicentre, retrospective, diagnostic study. EClinicalMedicine 2024;67:102385. [Crossref] [PubMed]
Lambin P, Woodruff HC, Mali SA, et al. Radiomics Quality Score 2.0: towards radiomics readiness levels and clinical translation for personalized medicine. Nat Rev Clin Oncol 2025;22:831-46. [Crossref] [PubMed]
Wang P, Cui J, Du H, et al. Preoperative Prediction of STAS Risk in Primary Lung Adenocarcinoma Using Machine Learning: An Interpretable Model with SHAP Analysis. Acad Radiol 2025;32:4266-77. [Crossref] [PubMed]
Okazaki S, Mine Y, Yoshimi Y, et al. RadImageNet and ImageNet as Datasets for Transfer Learning in the Assessment of Dental Radiographs: A Comparative Study. J Imaging Inform Med 2025;38:534-44. [Crossref] [PubMed]
Lu M, Zheng Y, Liu S, et al. Deep learning model for automated diagnosis of moyamoya disease based on magnetic resonance angiography. EClinicalMedicine 2024;77:102888. [Crossref] [PubMed]
Stollmayer R, Güven S, Heidt CM, et al. LI-RADS-based hepatocellular carcinoma risk mapping using contrast-enhanced MRI and self-configuring deep learning. Cancer Imaging 2025;25:36. [Crossref] [PubMed]
Zafar S, Elbanna KY, Todd AWM, et al. Can absolute arterial phase hyperenhancement improve sensitivity of detection of hepatocellular carcinoma in indeterminate nodules on CT? Eur Radiol 2024;34:2256-68. [Crossref] [PubMed]
Zuo XY, Liu HF. Biparametric magnetic resonance imaging-based radiomic and deep learning models for predicting Ki-67 risk stratification in hepatocellular carcinoma. World J Hepatol 2025;17:109530. [Crossref] [PubMed]
Li SQ, Yang CX, Wu CM, et al. Prediction of glypican-3 expression in hepatocellular carcinoma using multisequence magnetic resonance imaging-based histology nomograms. Quant Imaging Med Surg 2024;14:4436-49. [Crossref] [PubMed]
Yang C, Zhang ZM, Zhao ZP, et al. Radiomic analysis based on magnetic resonance imaging for the prediction of VEGF expression in hepatocellular carcinoma patients. Abdom Radiol (NY) 2024;49:3824-33. [Crossref] [PubMed]
Zhang YB, Chen ZQ, Bu Y, et al. Construction of a 2.5D Deep Learning Model for Predicting Early Postoperative Recurrence of Hepatocellular Carcinoma Using Multi-View and Multi-Phase CT Images. J Hepatocell Carcinoma 2024;11:2223-39. [Crossref] [PubMed]
Liu S, Li H, Xiao X, et al. A deep learning approach for predicting visceral pleural invasion in cT1 lung adenocarcinoma. J Thorac Dis 2024;16:5675-87. [Crossref] [PubMed]
Li M, Ding N, Yin S, et al. Enhancing automatic prediction of clinically significant prostate cancer with deep transfer learning 2.5-dimensional segmentation on bi-parametric magnetic resonance imaging (bp-MRI). Quant Imaging Med Surg 2024;14:4893-902. [Crossref] [PubMed]

Cite this article as: Zuo XY, Liu HF. Multi-scale deep learning models based on MRI for predicting pathological differentiation and evaluating its association with recurrence-free survival in hepatocellular carcinoma: an explainable machine learning study. J Gastrointest Oncol 2026;17(2):84. doi: 10.21037/jgo-2025-aw-928
