A novel deep learning and radiomics approach based on DCE-MRI for predicting the P53 mutation status in hepatocellular carcinoma
Introduction
Liver cancer is the sixth most commonly diagnosed cancer worldwide and the third leading cause of cancer-related death (1), with hepatocellular carcinoma (HCC) accounting for most cases. Although advances in diagnosis and therapy have improved outcomes for patients with HCC, the 5-year survival rate after diagnosis is only about 10% (2). Prognosis depends mainly on tumor stage and histopathologic features, so early detection and accurate diagnosis are essential to improving survival.
The TP53 gene encodes the p53 tumor suppressor protein, which drives DNA repair, inhibits proliferation, and induces apoptosis; it also regulates metabolism, ferroptosis, and immune responses, collectively suppressing tumorigenesis (3). TP53 mutations are key drivers of tumor development, especially in advanced HCC (4). In oncology, TP53 has emerged as a promising target for both molecularly targeted therapy and immunotherapy (5). HCC with P53 mutations represents a biologically aggressive subtype with poor survival outcomes (6). Currently, diagnosis of P53-mutated HCC relies primarily on invasive liver biopsy, which carries potential risks and complications. Therefore, identifying P53 mutations in HCC preoperatively through noninvasive methods is vital for guiding personalized treatment strategies.
Artificial intelligence (AI) is a broad field that encompasses computational search algorithms, machine learning (ML), and deep learning (DL) models (7). DL excels at high-performance classification using large datasets. The deep convolutional neural network (DCNN) architecture, commonly used for image recognition, automatically extracts and learns deep feature data from inputs through a series of consecutive filters, eliminating manual engineering (8). This approach offers an opportunity to improve HCC clinical care by decoding tumor characteristics. DL models capture HCC biology, helping clinicians make more accurate diagnoses and optimize treatment plans (9). Current research on P53-mutated HCC focuses on image analysis. Investigators recently developed a peritumoral ultrasound radiomics model to predict P53 mutation status in HCC, which has demonstrated good diagnostic performance (10). The application of DL in diagnosing P53-mutated HCC remains limited.
This study aims to develop and validate a DL-based magnetic resonance imaging (MRI) model for preoperative prediction of P53 mutation status in HCC patients. Furthermore, by fusing clinical, radiomics, and DL models, we seek to improve disease characterization and prediction, offering enhanced guidance for clinical decision-making. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/rc).
Methods
Patients
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Zhongshan Hospital, Fudan University (No. B2021-113R), which waived the requirement for informed consent.
We retrospectively analyzed 1,010 consecutive patients with HCC who underwent surgery at Zhongshan Hospital, Fudan University, between January 2020 and February 2021, had available p53 immunohistochemistry results, and showed no evidence of preoperative extrahepatic metastasis. The inclusion criteria (Figure 1) required: (I) untreated primary HCC; (II) a single tumor; (III) preoperative MRI performed within 30 days before surgery with good image quality and completeness; (IV) complete clinical data; and (V) a lesion measuring 10–100 mm in maximum diameter. These criteria yielded 320 patients, who were randomly assigned to the training cohort (TC) and the validation cohort (VC) in a 7:3 ratio. Figure 2 illustrates the study workflow.
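The 7:3 assignment can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text states only that assignment was random, so stratifying by P53 status, the function name `stratified_split`, and the seed are all assumptions made for the example. The class counts (137 P53-negative, 183 P53-positive) are the cohort totals reported in the Results.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into training/validation sets, class by class.

    Assumption: stratification by label. The paper only says "randomly assigned".
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])
        val_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(val_idx)

# 0 = P53-negative, 1 = P53-positive (totals taken from the Results section).
labels = [0] * 137 + [1] * 183
train_idx, val_idx = stratified_split(labels)
n_train_pos = sum(labels[i] for i in train_idx)
```

Under these assumptions the split gives 224 training and 96 validation patients, with 128 P53-positive cases in the training set, which matches the cohort sizes reported in the Results.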
Clinicopathological characteristics
We collected age, gender, alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), and carbohydrate antigen 19-9 (CA19-9). Pathological assessment, including immunohistochemistry, was performed to determine the Edmondson-Steiner grade and to evaluate microvascular invasion (MVI). Patients were classified into two groups: P53-mutated HCC and non-P53-mutated HCC. We classified tumors as P53-mutated when ≥10% of tumor cells showed positive nuclear staining (11).
MRI protocol
All patients underwent Gd-DTPA-enhanced liver MRI examinations performed on seven different scanners; the specific parameters are detailed in Table S1. Gd-DTPA was injected intravenously at 2 mL/s for a total dose of 0.1 mmol/kg body weight. For example, on a 1.5 T Magnetom Aera scanner (Siemens Healthineers, Erlangen, Germany), the protocol included axial T2-weighted fat-suppressed imaging (T2WI-FS), diffusion-weighted imaging (DWI) with b=0 and 500 s/mm², in- and opposed-phase T1-weighted imaging (IP-OP T1WI), and axial pre-contrast three-dimensional volumetric interpolated breath-hold T1-weighted imaging with fat suppression (3D-VIBE). Postcontrast dynamic-enhanced 3D-VIBE T1-weighted imaging was performed during the arterial phase (AP, 20–30 s), portal venous phase (PVP, 60–70 s), and delayed phase (DP, 180 s). The detailed parameters for each sequence are presented in Table S2.
Imaging analyses
Images were retrospectively analyzed by two radiologists (** and **, with 6 and 15 years of experience in abdominal imaging, respectively). Any discrepancies were resolved by consensus. Assessment was performed according to the Liver Imaging Reporting and Data System (LI-RADS) version 2018 (12). The following findings were evaluated on Gd-DTPA-enhanced MRI: enhancement pattern (non-rim arterial phase hyperenhancement vs. rim enhancement), washout pattern (none, non-peripheral, or peripheral), largest tumor diameter (LTD), presence of an enhancing capsule, and delayed central enhancement.
Imaging preprocessing, region of interest (ROI) segmentation
We preprocessed images before ROI delineation to ensure image quality and consistency. First, N4 bias-field correction was applied to reduce intensity inhomogeneity caused by the magnetic field. Next, images were resampled to 1×1×1 mm³ voxel spacing to compensate for spatial voxel differences. Finally, grayscale normalization was performed to maintain grayscale consistency. An abdominal radiologist with 6 years of experience meticulously outlined the entire tumor area in ITK-SNAP; this delineation was then verified by a senior abdominal radiologist with 15 years of experience who manually traced the tumor margins on every transverse slice, enabling ROI generation. These ROIs were drawn on three phases: AP, PVP, and DP. To assess segmentation reproducibility, MR images from 30 randomly selected HCC cases were re-examined after a one-week interval, and the ROIs were manually redrawn by the radiologists.
Radiomics feature extraction
We used PyRadiomics to extract radiomics features. Inter- and intra-reader agreement for the subregion features was 0.88 [95% confidence interval (CI): 0.53–0.99] and 0.92 (95% CI: 0.23–0.99), respectively. A total of 3,591 radiomics features were derived from the AP, PVP, and DP sequences, including first-order features, shape-based features, and texture features.
Feature selection and model construction
Radiomics features were normalized using z-score normalization to ensure consistency. To reduce redundancy, we excluded features with Spearman’s rank correlation coefficients exceeding 0.9. The least absolute shrinkage and selection operator (LASSO) identified the most predictive clinical variables and DCE-MRI-derived radiological features. Logistic regression (LR) was then used to build predictive models based on clinical data and DCE-MRI radiomics. Further details regarding radiomics feature extraction are provided in Table S3 and Figure S1.
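The first two steps above (z-score normalization and the Spearman redundancy filter) can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the greedy rule of keeping the first feature of each highly correlated pair, the use of the absolute correlation, and all function names are choices made for the example and are not described in the text. LASSO and LR are not reproduced here.

```python
import statistics

def zscore(values):
    """Standardize a feature to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |rho| with every already-kept feature is <= threshold.

    Assumption: greedy, order-dependent filtering; the paper does not say which
    member of a correlated pair was removed.
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data with hypothetical feature names, not study data.
raw = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 11], "f3": [5, 1, 4, 2, 3]}
normalized = {name: zscore(values) for name, values in raw.items()}
selected = drop_redundant(normalized)
```

In this toy example `f2` is a monotonic function of `f1`, so it is dropped, while `f3` is retained.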
DL procedure
We identified the slice with the largest ROI from each 2D MR sequence as the representative image. To streamline analysis and reduce background noise, we extracted the smallest bounding box that fully enclosed the ROI. Compared with 2D DL models, 3D DL models are larger and require more parameter tuning, runtime, training data, and storage (15). We therefore used compact, computationally efficient 2D DL models, which reduce hardware demands and enhance model applicability. A limitation of this approach is that it disregards volumetric information from adjacent slices, which may contain features indicative of tumor heterogeneity. To partly offset this limitation, we expanded the bounding box by 10 pixels to include the peritumoral region, which contains relevant contextual features, consistent with recent studies emphasizing the importance of the surrounding tissue.
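The slice selection and bounding-box step can be illustrated with a small sketch. It uses nested lists so that it runs without dependencies; a real pipeline would more likely use NumPy arrays. Clipping the expanded box at the image border and all function names are assumptions made for the example, since the text does not describe how border cases were handled.

```python
def largest_roi_slice(mask_volume):
    """Index of the slice whose binary mask has the most ROI pixels."""
    areas = [sum(sum(row) for row in mask) for mask in mask_volume]
    return max(range(len(areas)), key=areas.__getitem__)

def expanded_bbox(mask, margin=10):
    """Smallest box enclosing the ROI, grown by `margin` pixels per side.

    Returns inclusive (row_min, row_max, col_min, col_max).
    Assumption: the box is clipped to the image bounds.
    """
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no ROI pixels")
    return (max(rows[0] - margin, 0), min(rows[-1] + margin, height - 1),
            max(cols[0] - margin, 0), min(cols[-1] + margin, width - 1))

def crop(image, box):
    """Crop a 2D image (list of rows) to an inclusive bounding box."""
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]

def make_mask(size, r0, r1, c0, c1):
    """Helper for the demo: square mask with a rectangular ROI."""
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(size)]
            for r in range(size)]

# Two synthetic 64x64 slices: a 3x3 ROI and a 10x10 ROI near the left edge.
volume = [make_mask(64, 30, 32, 30, 32), make_mask(64, 20, 29, 5, 14)]
best = largest_roi_slice(volume)
box = expanded_bbox(volume[best], margin=10)
patch = crop(volume[best], box)
```

Here the second slice is selected, and its box is clipped on the left because the ROI lies within 10 pixels of the image edge.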
We standardized image-intensity distributions across the RGB channels using z-score normalization. The normalized images served as direct input to our model. During training, we performed real-time data augmentation, including random cropping and horizontal and vertical flips. At test time, only normalization was applied.
We evaluated six CNN architectures (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121) and compared them head to head to identify the one best suited to our task.
We extracted features from the penultimate layer of our model. To reduce the high-dimensional feature space, we applied principal component analysis (PCA) to obtain a 512-dimensional representation.
In addition, to assess the ability of deep-learning models to distinguish diverse samples, we used gradient-weighted class activation mapping (Grad-CAM) to visualize model decisions. Grad-CAM produces interpretable heatmaps that highlight regions informing the CNN’s decision during inference. Figure S2 shows representative Grad-CAM heatmaps. Figure S3 shows confusion matrices for the optimal models (ResNet101-FF, clinical, radiomics, and combined) in both training and validation cohorts.
Multi-channel (MC) DL model
We cropped ROIs from the three DCE-MRI sequences, stacked them into a three-channel input analogous to the RGB channels of a natural image, and normalized the tumor regions. This three-channel tensor was fed directly into a single CNN, enabling the network to learn joint representations across sequences through shared convolutional filters. The MC approach enables early fusion at the pixel level, capturing fine-grained spatial correlations between phases. To standardize size, we resized each ROI to 256 × 256 pixels before feeding it into the CNN.
Feature fusion (FF) DL model
The model architecture comprised two main components: residual CNN blocks and an LR module. We fed each patient’s three MRI sequences into the network and extracted sequence-specific features with separate CNN blocks. We concatenated the extracted features to create a comprehensive MRI feature set and then fed it into the LR classifier for multimodal analysis. The FF approach enables fusion at the feature level, allowing each sequence to be processed by optimal architectures and preserving sequence-specific information before integration.
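Feature-level fusion can be sketched as concatenation followed by a logistic regression forward pass. The example below is only a schematic: the CNN feature extractors are omitted, the per-sequence vectors are placeholder numbers, the weights are untrained zeros, and the function names are hypothetical. The vector length of eight per sequence follows the number of compressed DL features per sequence reported in the Results.

```python
import math

def fuse_features(ap, pvp, dp):
    """Feature-level fusion: concatenate the per-sequence feature vectors."""
    return [*ap, *pvp, *dp]

def predict_probability(features, weights, bias):
    """Logistic regression forward pass on the fused feature vector."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have equal length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder inputs, not study data: eight features per enhancement phase.
ap, pvp, dp = [0.1] * 8, [0.2] * 8, [0.3] * 8
fused = fuse_features(ap, pvp, dp)

# Hypothetical untrained weights; real coefficients would come from fitting LR.
weights = [0.0] * len(fused)
probability = predict_probability(fused, weights, bias=0.0)
```

In contrast to the MC model, where the three phases are merged at the pixel level before any convolution, each phase here keeps its own feature vector until the final classifier.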
Combined model
After comparing the MC-DL and FF-DL models, we chose the one with better performance as our final DL model. We then built a combined model that integrated the clinical model, the radiomics model, and the outputs of the DL model to leverage the strengths of each component, improve predictive accuracy, and create a more comprehensive diagnostic framework.
Statistical analysis
We used the Student’s t-test, Mann-Whitney U test, Wilcoxon test, chi-square test, and Fisher’s exact test to compare baseline characteristics between the TC and VC. Univariable and multivariable analyses were performed to identify independent clinical and imaging predictors of P53 mutation. Predictive performance for P53 mutation was assessed using receiver operating characteristic (ROC) curve analysis, reporting the area under the curve (AUC) with 95% CI. The DeLong test was used to compare AUCs between models. Statistical significance was defined as a two-tailed P value less than 0.05.
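For readers less familiar with the AUC, it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; it is a didactic illustration with toy numbers, and it does not reproduce the confidence intervals or the DeLong test used in the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example, not study data: three of the four positive/negative pairs
# are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; rank-based implementations are preferable for large datasets.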
Results
Clinical characteristics
This study included 320 participants: 259 men and 61 women. In the TC (n=224), 96 were P53− and 128 were P53+; in the VC (n=96), 41 had P53− and 55 had P53+ HCC. Table 1 shows both cohorts matched well for demographics and P53 status. Multivariable analysis identified LTD (P=0.007; OR =1.021, 95% CI: 1.008–1.034) and the presence of rim arterial phase hyperenhancement (rim-APHE; P=0.001; OR =3.844, 95% CI: 1.954–7.561) as independent predictors for P53-mutated HCC, as detailed in Table 2.
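The odds ratio for LTD is close to 1 because it refers to a single unit of diameter. Assuming LTD entered the model as a continuous linear term measured in millimeters (the unit given in Table 1; the coding in the regression itself is not stated), the odds ratio for a larger increment is the per-unit odds ratio raised to the size of that increment. A small sketch of this arithmetic:

```python
def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for an increase of `units`, given the per-unit odds ratio.

    Valid for a continuous predictor with a linear effect on the log-odds.
    """
    return or_per_unit ** units

# Multivariable estimate for LTD and its 95% CI, as reported in the text.
or_10mm = scale_odds_ratio(1.021, 10)
ci_low_10mm = scale_odds_ratio(1.008, 10)
ci_high_10mm = scale_odds_ratio(1.034, 10)
```

Under this assumption, a 10 mm larger tumor corresponds to an odds ratio of about 1.23 for P53 mutation.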
Table 1
| Characteristics | TC (n=224) | | | VC (n=96) | | | P value |
|---|---|---|---|---|---|---|---|
| | P53− HCC (n=96) | P53+ HCC (n=128) | P value | P53− HCC (n=41) | P53+ HCC (n=55) | P value | |
| Clinical features | | | | | | | |
| Age, years | 59.50±12.27 | 56.13±11.70 | 0.04 | 58.90±10.58 | 60.51±9.68 | 0.36 | 0.09 |
| Gender | | | 0.13 | | | 0.88 | 0.19 |
| Male | 75 (78.12) | 111 (86.72) | | 32 (78.05) | 41 (74.55) | | |
| Female | 21 (21.88) | 17 (13.28) | | 9 (21.95) | 14 (25.45) | | |
| CA19-9, U/mL | | | 0.84 | | | 0.03 | 0.73 |
| <37 | 85 (88.54) | 111 (86.72) | | 33 (80.49) | 53 (96.36) | | |
| ≥37 | 11 (11.46) | 17 (13.28) | | 8 (19.51) | 2 (3.64) | | |
| AFP, ng/mL | | | 0.06 | | | 0.60 | 0.39 |
| <20 | 56 (58.33) | 56 (43.75) | | 21 (51.22) | 30 (54.55) | | |
| ≥20 and <400 | 23 (23.96) | 34 (26.56) | | 14 (34.15) | 14 (25.45) | | |
| ≥400 | 17 (17.71) | 38 (29.69) | | 6 (14.63) | 11 (20.00) | | |
| CEA, ng/mL | | | >0.99 | | | 0.80 | 0.17 |
| <5 | 88 (91.67) | 118 (92.19) | | 39 (95.12) | 54 (98.18) | | |
| ≥5 | 8 (8.33) | 10 (7.81) | | 2 (4.88) | 1 (1.82) | | |
| Edmondson-Steiner grade | | | <0.001 | | | <0.001 | >0.99 |
| II | 85 (88.54) | 75 (58.59) | | 38 (92.68) | 30 (54.55) | | |
| III | 11 (11.46) | 53 (41.41) | | 3 (7.32) | 25 (45.45) | | |
| MVI | | | 0.21 | | | 0.12 | 0.69 |
| Negative | 65 (67.71) | 75 (58.59) | | 31 (75.61) | 32 (58.18) | | |
| Positive | 31 (32.29) | 53 (41.41) | | 10 (24.39) | 23 (41.82) | | |
| MRI features | | | | | | | |
| LTD, mm | 35.64±17.22 | 48.98±23.47 | <0.001 | 37.10±16.83 | 46.16±21.58 | 0.03 | 0.93 |
| Rim-APHE | | | <0.001 | | | 0.006 | 0.71 |
| Negative | 87 (90.62) | 87 (67.97) | | 37 (90.24) | 35 (63.64) | | |
| Positive | 9 (9.38) | 41 (32.03) | | 4 (9.76) | 20 (36.36) | | |
| Washout at portal venous phase | | | 0.82 | | | 0.36 | 0.66 |
| Nonperipheral washout | 74 (77.08) | 102 (79.69) | | 35 (85.37) | 41 (74.55) | | |
| Peripheral washout | 4 (4.17) | 6 (4.69) | | 1 (2.44) | 1 (1.82) | | |
| No washout | 18 (18.75) | 20 (15.62) | | 5 (12.20) | 13 (23.64) | | |
| Delayed central enhancement | | | 0.39 | | | 0.95 | 0.89 |
| Negative | 87 (90.62) | 121 (94.53) | | 37 (90.24) | 51 (92.73) | | |
| Positive | 9 (9.38) | 7 (5.47) | | 4 (9.76) | 4 (7.27) | | |
| Enhancing capsule | | | 0.16 | | | 0.88 | >0.99 |
| Negative | 13 (13.54) | 9 (7.03) | | 5 (12.20) | 5 (9.09) | | |
| Positive | 83 (86.46) | 119 (92.97) | | 36 (87.80) | 50 (90.91) | | |
Data are shown as number of patients (percentage) or mean ± standard deviation. AFP, alpha-fetoprotein; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; HCC, hepatocellular carcinoma; LTD, largest tumor diameter; rim-APHE, rim arterial phase hyperenhancement; MRI, magnetic resonance imaging; MVI, microvascular invasion; TC, training cohort; VC, validation cohort.
Table 2
| Characteristics | Univariable | | | Multivariable | | |
|---|---|---|---|---|---|---|
| | OR | 95% CI | P value | OR | 95% CI | P value |
| Age | 1.004 | 1.000–1.008 | 0.10 | | | |
| Gender | 1.181 | 0.986–1.415 | 0.13 | | | |
| AFP | 1.232 | 1.094–1.385 | 0.004 | 0.979 | 0.718–1.344 | 0.91 |
| CEA | 1.250 | 0.573–2.726 | 0.64 | | | |
| CA19-9 | 1.545 | 0.818–2.921 | 0.26 | | | |
| Edmondson-Steiner grade | 1.194 | 1.084–1.315 | 0.003 | 0.967 | 0.66–1.418 | 0.89 |
| MVI | 1.710 | 1.178–2.479 | 0.02 | 0.921 | 0.536–1.582 | 0.80 |
| LTD | 1.011 | 1.006–1.016 | <0.001 | 1.021 | 1.008–1.034 | 0.007 |
| Rim-APHE | 4.555 | 2.487–8.348 | <0.001 | 3.844 | 1.954–7.561 | 0.001 |
| Washout at portal venous phase | 1.147 | 0.996–1.322 | 0.11 | | | |
| Delayed central enhancement | 0.778 | 0.340–1.782 | 0.62 | | | |
| Enhancing capsule | 1.434 | 1.133–1.815 | 0.01 | 0.522 | 0.251–1.083 | 0.14 |
AFP, alpha-fetoprotein; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; CI, confidence interval; LTD, largest tumor diameter; rim-APHE, rim arterial phase hyperenhancement; MVI, microvascular invasion; OR, odds ratio.
MC DL model subset selection
Within the MC DL framework, the ResNet34 model achieved an AUC of 0.731 (95% CI: 0.654–0.808) in the TC and 0.652 (95% CI: 0.447–0.858) in the VC, outperforming the other networks. The diagnostic metrics for the predictive models across both study cohorts are detailed in Table 3. Figure S4 shows the DeLong test results and ROC curves.
Table 3
| Model name | Cohort | Accuracy | AUC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| MC3_ResNet18 | TC | 0.625 | 0.613 (0.523–0.702) | 0.667 | 0.567 |
| MC3_ResNet18 | VC | 0.594 | 0.543 (0.330–0.756) | 0.500 | 0.687 |
| MC3_ResNet34 | TC | 0.669 | 0.731 (0.654–0.808) | 0.656 | 0.687 |
| MC3_ResNet34 | VC | 0.688 | 0.652 (0.447–0.858) | 0.437 | 0.937 |
| MC3_ResNet50 | TC | 0.681 | 0.664 (0.576–0.751) | 0.763 | 0.567 |
| MC3_ResNet50 | VC | 0.688 | 0.637 (0.427–0.846) | 0.625 | 0.750 |
| MC3_ResNet101 | TC | 0.656 | 0.665 (0.579–0.750) | 0.763 | 0.507 |
| MC3_ResNet101 | VC | 0.656 | 0.637 (0.436–0.838) | 0.375 | 0.937 |
| MC3_ResNet152 | TC | 0.506 | 0.550 (0.461–0.640) | 0.226 | 0.896 |
| MC3_ResNet152 | VC | 0.594 | 0.602 (0.398–0.805) | 0.250 | 0.937 |
| MC3_DenseNet121 | TC | 0.650 | 0.670 (0.585–0.755) | 0.742 | 0.522 |
| MC3_DenseNet121 | VC | 0.688 | 0.641 (0.435–0.846) | 0.375 | 1.000 |
AUC, area under the curve; CI, confidence interval; DL, deep learning; MC, multi-channel; TC, training cohort; VC, validation cohort.
FF DL model subset selection
From each of the three contrast-enhanced MR sequences, we extracted eight compressed DL features, which were combined into a set of 24 DL features for DCE-MRI. After LASSO feature selection, the ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121 models retained 18, 14, 11, 12, 3, and 6 features, respectively (Figure S5). ResNet101 was the only model with an AUC exceeding 0.7 in the TC and 0.6 in the VC: 0.779 (95% CI: 0.719–0.839) and 0.663 (95% CI: 0.552–0.774), respectively, making it the best-performing network within the FF DL framework. The diagnostic metrics for the predictive models based on the six CNNs across both study cohorts are presented in Table 4. The DeLong test results and ROC curves are shown in Figure S6.
Table 4
| Model name | Cohort | Accuracy | AUC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| DL_ResNet18 | TC | 0.732 | 0.793 (0.735–0.851) | 0.719 | 0.750 |
| DL_ResNet18 | VC | 0.573 | 0.552 (0.434–0.669) | 0.745 | 0.341 |
| DL_ResNet34 | TC | 0.737 | 0.798 (0.740–0.855) | 0.789 | 0.667 |
| DL_ResNet34 | VC | 0.604 | 0.570 (0.451–0.689) | 0.800 | 0.341 |
| DL_ResNet50 | TC | 0.732 | 0.774 (0.713–0.835) | 0.664 | 0.823 |
| DL_ResNet50 | VC | 0.615 | 0.582 (0.466–0.698) | 0.855 | 0.293 |
| DL_ResNet101 | TC | 0.696 | 0.779 (0.719–0.839) | 0.625 | 0.792 |
| DL_ResNet101 | VC | 0.667 | 0.663 (0.552–0.774) | 0.691 | 0.634 |
| DL_ResNet152 | TC | 0.643 | 0.670 (0.599–0.741) | 0.719 | 0.542 |
| DL_ResNet152 | VC | 0.635 | 0.618 (0.502–0.733) | 0.727 | 0.512 |
| DL_DenseNet121 | TC | 0.661 | 0.707 (0.639–0.775) | 0.539 | 0.823 |
| DL_DenseNet121 | VC | 0.552 | 0.559 (0.443–0.675) | 0.364 | 0.805 |
AUC, area under the curve; CI, confidence interval; DL, deep learning; FF, feature fusion; TC, training cohort; VC, validation cohort.
Combined model
ResNet101 from the FF DL model outperformed ResNet34 from the MC DL model, so we retained ResNet101 as the final DL model. The clinical model achieved an AUC of 0.716 (95% CI: 0.648–0.783) in the TC and 0.688 (95% CI: 0.581–0.795) in the VC. The radiomics model achieved an AUC of 0.703 (95% CI: 0.635–0.772) in the TC and 0.519 (95% CI: 0.399–0.639) in the VC. Integrating the clinical, radiomics, and DL models yielded a significant improvement in AUC, to 0.838 (95% CI: 0.786–0.891) in the TC and 0.702 (95% CI: 0.598–0.807) in the VC. This integration also improved accuracy and sensitivity in the TC, as presented in Table 5. Figure S7 shows the DeLong test results and ROC curves. To visualize the combined model, we constructed a nomogram (Figure 3A). The nomogram calibration curves showed good agreement between predicted and actual P53 status in both cohorts (Figure 3B,3C). Decision curve analysis (DCA) also showed that the combined model yielded superior net benefit over the individual models in both cohorts (Figure 3D,3E).
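As background for the DCA result, net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the true and false positives obtained when every patient with predicted probability at or above p_t is classified as positive. The sketch below implements this standard definition with toy numbers; it is not the authors' analysis code, and the function names are chosen for the example.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of a model at one threshold probability (0 < threshold < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)

# Toy example, not study data.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(toy_labels, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).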
Table 5
| Model name | Cohort | Accuracy | AUC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Clinical model | TC | 0.688 | 0.716 (0.648–0.783) | 0.727 | 0.635 |
| Clinical model | VC | 0.677 | 0.688 (0.581–0.795) | 0.564 | 0.829 |
| Radiomics model | TC | 0.661 | 0.703 (0.635–0.772) | 0.703 | 0.604 |
| Radiomics model | VC | 0.594 | 0.519 (0.399–0.639) | 0.745 | 0.390 |
| DL model | TC | 0.696 | 0.779 (0.719–0.839) | 0.625 | 0.792 |
| DL model | VC | 0.667 | 0.663 (0.552–0.774) | 0.691 | 0.634 |
| Combined model | TC | 0.790 | 0.838 (0.786–0.891) | 0.836 | 0.729 |
| Combined model | VC | 0.667 | 0.702 (0.598–0.807) | 0.655 | 0.683 |
AUC, area under the curve; CI, confidence interval; DL, deep learning; TC, training cohort; VC, validation cohort.
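Accuracy, sensitivity, and specificity in Table 5 are linked through the class sizes of each cohort: accuracy is the prevalence-weighted average of sensitivity and specificity. The short sketch below shows the relationship, using the combined-model values from Table 5 and the cohort counts from the Results; the function name is chosen for the example.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by sensitivity, specificity, and class sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)

# Combined model, TC: 128 P53+ and 96 P53− patients.
acc_tc = accuracy_from_rates(0.836, 0.729, n_pos=128, n_neg=96)

# Combined model, VC: 55 P53+ and 41 P53− patients.
acc_vc = accuracy_from_rates(0.655, 0.683, n_pos=55, n_neg=41)
```

Both values round to the accuracies listed in Table 5 (0.790 and 0.667), so the reported metrics are internally consistent for the combined model.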
Discussion
In this retrospective analysis, we derived key DL and radiomic features correlated with P53-mutated HCC from DCE-MRI. Among MC DL models, ResNet34 achieved the highest diagnostic accuracy. ResNet101 outperformed other FF DL models and achieved greater diagnostic efficacy than ResNet34 from the MC DL model. Multivariable analysis identified LTD and rim-APHE as independent risk factors for the clinical model. By integrating the clinical, radiomic, and DL models, we developed and validated a combined model that achieved superior AUCs in both cohorts: 0.838 in the TC and 0.702 in the VC. A nomogram visualized this combined model and DCA confirmed its clinical utility.
DL has been extensively applied to HCC, including U-Net-based liver segmentation (13), grading of tumor differentiation (14), prediction of MVI (15) and vessels encapsulating tumor clusters (VETC) (16), and evaluation of transarterial chemoembolization (TACE) outcomes (17). AI models can offer early predictions, potentially before the first response assessment, of whether patients will benefit from a given treatment or whether alternative therapies should be considered. Among the most valuable biomarkers for HCC prognosis are P53, CK19, and Ki-67, whose confirmation typically requires invasive, localized sampling through preoperative biopsy or postoperative pathology. One study developed a DL model for CK19 prediction that achieved an AUC of 0.82 in a cohort of 141 patients (18).
The radiomics model performed poorly in the validation cohort (AUC 0.519, 95% CI: 0.399–0.639), approaching random prediction. This finding suggests limited generalizability of the radiomics features, likely attributable to several methodological factors. Our study used seven MRI scanners (1.5 and 3.0 T) from different manufacturers with varying imaging parameters, which introduced scanner-related variability. N4 bias-field correction, resampling to 1×1×1 mm³, and grayscale normalization may have been insufficient to fully harmonize features across heterogeneous scanner protocols. These factors may explain the marked performance decline observed in the validation cohort. The single-center design further limits generalizability; multicenter prospective validation is essential. A recent systematic review of MRI radiomics standardization strategies [2019–2022] reported that radiomics generalizability remains unresolved when multiple vendors and scanner models are involved, although preprocessing strategies can enhance analytical robustness (19).
Jia et al. recently developed a multi-sequence MRI DL model for predicting P53-mutated HCC, achieving high predictive performance (test AUC =0.914–0.919) (20). Their study employed a single CNN architecture (EfficientNetV2) with random forest fusion. We systematically compared six CNN architectures (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121) across two distinct fusion strategies (multi-channel fusion and feature fusion) to identify the optimal model configuration. This comprehensive comparison provides empirical guidance for CNN selection in P53-mutated HCC prediction. Our approach mirrored a strategy previously used to predict occult cervical lymph-node metastasis and prognosis in early-stage oral and oropharyngeal squamous cell carcinoma, where six distinct CNNs were used to build 17 predictive models (21). The diversity of CNN architectures and depths enables extraction of a broad range of features from the data. The variable performance of different CNN models across tasks and datasets allows for the selection of the most suitable model or combination, enhancing the model’s flexibility and adaptability. In our study, ResNet34 in the MC DL model achieved the highest AUC of 0.731 in the TC and 0.652 in the VC, while ResNet101 in the FF DL model achieved the best performance with an AUC of 0.779 in the TC and 0.663 in the VC.
A study predicting MVI and clinical outcomes in HCC patients found that fusing three MR sequences (T1, T1D, and T1V) achieved 92.11% accuracy, surpassing any single sequence (22). In our investigation, rim-APHE emerged as a risk factor for P53-mutated HCC in the clinical model, prompting us to concentrate on enhancement sequences. We selected AP, PVP, and DP for analysis. Another study using deep-learning models and DCE-MRI sequences to predict preoperative VETC status and prognosis in HCC patients reported higher diagnostic performance with three fused sequences (AUC 0.897) (23). Integrating multiple sequences allows more comprehensive feature extraction and improves model performance and generalizability.
Multi-channel DL concurrently processes multiple input channels within a single model to produce a normalized tumor mask and is well suited to multimodal data. In a study assessing the diagnostic efficacy of CNNs on intravoxel incoherent motion (IVIM) diffusion-weighted MRI for predicting MVI in HCC, nine b-value images were concatenated across the channel dimension, and the CNN extracted deep features directly from the b-value volume and achieved superior MVI prediction, with AUCs of 0.810 (range, 0.760–0.829) (24). Wang et al. also fused features from six MRI sequences to predict MVI in HCC; however, they used a multilayer perceptron (MLP) for feature fusion (25). In another study employing multitask DL based on MRI images to predict MVI and recurrence-free survival in HCC, features from AP, PVP, DWI, and T2WI sequences were fused to form the MRI feature set and achieved high MVI prediction accuracy (26). In this study, the multi-channel fusion yielded AUCs of 0.550–0.731 in TC and 0.543–0.652 in VC. Feature fusion achieved AUCs of 0.670–0.798 in TC and 0.552–0.663 in VC, outperforming multi-channel fusion. Feature fusion integrates information across MRI phases, helping models focus on relevant high-level features, reducing noise sensitivity and overfitting risk. Its simplicity and flexibility also facilitate adaptation to diverse tasks and data types.
Radiomics, DL, and fusion models are increasingly prevalent in medical research. A recent systematic review revealed that fusion models performed well in 63% of studies reviewed, poorly in 25%, and fairly well in 13% (27). In a comparative study examining 3D and 2D DL, radiomics, and fusion models based on CT imaging for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma, late fusion methods were superior to early fusion, suggesting that the performance of fusion models may depend on the fusion strategy employed (28). We used LR to integrate the clinical, radiomics, and DL models into a combined model. This approach offers operational simplicity, fast processing, low memory use, and easy clinical implementation, but it also introduces several methodological risks. First, clinical features such as tumor diameter may correlate with DL imaging features, potentially overweighting certain signals. Second, the clinical model uses 2–3 low-dimensional features against hundreds to thousands from the radiomics and DL models, creating an imbalance that may bias fusion toward imaging predictions. Third, the radiomics model performed poorly in validation (AUC 0.519) and likely introduced noise rather than complementary information. Nevertheless, the combined model significantly improved diagnostic performance over the individual models, achieving an AUC of 0.838 (95% CI: 0.786–0.891) in the TC and 0.702 (95% CI: 0.598–0.807) in the VC, suggesting that the clinical and DL components provided a robust signal that offset the limitations of the radiomics model.
This study has several limitations. First, we used internal validation only (a random split of a single-center dataset) rather than external validation with independent data from different centers or time periods. The modest AUC of 0.702 (95% CI: 0.598–0.807) in the validation cohort, compared with 0.838 in the training cohort, suggests moderate performance and potential overfitting, indicating that the model is not ready for routine clinical use. Future studies should perform temporal and external validation with independent centers and different MRI scanners and protocols to confirm robustness and generalizability. Second, we focused on conventional CNN architectures (ResNet and DenseNet) and did not evaluate newer architectures such as Vision Transformers, Swin Transformers, or attention-based models. ResNet101 performed best in our feature fusion framework, but newer architectures might better capture long-range dependencies or multi-scale features. Model complexity also requires consideration: deeper architectures such as ResNet101 and ResNet152 may overfit on limited medical imaging datasets. ResNet101 outperformed ResNet152 in validation, suggesting that moderate complexity suits this dataset size better than very deep architectures. Future studies should evaluate newer architectures and investigate optimal complexity for HCC datasets of varying sizes. Third, we assumed that p53 immunohistochemistry results (based on the percentage of nuclear staining) indicate a TP53 mutation; this should be confirmed by genetic testing. Fourth, we analyzed only P53 mutation status; long-term follow-up is needed, and using DL to predict the survival and prognosis of patients with HCC should be a focus of future CNN-based research. Fifth, we focused on contrast-enhanced MRI sequences, and other sequences require further study. Lastly, the use of seven 1.5 and 3.0 T MRI scanners with varying parameters from different manufacturers may have introduced variability into our results, although we normalized the data to minimize variability across scanners.
Conclusions
The CNN-based DL model built on DCE-MRI can preoperatively predict P53-mutated HCC with moderate diagnostic accuracy. Integrating this DL model with the clinical and radiomics models improves predictive performance beyond that of any individual model. This integrated approach demonstrates feasibility and potential for preoperative prediction of P53-mutated HCC. However, multicenter external validation using diverse scanners, imaging protocols, and patient populations is necessary before clinical implementation. If validated externally, this model could eventually assist clinicians in treatment decision-making.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/rc
Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/dss
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Zhongshan Hospital, Fudan University (No. B2021-113R), and the requirement for informed consent was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
- Konyn P, Ahmed A, Kim D. Current epidemiology in hepatocellular carcinoma. Expert Rev Gastroenterol Hepatol 2021;15:1295-307.
- Liu Y, Su Z, Tavana O, et al. Understanding the complexity of p53 in a new era of tumor suppression. Cancer Cell 2024;42:946-67.
- Nault JC, Martin Y, Caruso S, et al. Clinical Impact of Genomic Diversity From Early to Advanced Hepatocellular Carcinoma. Hepatology 2020;71:164-82.
- Rebouissou S, Nault JC. Advances in molecular classification and precision oncology in hepatocellular carcinoma. J Hepatol 2020;72:215-29.
- Kitao A, Matsui O, Zhang Y, et al. Dynamic CT and Gadoxetic Acid-enhanced MRI Characteristics of P53-mutated Hepatocellular Carcinoma. Radiology 2023;306:e220531.
- Calderaro J, Seraphin TP, Luedde T, et al. Artificial intelligence for the prevention and clinical management of hepatocellular carcinoma. J Hepatol 2022;76:1348-61.
- Le Berre C, Sandborn WJ, Aridhi S, et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology 2020;158:76-94.e2.
- Xia T, Zhao B, Li B, et al. MRI-Based Radiomics and Deep Learning in Biological Characteristics and Prognosis of Hepatocellular Carcinoma: Opportunities and Challenges. J Magn Reson Imaging 2024;59:767-83.
- Qian H, Huang Y, Xu L, et al. Role of peritumoral tissue analysis in predicting characteristics of hepatocellular carcinoma using ultrasound-based radiomics. Sci Rep 2024;14:11538.
- Tseng PL, Tai MH, Huang CC, et al. Overexpression of VEGF is associated with positive p53 immunostaining in hepatocellular carcinoma (HCC) and adverse outcome of HCC patients. J Surg Oncol 2008;98:349-57.
- Marrero JA, Kulik LM, Sirlin CB, et al. Diagnosis, Staging, and Management of Hepatocellular Carcinoma: 2018 Practice Guidance by the American Association for the Study of Liver Diseases. Hepatology 2018;68:723-50.
- Raman AG, Jones C, Weiss CR. Machine Learning for Hepatocellular Carcinoma Segmentation at MRI: Radiology In Training. Radiology 2022;304:509-15.
- Zhou Q, Zhou Z, Chen C, et al. Grading of hepatocellular carcinoma using 3D SE-DenseNet in dynamic enhanced MR images. Comput Biol Med 2019;107:47-57.
- Wang T, Li Z, Yu H, et al. Prediction of microvascular invasion in hepatocellular carcinoma based on preoperative Gd-EOB-DTPA-enhanced MRI: Comparison of predictive performance among 2D, 2D-expansion and 3D deep learning models. Front Oncol 2023;13:987781.
- Dong X, Yang J, Zhang B, et al. Deep Learning Radiomics Model of Dynamic Contrast-Enhanced MRI for Evaluating Vessels Encapsulating Tumor Clusters and Prognosis in Hepatocellular Carcinoma. J Magn Reson Imaging 2024;59:108-19.
- Chen M, Kong C, Qiao E, et al. Multi-algorithms analysis for pre-treatment prediction of response to transarterial chemoembolization in hepatocellular carcinoma on multiphase MRI. Insights Imaging 2023;14:38.
- Chen Y, Chen J, Zhang Y, et al. Preoperative Prediction of Cytokeratin 19 Expression for Hepatocellular Carcinoma with Deep Learning Radiomics Based on Gadoxetic Acid-Enhanced Magnetic Resonance Imaging. J Hepatocell Carcinoma 2021;8:795-808.
- Trojani V, Bassi MC, Verzellesi L, et al. Impact of Preprocessing Parameters in Medical Imaging-Based Radiomic Studies: A Systematic Review. Cancers (Basel) 2024;16:2668.
- Jia L, Yang Q, Jiang H, et al. Deep learning-based MRI model for predicting P53-mutated hepatocellular carcinoma. BMC Med Imaging 2025;25:506.
- Lan T, Kuang S, Liang P, et al. MRI-based deep learning and radiomics for prediction of occult cervical lymph node metastasis and prognosis in early-stage oral and oropharyngeal squamous cell carcinoma: a diagnostic study. Int J Surg 2024;110:4648-59.
- Sun BY, Gu PY, Guan RY, et al. Deep-learning-based analysis of preoperative MRI predicts microvascular invasion and outcome in hepatocellular carcinoma. World J Surg Oncol 2022;20:189.
- Yang J, Dong X, Wang F, et al. A deep learning model based on MRI for prediction of vessels encapsulating tumour clusters and prognosis in hepatocellular carcinoma. Abdom Radiol (NY) 2024;49:1074-83.
- Liu B, Zeng Q, Huang J, et al. IVIM using convolutional neural networks predicts microvascular invasion in HCC. Eur Radiol 2022;32:7185-95.
- Wang F, Chen Q, Chen Y, et al. A novel multimodal deep learning model for preoperative prediction of microvascular invasion and outcome in hepatocellular carcinoma. Eur J Surg Oncol 2023;49:156-64.
- Wang F, Zhan G, Chen QQ, et al. Multitask deep learning for prediction of microvascular invasion and recurrence-free survival in hepatocellular carcinoma based on MRI images. Liver Int 2024;44:1351-62.
- Demircioğlu A. Are deep models in radiomics performing better than generic models? A systematic review. Eur Radiol Exp 2023;7:11.
- Wang W, Liang H, Zhang Z, et al. Comparing three-dimensional and two-dimensional deep-learning, radiomics, and fusion models for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma based on CT imaging: a multicentre, retrospective, diagnostic study. EClinicalMedicine 2024;67:102385.

