A novel deep learning and radiomics approach based on DCE-MRI for predicting the P53 mutation status in hepatocellular carcinoma
Original Article


Jingfei Weng1#, Yuqing He1#, Qihu Zeng2#, Dinghua Yao1, Hanzhong Zhou3, Bo Li4, Chun Yang5, Bin Yang1

1Department of Radiology, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China; 2Department of Cardiovascular Medicine, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China; 3Department of Surgery, Gongxian Huajian Hospital, Yibin, China; 4Department of Hepatobiliary Medicine, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China; 5Department of Radiology, Zhongshan Hospital, Fudan University, Shanghai, China

Contributions: (I) Conception and design: C Yang, B Yang; (II) Administrative support: C Yang, J Weng; (III) Provision of study materials or patients: Q Zeng, D Yao; (IV) Collection and assembly of data: Y He, H Zhou; (V) Data analysis and interpretation: J Weng, B Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Bin Yang, MM. Department of Radiology, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, No. 182, Chunhui Road, Longmatan District, Luzhou 646000, China. Email: xkdfzyyangb@swmu.edu.cn; Chun Yang, PhD. Department of Radiology, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Xuhui District, Shanghai 200032, China. Email: dryangchun@hotmail.com; Bo Li, MD. Department of Hepatobiliary Medicine, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, No. 182, Chunhui Road, Longmatan District, Luzhou 646000, China. Email: libo2004051192@163.com.

Background: P53-mutated hepatocellular carcinoma (HCC) is an aggressive subtype with poor prognosis, currently diagnosed only by invasive biopsy. Noninvasive preoperative prediction could guide personalized treatment. This study aimed to develop and evaluate a preoperative method for predicting P53-mutated HCC using a deep learning (DL) and radiomics model based on magnetic resonance imaging (MRI).

Methods: In this retrospective study, we included 320 consecutive patients who underwent surgical resection for HCC between January 2020 and February 2021, had postoperative P53 immunohistochemistry results, and showed no evidence of extrahepatic metastases preoperatively. Patients were randomly assigned to a training cohort (n=224) and a validation cohort (n=96). Clinical and imaging risk factors were identified through stepwise regression analysis, and a clinical prediction model was built. We developed a radiomics model and multiple convolutional neural network (CNN) models using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) of HCC. Two fusion strategies were employed to build DL fusion models for the arterial phase (AP), portal venous phase (PVP), and delayed phase (DP): the multi-channel (MC) DL model and the feature fusion (FF) DL model. Ultimately, we integrated the optimal DL model with the radiomics and clinical models into a combined model, developed a nomogram to predict P53-mutated HCC, and evaluated its performance using receiver operating characteristic (ROC) analysis, calibration, and decision curve analysis.

Results: Of 320 patients, 259 were men and 61 were women (mean age, 58±11 years). There were 183 cases of P53-mutated HCC and 137 cases of non-P53-mutated HCC. In the MC DL model, ResNet34 achieved area under the curve (AUC) values of 0.731 [95% confidence interval (CI): 0.654–0.808] in the training cohort (TC) and 0.652 (95% CI: 0.447–0.858) in the validation cohort (VC), outperforming other CNNs. In the FF DL model, ResNet101 demonstrated an AUC of 0.779 (95% CI: 0.719–0.839) for TC and 0.663 (95% CI: 0.552–0.774) for VC, showing superior performance compared to other CNNs. Integration of the clinical, radiomics, and DL models improved the AUC to 0.838 (95% CI: 0.786–0.891) in the TC and 0.702 (95% CI: 0.598–0.807) in the VC.

Conclusions: The combination of clinical, radiomics, and DL models provides an effective tool for preoperative prediction of P53 mutation status in patients with HCC.

Keywords: Hepatocellular carcinoma (HCC); P53 mutation; magnetic resonance imaging (MRI); radiomics; deep learning (DL)


Submitted Dec 16, 2025. Accepted for publication Mar 02, 2026. Published online Mar 27, 2026.

doi: 10.21037/jgo-2025-1-1026




Introduction

Liver cancer is the sixth most commonly diagnosed cancer worldwide and the third leading cause of cancer-related death (1), with hepatocellular carcinoma (HCC) accounting for most cases. Although advances in diagnosis and therapy have improved outcomes for patients with HCC, the 5-year survival rate after diagnosis remains only about 10% (2). Prognosis depends mainly on tumor stage and histopathologic features, so early detection and accurate diagnosis are essential to improving survival.

The TP53 gene encodes the p53 tumor suppressor protein, which drives DNA repair, inhibits proliferation, and induces apoptosis. In addition, p53 regulates metabolism, ferroptosis, and immune responses, which together suppress tumorigenesis (3). TP53 mutations are key drivers of tumor development, especially in advanced HCC (4). In oncology, TP53 has emerged as a promising target for both molecularly targeted therapy and immunotherapy (5). HCC with P53 mutations represents a biologically aggressive subtype with poor survival outcomes (6). Currently, diagnosis of P53-mutated HCC primarily relies on invasive liver biopsy, which carries potential risks and complications. Therefore, identifying P53 mutations in HCC preoperatively through noninvasive methods is vital for guiding personalized treatment strategies.

Artificial intelligence (AI) is a broad field that encompasses computational search algorithms, machine learning (ML), and deep learning (DL) models (7). DL excels at high-performance classification using large datasets. The deep convolutional neural network (DCNN) architecture, commonly used for image recognition, automatically extracts and learns deep features from inputs through a series of consecutive filters, eliminating manual feature engineering (8). This approach offers an opportunity to improve HCC clinical care by decoding tumor characteristics. DL models can capture aspects of HCC biology, helping clinicians make more accurate diagnoses and optimize treatment plans (9). Current research on P53-mutated HCC focuses on image analysis. Investigators recently developed a peritumoral ultrasound radiomics model to predict P53 mutation status in HCC, which demonstrated good diagnostic performance (10). However, the application of DL in diagnosing P53-mutated HCC remains limited.

This study aims to develop and validate a DL-based magnetic resonance imaging (MRI) model for preoperative prediction of P53 mutation status in HCC patients. Furthermore, by fusing clinical, radiomics, and DL models, we seek to improve disease characterization and prediction, offering enhanced guidance for clinical decision-making. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/rc).


Methods

Patients

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Zhongshan Hospital, Fudan University (No. B2021-113R), and the requirement for informed consent was waived.

We retrospectively analyzed 1,010 consecutive patients with HCC who underwent surgery at Zhongshan Hospital, Fudan University, had available p53 immunohistochemistry results, and showed no evidence of preoperative extrahepatic metastasis between January 2020 and February 2021. The inclusion criteria (Figure 1) required: (I) untreated primary HCC; (II) a single tumor; (III) preoperative MRI performed within 30 days before surgery with good image quality and completeness; (IV) complete clinical data; and (V) a lesion measuring 10–100 mm in maximum diameter. These criteria yielded 320 patients who were selected and randomly assigned to the training cohort (TC) and the validation cohort (VC) in a 7:3 ratio. Figure 2 illustrates the study workflow.
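The 7:3 random assignment can be sketched as follows. This is a minimal illustration rather than the authors' code: stratification by P53 status, the seed, and the function name `stratified_split` are assumptions, since the text states only that assignment was random. The class counts are those reported in the Results.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Hypothetical helper: the paper does not say whether the split was stratified.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_frac)  # per-class training size
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return sorted(train_idx), sorted(test_idx)

# Class counts from the Results: 183 P53-mutated (1) and 137 non-mutated (0).
labels = [1] * 183 + [0] * 137
train_idx, test_idx = stratified_split(labels)
n_train_pos = sum(labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), n_train_pos)  # prints: 224 96 128
```

Under these assumptions the split yields 224 training and 96 validation cases, with 128 P53-mutated cases in the training cohort, which agrees with the cohort composition reported in the Results.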

Figure 1 Study flowchart of the enrolled patients. HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging.
Figure 2 Workflow diagram for predictive model development. Experienced radiologists carried out tumor segmentation and the delineation of ROIs. Radiomics models were constructed utilizing PyRadiomics. In the MC DL model, ROIs were extracted from the three sequences of DCE-MRI, and these images were concatenated into a three-channel input for the DL model. We assessed the performance of six CNNs: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121. For the FF DL model, MRI data from three sequences were fed into these six CNNs to extract features, which were then integrated and input into a LR classifier for multimodal analysis. The features from the three fundamental models were amalgamated to form the combined model. AP, arterial phase; CNNs, convolutional neural networks; DCE-MRI, dynamic contrast-enhanced magnetic resonance imaging; DL, deep learning; DP, delayed phase; FF, feature fusion; LR, logistic regression; MC, multi-channel; PVP, portal venous phase; ROI, region of interest.

Clinicopathological characteristics

We collected age, gender, alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), and carbohydrate antigen 19-9 (CA19-9). Pathological assessment, including immunohistochemistry, was performed to determine the Edmondson-Steiner grade and to evaluate microvascular invasion (MVI). Patients were classified into two groups: P53-mutated HCC and non-P53-mutated HCC. We classified tumors as P53-mutated when ≥10% of tumor cells showed positive nuclear staining (11).

MRI protocol

All patients underwent Gd-DTPA-enhanced liver MRI examinations performed on seven different scanners; the specific parameters are detailed in Table S1. Gd-DTPA was injected intravenously at 2 mL/s for a total dose of 0.1 mmol/kg body weight. For example, on a 1.5 T Magnetom Aera scanner (Siemens Healthineers, Erlangen, Germany), the protocol included axial T2-weighted fat-suppressed imaging (T2WI-FS), diffusion-weighted imaging (DWI) with b=0 and 500 s/mm², in- and opposed-phase T1-weighted imaging (IP-OP T1WI), and axial pre-contrast three-dimensional volumetric interpolated breath-hold T1-weighted imaging with fat suppression (3D-VIBE). Postcontrast dynamic-enhanced 3D-VIBE T1-weighted imaging was performed during the arterial phase (AP, 20–30 s), portal venous phase (PVP, 60–70 s), and delayed phase (DP, 180 s). The detailed parameters for each sequence are presented in Table S2.

Imaging analyses

Images were retrospectively analyzed by two radiologists (** and **, with 6 and 15 years of experience in abdominal imaging, respectively). Any discrepancies were resolved by consensus. Assessment was performed according to the Liver Imaging Reporting and Data System (LI-RADS) version 2018 (12). The following findings were evaluated on Gd-DTPA-enhanced MRI: enhancement pattern (non-rim arterial phase hyperenhancement vs. rim enhancement), washout pattern (none, non-peripheral, or peripheral), largest tumor diameter (LTD), presence of an enhancing capsule, and delayed central enhancement.

Imaging preprocessing, region of interest (ROI) segmentation

We preprocessed images before ROI delineation to ensure image quality and consistency. First, N4 bias-field correction was applied to reduce magnetic-field intensity inhomogeneity. Next, images were resampled to 1×1×1 mm³ voxel spacing to compensate for differences in voxel size. Finally, grayscale normalization was performed to maintain grayscale consistency. An abdominal radiologist with 6 years of experience outlined the entire tumor area in ITK-SNAP, manually tracing the tumor margins on every transverse slice to generate the ROI; this delineation was then verified by a senior abdominal radiologist with 15 years of experience. ROIs were drawn on three phases: AP, PVP, and DP. To assess segmentation reproducibility, MR images from 30 randomly selected HCC cases were re-examined after a one-week interval, and the ROIs were manually redrawn by the radiologists.

Radiomics feature extraction

In our study, we employed PyRadiomics to extract radiomics features. Inter- and intra-reader variability for the subregion features was 0.88 [95% confidence interval (CI): 0.53–0.99] and 0.92 (95% CI: 0.23–0.99), respectively. A total of 3,591 radiomics features were derived from the AP, PVP, and DP sequences, including first-order features, shape-based features, and texture features.

Feature selection and model construction

Radiomics features were normalized using z-score normalization to ensure consistency. To reduce redundancy, we excluded features with Spearman’s rank correlation coefficients exceeding 0.9. The least absolute shrinkage and selection operator (LASSO) identified the most predictive clinical variables and DCE-MRI-derived radiological features. Logistic regression (LR) was then used to build predictive models based on clinical data and DCE-MRI radiomics. Further details regarding radiomics feature extraction are provided in Table S3 and Figure S1.
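The first two steps above (z-score normalization and removal of features with a Spearman correlation above 0.9) can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline: the greedy keep-first rule, the helper names, and the toy feature values are assumptions, and the LASSO step is not shown.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a feature column to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_correlated(features, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if |rho| <= threshold
    against every feature already kept."""
    kept = []
    for name, col in features.items():
        if all(abs(spearman(col, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy feature table (made-up values, five hypothetical patients).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.1, 6.3, 8.0, 9.9],  # monotone in f1, so rho = 1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
normalized = {name: zscore(col) for name, col in features.items()}
kept = drop_correlated(normalized)
print(kept)
```

In this toy table `f2` is dropped because it is perfectly rank-correlated with `f1`, while `f3` is retained. Which member of a correlated pair is kept depends on feature order, so a real pipeline would need an explicit tie-breaking rule.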

DL procedure

For each MR sequence, we selected the 2D slice with the largest ROI as the representative image. To streamline analysis and reduce background noise, we extracted the smallest bounding box that fully enclosed the ROI and expanded it by 10 pixels to include the peritumoral region, consistent with recent studies emphasizing the importance of the surrounding tissue. Compared with 2D DL models, 3D DL models are larger and require more parameter tuning, longer runtimes, more training data, and more storage (15). We therefore used compact, computationally efficient 2D DL models to reduce hardware demands and enhance model applicability. However, this approach disregards volumetric information from adjacent slices, which may contain features indicative of tumor heterogeneity. The 10-pixel expansion of the bounding box partially compensates for this limitation by capturing peritumoral regions that contain relevant contextual features.
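The slice selection and bounding-box expansion described above can be illustrated with a small sketch. It assumes binary ROI masks stored as nested lists and clamps the expanded box to the image border; the clamping rule and the helper names (`largest_roi_slice`, `expanded_bbox`, `make_mask`) are assumptions, as the text does not describe how boxes near the image edge were handled.

```python
def largest_roi_slice(mask_volume):
    """Index of the slice whose binary mask has the largest ROI area."""
    areas = [sum(sum(row) for row in sl) for sl in mask_volume]
    return max(range(len(areas)), key=lambda i: areas[i])

def expanded_bbox(mask, margin=10):
    """Smallest box enclosing the ROI, grown by `margin` pixels per side.

    Returns inclusive (top, bottom, left, right), clamped to the image.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no ROI pixels")
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(mask) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, len(mask[0]) - 1)
    return top, bottom, left, right

def make_mask(size, r0, r1, c0, c1):
    """Toy square mask with a rectangular ROI (inclusive bounds)."""
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(size)]
            for r in range(size)]

# Two toy 40 x 40 slices: a small ROI and a larger one.
volume = [make_mask(40, 18, 19, 10, 12), make_mask(40, 15, 20, 5, 30)]
k = largest_roi_slice(volume)
bbox = expanded_bbox(volume[k], margin=10)
print(k, bbox)  # prints: 1 (5, 30, 0, 39)
```

Here the ROI in the second slice spans columns 5 to 30, so the 10-pixel margin is truncated at both the left and right image borders.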

We standardized image-intensity distributions across the RGB channels using z-score normalization. The normalized images served as direct input to our model. During training, we performed real-time data augmentation, including random cropping and horizontal and vertical flips. At test time, only normalization was applied.

We evaluated six CNNs (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121) and compared them head-to-head to identify the architecture best suited to our task.

We extracted features from the penultimate layer of our model. To reduce the high-dimensional feature space, we applied principal component analysis (PCA) to obtain a 512-dimensional representation.

In addition, to assess the ability of deep-learning models to distinguish diverse samples, we used gradient-weighted class activation mapping (Grad-CAM) to visualize model decisions. Grad-CAM produces interpretable heatmaps that highlight regions informing the CNN’s decision during inference. Figure S2 shows representative Grad-CAM heatmaps. Figure S3 shows confusion matrices for the optimal models (ResNet101-FF, clinical, radiomics, and combined) in both training and validation cohorts.

MC DL model

We cropped ROIs from the three DCE-MRI sequences, stacked them into a three-channel input, and normalized the tumor regions, similar to RGB channels in natural images. This three-channel tensor was directly fed into a single CNN, enabling the network to learn joint representations across sequences through shared convolutional filters. The MC approach enables early fusion at the pixel level, capturing fine-grained spatial correlations between phases. To standardize size, we resized each ROI to 256 × 256 before feeding it into the CNN.

FF DL model

The model architecture comprised two main components: residual CNN blocks and an LR module. We fed each patient’s three MRI sequences into the network and extracted sequence-specific features with separate CNN blocks. We concatenated the extracted features to create a comprehensive MRI feature set and then fed it into the LR classifier for multimodal analysis. The FF approach enables fusion at the feature level, allowing each sequence to be processed by optimal architectures and preserving sequence-specific information before integration.
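The difference between the two fusion strategies comes down to where the three phases are joined. The sketch below shows only the shape of the inputs, with no network involved; the function names and toy values are hypothetical, and the eight features per phase follow the number reported in the Results.

```python
def multi_channel_input(ap, pvp, dp):
    """Early fusion: stack three same-sized 2D crops into one (3, H, W) input."""
    same_height = len(ap) == len(pvp) == len(dp)
    same_width = len(ap[0]) == len(pvp[0]) == len(dp[0])
    if not (same_height and same_width):
        raise ValueError("phase crops must share one size")
    return [ap, pvp, dp]

def feature_fusion_input(ap_feat, pvp_feat, dp_feat):
    """Late fusion: concatenate per-phase feature vectors for a classifier."""
    return list(ap_feat) + list(pvp_feat) + list(dp_feat)

H = W = 4  # toy crop size; the paper resizes ROIs to 256 x 256
ap = [[0.1] * W for _ in range(H)]
pvp = [[0.2] * W for _ in range(H)]
dp = [[0.3] * W for _ in range(H)]

mc = multi_channel_input(ap, pvp, dp)
mc_shape = (len(mc), len(mc[0]), len(mc[0][0]))

# Placeholder 8-dimensional feature vectors per phase (values are made up).
ff = feature_fusion_input([0.0] * 8, [1.0] * 8, [2.0] * 8)
print(mc_shape, len(ff))  # prints: (3, 4, 4) 24
```

In the MC design a single CNN sees all three phases at once, whereas in the FF design each phase passes through its own CNN and only the resulting feature vectors are joined.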

Combined model

After comparing the MC-DL and FF-DL models, we chose the one with better performance as our final DL model. We then built a combined model that integrated the clinical model, the radiomics model, and the outputs of the DL model to leverage the strengths of each component, improve predictive accuracy, and create a more comprehensive diagnostic framework.

Statistical analysis

We used the Student’s t-test, Mann-Whitney U test, Wilcoxon test, chi-square test, and Fisher’s exact test to compare baseline characteristics between the TC and VC. Univariable and multivariable analyses were performed to identify independent clinical and imaging predictors of P53 mutation. The predictive performance for P53 mutation was assessed using receiver operating characteristic (ROC) curve analysis, reporting area under the curve (AUC) with 95% CI. The DeLong test compared the diagnostic accuracy of ROC curves. Statistical significance was defined as a two-tailed P value less than 0.05.
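For readers less familiar with the AUC, it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are made-up toy values, and the function name `roc_auc` is hypothetical. Confidence intervals and the DeLong test are not shown.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 positive and 3 negative cases with made-up scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # prints: 0.889
```

Eight of the nine positive-negative pairs are ordered correctly in this toy example, giving an AUC of 8/9.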


Results

Clinical characteristics

This study included 320 participants: 259 men and 61 women. In the TC (n=224), 96 were P53− and 128 were P53+; in the VC (n=96), 41 had P53− and 55 had P53+ HCC. Table 1 shows both cohorts matched well for demographics and P53 status. Multivariable analysis identified LTD (P=0.007; OR =1.021, 95% CI: 1.008–1.034) and the presence of rim arterial phase hyperenhancement (rim-APHE; P=0.001; OR =3.844, 95% CI: 1.954–7.561) as independent predictors for P53-mutated HCC, as detailed in Table 2.

Table 1

Comparison of P53 status and the clinical and imaging characteristics of patients from the training and validation cohorts

Characteristics; TC (n=224); VC (n=96); P value (TC vs. VC)
Columns within TC: P53− HCC (n=96), P53+ HCC (n=128), P value; columns within VC: P53− HCC (n=41), P53+ HCC (n=55), P value
Clinical features
   Age, years 59.50±12.27 56.13±11.70 0.04 58.90±10.58 60.51±9.68 0.36 0.09
   Gender 0.13 0.88 0.19
    Male 75 (78.12) 111 (86.72) 32 (78.05) 41 (74.55)
    Female 21 (21.88) 17 (13.28) 9 (21.95) 14 (25.45)
   CA19-9, U/mL 0.84 0.03 0.73
    <37 85 (88.54) 111 (86.72) 33 (80.49) 53 (96.36)
    ≥37 11 (11.46) 17 (13.28) 8 (19.51) 2 (3.64)
   AFP, ng/mL 0.06 0.60 0.39
    <20 56 (58.33) 56 (43.75) 21 (51.22) 30 (54.55)
    ≥20 and <400 23 (23.96) 34 (26.56) 14 (34.15) 14 (25.45)
    ≥400 17 (17.71) 38 (29.69) 6 (14.63) 11 (20.00)
   CEA, ng/mL >0.99 0.80 0.17
    <5 88 (91.67) 118 (92.19) 39 (95.12) 54 (98.18)
    ≥5 8 (8.33) 10 (7.81) 2 (4.88) 1 (1.82)
   Edmondson-Steiner grade <0.001 <0.001 >0.99
    II 85 (88.54) 75 (58.59) 38 (92.68) 30 (54.55)
    III 11 (11.46) 53 (41.41) 3 (7.32) 25 (45.45)
   MVI 0.21 0.12 0.69
    Negative 65 (67.71) 75 (58.59) 31 (75.61) 32 (58.18)
    Positive 31 (32.29) 53 (41.41) 10 (24.39) 23 (41.82)
MRI features
   LTD, mm 35.64±17.22 48.98±23.47 <0.001 37.10±16.83 46.16±21.58 0.03 0.93
   Rim-APHE <0.001 0.006 0.71
    Negative 87 (90.62) 87 (67.97) 37 (90.24) 35 (63.64)
    Positive 9 (9.38) 41 (32.03) 4 (9.76) 20 (36.36)
   Washout at portal venous phase 0.82 0.36 0.66
    Nonperipheral washout 74 (77.08) 102 (79.69) 35 (85.37) 41 (74.55)
    Peripheral washout 4 (4.17) 6 (4.69) 1 (2.44) 1 (1.82)
    No washout 18 (18.75) 20 (15.62) 5 (12.20) 13 (23.64)
   Delayed central enhancement 0.39 0.95 0.89
    Negative 87 (90.62) 121 (94.53) 37 (90.24) 51 (92.73)
    Positive 9 (9.38) 7 (5.47) 4 (9.76) 4 (7.27)
   Enhancing capsule 0.16 0.88 >0.99
    Negative 13 (13.54) 9 (7.03) 5 (12.20) 5 (9.09)
    Positive 83 (86.46) 119 (92.97) 36 (87.80) 50 (90.91)

Data are shown as number of patients (percentage) or mean ± standard deviation. AFP, alpha-fetoprotein; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; HCC, hepatocellular carcinoma; LTD, largest tumor diameter; rim-APHE, rim arterial phase hyperenhancement; MRI, magnetic resonance imaging; MVI, microvascular invasion; TC, training cohort; VC, validation cohort.

Table 2

Univariable and multivariable analysis of predictive characteristics related with P53 status

Characteristics Univariable Multivariable
OR 95% CI P value OR 95% CI P value
Age 1.004 1.000–1.008 0.10
Gender 1.181 0.986–1.415 0.13
AFP 1.232 1.094–1.385 0.004 0.979 0.718–1.344 0.91
CEA 1.250 0.573–2.726 0.64
CA19-9 1.545 0.818–2.921 0.26
Edmondson-Steiner grade 1.194 1.084–1.315 0.003 0.967 0.660–1.418 0.89
MVI 1.710 1.178–2.479 0.02 0.921 0.536–1.582 0.80
LTD 1.011 1.006–1.016 <0.001 1.021 1.008–1.034 0.007
Rim-APHE 4.555 2.487–8.348 <0.001 3.844 1.954–7.561 0.001
Washout at portal venous phase 1.147 0.996–1.322 0.11
Delayed central enhancement 0.778 0.340–1.782 0.62
Enhancing capsule 1.434 1.133–1.815 0.01 0.522 0.251–1.083 0.14

AFP, alpha-fetoprotein; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; CI, confidence interval; LTD, largest tumor diameter; rim-APHE, rim arterial phase hyperenhancement; MVI, microvascular invasion; OR, odds ratio.
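To put the odds ratio for LTD in perspective, the short calculation below rescales it to a 10 mm difference. It assumes the OR of 1.021 is expressed per 1 mm of LTD (Table 1 reports LTD in mm) and that the effect is log-linear, as in a standard logistic model; both are assumptions, and `scale_or` is a hypothetical helper.

```python
import math

or_per_mm = 1.021                  # multivariable OR for LTD (Table 2)
ci_low, ci_high = 1.008, 1.034     # its 95% CI (Table 2)

def scale_or(or_per_unit, delta):
    """Odds ratio for a `delta`-unit increase under a log-linear effect."""
    return math.exp(delta * math.log(or_per_unit))

or_10mm = scale_or(or_per_mm, 10)
ci_10mm = (scale_or(ci_low, 10), scale_or(ci_high, 10))
print(round(or_10mm, 3))  # prints: 1.231
```

Under these assumptions, each additional 10 mm of tumor diameter corresponds to roughly 23% higher odds of a P53-mutated tumor, holding rim-APHE and the other covariates fixed.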

MC DL model subset selection

Within the MC DL framework, the ResNet34 model demonstrated an AUC of 0.731 (95% CI: 0.654–0.808) in the TC and 0.652 (95% CI: 0.447–0.858) in the VC, outperforming other networks. The diagnostic metrics for the predictive models across both study cohorts are detailed in Table 3. Figure S4 shows the DeLong test results and ROC curves.

Table 3

MC DL model performance of different network prediction models

Model name Cohort Accuracy AUC (95% CI) Sensitivity Specificity
MC3_ResNet18 TC 0.625 0.613 (0.523–0.702) 0.667 0.567
VC 0.594 0.543 (0.330–0.756) 0.500 0.687
MC3_ResNet34 TC 0.669 0.731 (0.654–0.808) 0.656 0.687
VC 0.688 0.652 (0.447–0.858) 0.437 0.937
MC3_ResNet50 TC 0.681 0.664 (0.576–0.751) 0.763 0.567
VC 0.688 0.637 (0.427–0.846) 0.625 0.750
MC3_ResNet101 TC 0.656 0.665 (0.579–0.750) 0.763 0.507
VC 0.656 0.637 (0.436–0.838) 0.375 0.937
MC3_ResNet152 TC 0.506 0.550 (0.461–0.640) 0.226 0.896
VC 0.594 0.602 (0.398–0.805) 0.250 0.937
MC3_DenseNet121 TC 0.650 0.670 (0.585–0.755) 0.742 0.522
VC 0.688 0.641 (0.435–0.846) 0.375 1.000

AUC, area under the curve; CI, confidence interval; DL, deep learning; MC, multi-channel; TC, training cohort; VC, validation cohort.

FF DL model subset selection

From each of the three contrast-enhanced MR sequences, we extracted eight compressed DL features, which were combined into a set of 24 DL features for DCE-MRI. After LASSO feature selection, the ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121 models retained 18, 14, 11, 12, 3, and 6 features, respectively (Figure S5). ResNet101 was the only model with an AUC exceeding 0.7 in the TC and 0.6 in the VC: 0.779 (95% CI: 0.719–0.839) in the TC and 0.663 (95% CI: 0.552–0.774) in the VC. We therefore selected it as the optimal FF DL model. The diagnostic metrics for the predictive models based on the six CNNs across both study cohorts are presented in Table 4. The DeLong test results and ROC curves are shown in Figure S6.

Table 4

FF DL model performance of different network prediction models

Model name Cohort Accuracy AUC (95% CI) Sensitivity Specificity
DL_ResNet18 TC 0.732 0.793 (0.735–0.851) 0.719 0.750
VC 0.573 0.552 (0.434–0.669) 0.745 0.341
DL_ResNet34 TC 0.737 0.798 (0.740–0.855) 0.789 0.667
VC 0.604 0.570 (0.451–0.689) 0.800 0.341
DL_ResNet50 TC 0.732 0.774 (0.713–0.835) 0.664 0.823
VC 0.615 0.582 (0.466–0.698) 0.855 0.293
DL_ResNet101 TC 0.696 0.779 (0.719–0.839) 0.625 0.792
VC 0.667 0.663 (0.552–0.774) 0.691 0.634
DL_ResNet152 TC 0.643 0.670 (0.599–0.741) 0.719 0.542
VC 0.635 0.618 (0.502–0.733) 0.727 0.512
DL_DenseNet121 TC 0.661 0.707 (0.639–0.775) 0.539 0.823
VC 0.552 0.559 (0.443–0.675) 0.364 0.805

AUC, area under the curve; CI, confidence interval; DL, deep learning; FF, feature fusion; TC, training cohort; VC, validation cohort.

Combined model

ResNet101 from the FF DL model outperformed ResNet34 from the MC DL model, so we retained ResNet101 as the final DL model. The clinical model achieved an AUC of 0.716 (95% CI: 0.648–0.783) in the TC and 0.688 (95% CI: 0.581–0.795) in the VC. The radiomics model achieved an AUC of 0.703 (95% CI: 0.635–0.772) in the TC and 0.519 (95% CI: 0.399–0.639) in the VC. Integrating the clinical, radiomics, and DL models improved the AUC to 0.838 (95% CI: 0.786–0.891) in the TC and 0.702 (95% CI: 0.598–0.807) in the VC. This integration also improved accuracy and sensitivity in the TC, as presented in Table 5. Figure S7 shows the DeLong test results and ROC curves. To visualize the combined model, we constructed a nomogram (Figure 3A). The nomogram calibration curves showed good agreement between predicted and actual P53 status in both cohorts (Figure 3B,3C). Decision curve analysis (DCA) also showed that the combined model yielded greater net benefit than the individual models in both cohorts (Figure 3D,3E).

Table 5

Performances of the predictive model in the training cohort and validation cohort

Model name Cohort Accuracy AUC (95% CI) Sensitivity Specificity
Clinical model TC 0.688 0.716 (0.648–0.783) 0.727 0.635
VC 0.677 0.688 (0.581–0.795) 0.564 0.829
Radiomics model TC 0.661 0.703 (0.635–0.772) 0.703 0.604
VC 0.594 0.519 (0.399–0.639) 0.745 0.390
DL model TC 0.696 0.779 (0.719–0.839) 0.625 0.792
VC 0.667 0.663 (0.552–0.774) 0.691 0.634
Combined model TC 0.790 0.838 (0.786–0.891) 0.836 0.729
VC 0.667 0.702 (0.598–0.807) 0.655 0.683

AUC, area under the curve; CI, confidence interval; DL, deep learning; TC, training cohort; VC, validation cohort.
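As a consistency check on Table 5, the validation-cohort metrics of the combined model can be reproduced from a confusion matrix. The counts below are inferred from the rounded sensitivity and specificity together with the VC composition (55 P53-mutated and 41 non-mutated cases); they are not counts reported in the text, and the `metrics` helper is hypothetical.

```python
def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sens, spec, acc

# VC composition from the Results: 55 P53-mutated, 41 non-mutated.
# TP = 36 and TN = 28 are inferred from the rounded metrics (an assumption).
sens, spec, acc = metrics(tp=36, fn=55 - 36, tn=28, fp=41 - 28)
print(f"{sens:.3f} {spec:.3f} {acc:.3f}")  # prints: 0.655 0.683 0.667
```

These values match the combined-model VC row of Table 5, so the reported accuracy, sensitivity, and specificity are mutually consistent with the cohort sizes.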

Figure 3 Nomograms of the P53-mutated HCC prediction model, nomogram calibration curves in the TC and VC, and DCA. (A) Nomograms combining two independent clinical predictors (LTD and rim-APHE), the radiomics model, and the DL model. Nomogram calibration curves in the TC (B) and VC (C), assessing the model’s predictive accuracy. Nomogram DCA in the TC (D) and VC (E), evaluating the clinical applicability and the net benefit of the models in practical decision-making scenarios. DCA, decision curve analysis; DL, deep learning; HCC, hepatocellular carcinoma; LTD, largest tumor diameter; rim-APHE, rim arterial phase hyperenhancement; TC, training cohort; VC, validation cohort.
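The net benefit plotted in a decision curve is conventionally defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The sketch below uses that standard definition with made-up counts; it is not derived from the study data, and the function names are hypothetical.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of classifying every patient as positive."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy cohort (made-up numbers): 100 patients, 40 truly positive.
# At a threshold of 0.2 the hypothetical model flags 30 true and 10 false positives.
nb_model = net_benefit(tp=30, fp=10, n=100, threshold=0.2)
nb_all = net_benefit_treat_all(prevalence=0.4, threshold=0.2)
print(round(nb_model, 3), round(nb_all, 3))  # prints: 0.275 0.25
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.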

Discussion

In this retrospective analysis, we derived key DL and radiomic features correlated with P53-mutated HCC from DCE-MRI. Among MC DL models, ResNet34 achieved the highest diagnostic accuracy. ResNet101 outperformed other FF DL models and achieved greater diagnostic efficacy than ResNet34 from the MC DL model. Multivariable analysis identified LTD and rim-APHE as independent risk factors for the clinical model. By integrating the clinical, radiomic, and DL models, we developed and validated a combined model that achieved superior AUCs in both cohorts: 0.838 in the TC and 0.702 in the VC. A nomogram visualized this combined model and DCA confirmed its clinical utility.

DL has been extensively applied to HCC, including U-net-based liver segmentation (13), grading of tumor differentiation (14), prediction of MVI (15), prediction of vessels encapsulating tumor clusters (VETC) (16), and evaluation of transarterial chemoembolization (TACE) outcomes (17). AI models can offer early predictions, potentially before the first response assessment, to determine whether patients will benefit from a given treatment or whether alternative therapies should be considered. P53, CK19, and Ki-67 are among the most valuable biomarkers for HCC prognosis, but confirming them typically requires invasive, spatially localized sampling through preoperative biopsy or postoperative pathology. One study developed a DL model for CK19 prediction, achieving an AUC of 0.82 in a cohort of 141 patients (18).

The radiomics model performed poorly in the validation cohort (AUC 0.519, 95% CI: 0.399–0.639), approaching random prediction. This finding suggests limited generalizability of the radiomics features, likely attributable to several methodological factors. Our study used seven MRI scanners (1.5 and 3.0 T) from different manufacturers with varying imaging parameters, which introduced scanner-related variability. Although we applied N4 bias-field correction, resampling to 1×1×1 mm³, and grayscale normalization, these steps may have been insufficient to fully harmonize features across heterogeneous scanner protocols. These factors may explain the marked performance decline observed in the validation cohort. The single-center design further limits generalizability; multicenter prospective validation is essential. A recent systematic review of MRI radiomics standardization strategies (2019–2022) reported that radiomics generalizability remains unresolved when multiple vendors and scanner models are involved, although preprocessing strategies can enhance analytical robustness (19).

Jia et al. recently developed a multi-sequence MRI DL model for predicting P53-mutated HCC, achieving high predictive performance (test AUC =0.914–0.919) (20). Their study employed a single CNN architecture (EfficientNetV2) with random forest fusion. We systematically compared six CNN architectures (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, and DenseNet121) across two distinct fusion strategies (multi-channel fusion and feature fusion) to identify the optimal model configuration. This comprehensive comparison provides empirical guidance for CNN selection in P53-mutated HCC prediction. Our approach mirrored a strategy previously used to predict occult cervical lymph-node metastasis and prognosis in early-stage oral and oropharyngeal squamous cell carcinoma, where six distinct CNNs were used to build 17 predictive models (21). The diversity of CNN architectures and depths enables extraction of a broad range of features from the data. The variable performance of different CNN models across tasks and datasets allows for the selection of the most suitable model or combination, enhancing the model’s flexibility and adaptability. In our study, ResNet34 in the MC DL model achieved the highest AUC of 0.731 in the TC and 0.652 in the VC, while ResNet101 in the FF DL model achieved the best performance with an AUC of 0.779 in the TC and 0.663 in the VC.

A study predicting MVI and clinical outcomes in HCC patients found that fusing three MR sequences (T1, T1D, and T1V) achieved 92.11% accuracy, surpassing any single sequence (22). In our investigation, rim-APHE emerged as a risk factor for P53-mutated HCC in the clinical model, prompting us to concentrate on enhancement sequences. We selected AP, PVP, and DP for analysis. Another study using deep-learning models and DCE-MRI sequences to predict preoperative VETC status and prognosis in HCC patients reported higher diagnostic performance with three fused sequences (AUC 0.897) (23). Integrating multiple sequences allows more comprehensive feature extraction and improves model performance and generalizability.

Multi-channel DL concurrently processes multiple input channels within a single model to produce a normalized tumor mask and is well suited to multimodal data. In a study assessing the diagnostic efficacy of CNNs on intravoxel incoherent motion (IVIM) diffusion-weighted MRI for predicting MVI in HCC, nine b-value images were concatenated along the channel dimension, and the CNN extracted deep features directly from the b-value volume, achieving superior MVI prediction with an AUC of 0.810 (range, 0.760–0.829) (24). Wang et al. also fused features from six MRI sequences to predict MVI in HCC; however, they used a multilayer perceptron (MLP) for feature fusion (25). In another study employing multitask DL based on MRI images to predict MVI and recurrence-free survival in HCC, features from AP, PVP, DWI, and T2WI sequences were fused to form the MRI feature set and achieved high MVI prediction accuracy (26). In this study, multi-channel fusion yielded AUCs of 0.550–0.731 in the TC and 0.543–0.652 in the VC. Feature fusion achieved AUCs of 0.670–0.798 in the TC and 0.552–0.663 in the VC, outperforming multi-channel fusion. Feature fusion integrates information across MRI phases, which may help models focus on relevant high-level features and reduce noise sensitivity and overfitting risk. Its simplicity and flexibility also facilitate adaptation to diverse tasks and data types.
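The difference between the two fusion strategies can be illustrated with a minimal sketch. It assumes that multi-channel fusion means stacking the AP, PVP, and DP images as channels of one network input, and that feature fusion means extracting a feature vector per phase and concatenating the vectors. The function names and the toy extractor are hypothetical stand-ins for a CNN backbone and do not reproduce the implementation used in this study.

```python
from typing import Callable, List

Image = List[List[float]]   # one 2D slice, rows x columns
Features = List[float]


def multi_channel_input(ap: Image, pvp: Image, dp: Image) -> List[Image]:
    """Early fusion: stack the three phases as channels of ONE input."""
    shapes = {(len(img), len(img[0])) for img in (ap, pvp, dp)}
    if len(shapes) != 1:
        raise ValueError("all phases must share the same height and width")
    return [ap, pvp, dp]    # 3 x H x W, to be passed to a single CNN


def toy_extractor(img: Image) -> Features:
    """Hypothetical stand-in for a CNN backbone: mean and max intensity."""
    flat = [value for row in img for value in row]
    return [sum(flat) / len(flat), max(flat)]


def feature_fusion(phases: List[Image],
                   extractors: List[Callable[[Image], Features]]) -> Features:
    """Feature-level fusion: one extractor per phase, then concatenate."""
    if len(phases) != len(extractors):
        raise ValueError("need exactly one extractor per phase")
    fused: Features = []
    for img, extract in zip(phases, extractors):
        fused.extend(extract(img))
    return fused


# Toy 2 x 2 "images" (invented values, not study data).
ap = [[0.0, 1.0], [2.0, 3.0]]
pvp = [[1.0, 1.0], [1.0, 1.0]]
dp = [[4.0, 0.0], [0.0, 0.0]]

stacked = multi_channel_input(ap, pvp, dp)
fused = feature_fusion([ap, pvp, dp], [toy_extractor] * 3)
print(len(stacked), fused)
```

In the first case a single network sees all phases at once; in the second, each phase is summarized separately and only the summaries are combined, so the classifier works on a compact feature vector rather than on raw pixels.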

Radiomics, DL, and fusion models are increasingly prevalent in medical research. A recent systematic review revealed that fusion models performed well in 63% of studies reviewed, poorly in 25%, and fairly well in 13% (27). In a comparative study examining 3D and 2D DL, radiomics, and fusion models based on CT imaging for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma, late fusion methods were superior to early fusion, suggesting that the performance of fusion models may depend on the fusion strategy employed (28). In our study, LR was used to integrate the clinical, radiomics, and DL models into a combined model. This approach offers operational simplicity, fast processing, low memory use, and easy clinical implementation. However, it also introduces several methodological risks. Clinical features such as tumor diameter may correlate with DL imaging features, potentially overweighting certain signals. The clinical model uses 2–3 low-dimensional features against hundreds to thousands from the radiomics and DL models, creating an imbalance that may bias fusion toward imaging predictions. The radiomics model performed poorly in validation (AUC 0.519) and likely introduced noise rather than complementary information. Nevertheless, the combined model significantly improved diagnostic performance over the individual models, achieving an AUC of 0.838 (95% CI: 0.786–0.891) in the TC and 0.702 (95% CI: 0.598–0.807) in the VC, suggesting that the clinical and DL components provided a robust signal that offset the limitations of the radiomics model.
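As a rough illustration of this kind of model-level combination, the sketch below fits a small logistic regression on three per-patient scores (clinical, radiomics, DL) with plain gradient descent. It assumes that LR denotes logistic regression and that each component model contributes a single score; the scores, labels, learning rate, and function names are all invented for illustration and are not study data or the study's implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Combined probability from the component-model scores."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Columns: clinical score, radiomics score, DL score (toy values only).
# The radiomics column is deliberately uninformative in this toy set.
X = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2], [0.4, 0.6, 0.3],
     [0.6, 0.5, 0.7], [0.7, 0.4, 0.8], [0.8, 0.6, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
probs = [predict_proba(x, w, b) for x in X]
print([round(wj, 2) for wj in w], round(b, 2))
```

The fitted weights show how much each component score contributes to the combined prediction, which is one simple way to inspect whether a weak component is adding signal or mostly noise.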

This study has several limitations. Firstly, it relied on internal validation only, using a random split of a single-centre dataset, rather than external validation with independent data from different centres or time periods. The modest AUC of 0.702 (95% CI: 0.598–0.807) in the validation cohort, compared with 0.838 in the training cohort, suggests moderate performance and potential overfitting, indicating that the model is not ready for routine clinical use. Future studies should perform temporal and external validation with independent centres, different MRI scanners, and different protocols to confirm robustness and generalisability. Secondly, we focused on conventional CNN architectures (ResNet and DenseNet) and did not evaluate newer architectures such as Vision Transformers, Swin Transformers, or attention-based models. ResNet101 performed best in our feature fusion framework, but newer architectures might better capture long-range dependencies or multi-scale features. Model complexity also requires consideration: deeper architectures such as ResNet101 and ResNet152 may overfit on limited medical imaging datasets. ResNet101 outperformed ResNet152 in validation, suggesting that moderate complexity suits this dataset size better than very deep architectures. Future studies should evaluate newer architectures and investigate the optimal complexity for HCC datasets of varying sizes. Thirdly, we assumed that P53 immunohistochemistry results (based on the percentage of nuclear staining) indicate a TP53 mutation; this assumption should be confirmed through genetic testing to improve the accuracy of the results. Fourthly, beyond P53 mutation status, long-term follow-up is needed so that DL can be used to predict survival and prognosis in HCC patients, which should be a focal point of CNN-based DL research. Fifthly, our study focused primarily on contrast-enhanced MRI sequences, and further research is needed for other sequences. Lastly, the use of seven 1.5 and 3.0 T MRI scanners with varying parameters from different manufacturers may have introduced variability into our results; however, we applied data normalization to minimize the variability across MRI scanners.
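For readers who wish to reproduce a training-versus-validation comparison of this kind, the AUC can be computed separately for each cohort. The sketch below uses the standard rank-based (Mann-Whitney) definition: the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions, not study data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 of 4 positive-negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
print(toy_auc)
```

Applying the same function to the training and validation cohorts gives the two values whose gap is interpreted above as a sign of possible overfitting.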


Conclusions

The CNN DL model based on DCE-MRI demonstrates superior diagnostic accuracy in preoperatively predicting P53-mutated HCC. Furthermore, integrating this DL model with clinical and radiomics models improves predictive performance beyond that of any individual model. This integrated approach demonstrates feasibility and potential for preoperative prediction of P53-mutated HCC. However, multicentre external validation using diverse scanners, imaging protocols, and patient populations is necessary before clinical implementation. If validated externally, this model could eventually assist clinicians in treatment decision-making.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/rc

Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/dss

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/prf

Funding: This work was supported by the Science and Technology Commission of Shanghai Municipality (No. 23Y11907400) and the Research Project of Luzhou Medical Association (No. 2025-YXX-KY-M-007).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1026/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Zhongshan Hospital, Fudan University (No. B2021-113R), and this study was granted a waiver for the informed consent requirement.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
  2. Konyn P, Ahmed A, Kim D. Current epidemiology in hepatocellular carcinoma. Expert Rev Gastroenterol Hepatol 2021;15:1295-307.
  3. Liu Y, Su Z, Tavana O, et al. Understanding the complexity of p53 in a new era of tumor suppression. Cancer Cell 2024;42:946-67.
  4. Nault JC, Martin Y, Caruso S, et al. Clinical Impact of Genomic Diversity From Early to Advanced Hepatocellular Carcinoma. Hepatology 2020;71:164-82.
  5. Rebouissou S, Nault JC. Advances in molecular classification and precision oncology in hepatocellular carcinoma. J Hepatol 2020;72:215-29.
  6. Kitao A, Matsui O, Zhang Y, et al. Dynamic CT and Gadoxetic Acid-enhanced MRI Characteristics of P53-mutated Hepatocellular Carcinoma. Radiology 2023;306:e220531.
  7. Calderaro J, Seraphin TP, Luedde T, et al. Artificial intelligence for the prevention and clinical management of hepatocellular carcinoma. J Hepatol 2022;76:1348-61.
  8. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology 2020;158:76-94.e2.
  9. Xia T, Zhao B, Li B, et al. MRI-Based Radiomics and Deep Learning in Biological Characteristics and Prognosis of Hepatocellular Carcinoma: Opportunities and Challenges. J Magn Reson Imaging 2024;59:767-83.
  10. Qian H, Huang Y, Xu L, et al. Role of peritumoral tissue analysis in predicting characteristics of hepatocellular carcinoma using ultrasound-based radiomics. Sci Rep 2024;14:11538.
  11. Tseng PL, Tai MH, Huang CC, et al. Overexpression of VEGF is associated with positive p53 immunostaining in hepatocellular carcinoma (HCC) and adverse outcome of HCC patients. J Surg Oncol 2008;98:349-57.
  12. Marrero JA, Kulik LM, Sirlin CB, et al. Diagnosis, Staging, and Management of Hepatocellular Carcinoma: 2018 Practice Guidance by the American Association for the Study of Liver Diseases. Hepatology 2018;68:723-50.
  13. Raman AG, Jones C, Weiss CR. Machine Learning for Hepatocellular Carcinoma Segmentation at MRI: Radiology In Training. Radiology 2022;304:509-15.
  14. Zhou Q, Zhou Z, Chen C, et al. Grading of hepatocellular carcinoma using 3D SE-DenseNet in dynamic enhanced MR images. Comput Biol Med 2019;107:47-57.
  15. Wang T, Li Z, Yu H, et al. Prediction of microvascular invasion in hepatocellular carcinoma based on preoperative Gd-EOB-DTPA-enhanced MRI: Comparison of predictive performance among 2D, 2D-expansion and 3D deep learning models. Front Oncol 2023;13:987781.
  16. Dong X, Yang J, Zhang B, et al. Deep Learning Radiomics Model of Dynamic Contrast-Enhanced MRI for Evaluating Vessels Encapsulating Tumor Clusters and Prognosis in Hepatocellular Carcinoma. J Magn Reson Imaging 2024;59:108-19.
  17. Chen M, Kong C, Qiao E, et al. Multi-algorithms analysis for pre-treatment prediction of response to transarterial chemoembolization in hepatocellular carcinoma on multiphase MRI. Insights Imaging 2023;14:38.
  18. Chen Y, Chen J, Zhang Y, et al. Preoperative Prediction of Cytokeratin 19 Expression for Hepatocellular Carcinoma with Deep Learning Radiomics Based on Gadoxetic Acid-Enhanced Magnetic Resonance Imaging. J Hepatocell Carcinoma 2021;8:795-808.
  19. Trojani V, Bassi MC, Verzellesi L, et al. Impact of Preprocessing Parameters in Medical Imaging-Based Radiomic Studies: A Systematic Review. Cancers (Basel) 2024;16:2668.
  20. Jia L, Yang Q, Jiang H, et al. Deep learning-based MRI model for predicting P53-mutated hepatocellular carcinoma. BMC Med Imaging 2025;25:506.
  21. Lan T, Kuang S, Liang P, et al. MRI-based deep learning and radiomics for prediction of occult cervical lymph node metastasis and prognosis in early-stage oral and oropharyngeal squamous cell carcinoma: a diagnostic study. Int J Surg 2024;110:4648-59.
  22. Sun BY, Gu PY, Guan RY, et al. Deep-learning-based analysis of preoperative MRI predicts microvascular invasion and outcome in hepatocellular carcinoma. World J Surg Oncol 2022;20:189.
  23. Yang J, Dong X, Wang F, et al. A deep learning model based on MRI for prediction of vessels encapsulating tumour clusters and prognosis in hepatocellular carcinoma. Abdom Radiol (NY) 2024;49:1074-83.
  24. Liu B, Zeng Q, Huang J, et al. IVIM using convolutional neural networks predicts microvascular invasion in HCC. Eur Radiol 2022;32:7185-95.
  25. Wang F, Chen Q, Chen Y, et al. A novel multimodal deep learning model for preoperative prediction of microvascular invasion and outcome in hepatocellular carcinoma. Eur J Surg Oncol 2023;49:156-64.
  26. Wang F, Zhan G, Chen QQ, et al. Multitask deep learning for prediction of microvascular invasion and recurrence-free survival in hepatocellular carcinoma based on MRI images. Liver Int 2024;44:1351-62.
  27. Demircioğlu A. Are deep models in radiomics performing better than generic models? A systematic review. Eur Radiol Exp 2023;7:11.
  28. Wang W, Liang H, Zhang Z, et al. Comparing three-dimensional and two-dimensional deep-learning, radiomics, and fusion models for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma based on CT imaging: a multicentre, retrospective, diagnostic study. EClinicalMedicine 2024;67:102385.
Cite this article as: Weng J, He Y, Zeng Q, Yao D, Zhou H, Li B, Yang C, Yang B. A novel deep learning and radiomics approach based on DCE-MRI for predicting the P53 mutation status in hepatocellular carcinoma. J Gastrointest Oncol 2026;17(2):79. doi: 10.21037/jgo-2025-1-1026
