Mining and experimental validation of machine learning-based immune-related diagnostic biomarkers for hepatocellular carcinoma

Wei Li; Hang Jiang; Jian Duan; Jinlan He; Liping Zhao; Guoping Zhong; Chenghu Fan

doi:10.21037/jgo-2025-1-1085

Original Article


Wei Li¹, Hang Jiang¹, Jian Duan², Jinlan He², Liping Zhao¹, Guoping Zhong¹, Chenghu Fan¹

¹Department of Hepatobiliary Surgery, Third People’s Hospital of Yunnan Province, Kunming, China; ²The First Affiliated Hospital of Kunming Medical University, Kunming, China

Contributions: (I) Conception and design: All authors; (II) Administrative support: None; (III) Provision of study materials or patients: H Jiang, J Duan, J He, L Zhao, G Zhong, C Fan; (IV) Collection and assembly of data: H Jiang, J Duan, J He, L Zhao, G Zhong, C Fan; (V) Data analysis and interpretation: H Jiang, J Duan, J He, L Zhao, G Zhong, C Fan; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Hang Jiang, MD. The Third People's Hospital of Yunnan Province, 292 Beijing Road, Kunming 650000, China. Email: jiangyishi@qq.com.

Background: Hepatocellular carcinoma (HCC) is a highly malignant and aggressive tumor. The expression of immune-related genes (IRGs) correlates closely with the HCC immune microenvironment. This study aimed to identify immune-related diagnostic markers in HCC.

Methods: HCC-related datasets and IRGs were obtained from public databases (TCGA, ICGC, TISIDB, and InnateDB). Differential expression analysis screened differentially expressed genes (DEGs) between HCC and control samples, while weighted gene co-expression network analysis identified key module genes correlated with immune scoring systems. Candidate genes were obtained via the intersection of DEGs, IRGs, and key module genes. Hub genes were then determined through protein-protein interaction analysis, followed by correlation and survival analyses. Moreover, three machine learning algorithms identified diagnostic biomarkers, which were evaluated via a logit model and receiver operating characteristic curve analysis for HCC diagnostic efficacy. Finally, biomarker expression was validated in clinical samples.

Results: After identifying 8,800 DEGs and 2,337 key module genes, we intersected them with IRGs to obtain 87 candidate genes. Thereafter, 33 hub genes were obtained. The hub genes showed a notable positive correlation, suggesting their potential involvement in regulating common biological processes. Additionally, six hub genes (e.g., FCGR2A, TNFRSF4) exhibited expression correlation with HCC patients’ survival. Three diagnostic biomarkers (FCGR2A, MARCO, TNFRSF4) were further identified via machine learning algorithms; logit model analysis confirmed their significant diagnostic utility for HCC. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) validation showed markedly increased expression of FCGR2A and TNFRSF4, consistent with dataset trends.

Conclusions: This study identified three immune-related diagnostic biomarkers for HCC, which may provide novel insights into HCC prognostic management.

Keywords: Hepatocellular carcinoma (HCC); machine learning; immunity; diagnostic biomarkers

Submitted Dec 28, 2025. Accepted for publication Mar 16, 2026. Published online Apr 30, 2026.


Highlight box

Key findings

• Using bioinformatics and machine learning, we identified FCGR2A, MARCO, and TNFRSF4 as immune-related diagnostic biomarkers for hepatocellular carcinoma (HCC).

• These show higher expression in HCC tissues and correlate with patient survival.

• Receiver operating characteristic analysis indicated good diagnostic performance (high area under the curve).

• Quantitative polymerase chain reaction validation in clinical samples confirmed FCGR2A and TNFRSF4 expression patterns.

What is known and what is new?

• HCC has high heterogeneity and invasiveness; the tumor immune microenvironment influences its progression, but effective immune-related diagnostic biomarkers are lacking.

• This study combines multi‑omics data with machine learning to systematically pinpoint, for the first time, FCGR2A, MARCO, and TNFRSF4 as potential immune diagnostic markers for HCC, with preliminary validation in an independent clinical cohort.

What is the implication, and what should change now?

• This study provides new candidate immune biomarkers for early HCC detection, helps elucidate immune microenvironment regulation, and offers a foundation for developing immune diagnostic methods.

• Future work should validate their diagnostic/prognostic value in larger multicenter cohorts, explore their biological roles and pathways in the HCC immune microenvironment, and investigate their combination with current clinical indicators to improve diagnostic sensitivity/specificity.

Introduction

Global Cancer Statistics 2022 data show that liver cancer is the sixth most commonly diagnosed cancer worldwide and the third leading cause of cancer death, making it a principal contributor to cancer mortality in multiple regions (1). Hepatocellular carcinoma (HCC) is the predominant form of primary liver cancer, accounting for approximately 80% of all primary liver neoplasms globally (2). When developing a therapeutic strategy for HCC, clinicians must carefully evaluate tumor-specific characteristics, the degree of liver dysfunction, patient age, comorbid conditions, and regional healthcare infrastructure and technical capabilities (3). Similar to most malignancies, HCC is remarkably heterogeneous. Several therapeutic strategies can be employed, including hepatic transplantation, tumor resection, percutaneous ablation, radiotherapy, transarterial therapies, and systemic agents. HCC treatment algorithms must be continuously adapted based on changes in the patient’s performance status to optimize individualized treatment strategies (4).

Most HCC patients present with advanced disease at diagnosis, which contributes to their poor prognosis and poses a significant challenge in treating liver cancer (5). However, data confirm that comprehensive prevention and control programs comprising surveillance, early diagnosis, and treatment for high-risk populations effectively prevent HCC development and significantly reduce overall mortality (6). Given the multifactorial nature of HCC, unraveling the pathogenic molecular processes involved in hepatocarcinogenesis and developing biomolecular-targeted therapeutic strategies against specific molecular drivers are essential for improving clinical outcomes (5). Tumor microenvironment dynamics are pivotal to cancer progression, and with deeper insights into its behavior and immune-tumor crosstalk, immunotherapy has emerged as a promising strategy (7). The innate and adaptive immune systems work together through synergistic interactions to build effective anti-cancer immune surveillance mechanisms. Dysfunctional tumor-immune interactions may promote tumor immune escape through impaired antigen recognition or the development of a tumor microenvironment with immunosuppressive properties (8).

As a central immune organ, the liver contains abundant innate and adaptive immune cells that play dual roles in immune surveillance: recognizing and removing pathogens and regulating immune responses and host defenses. These cells respond to microbial and metabolic stimuli while suppressing the progression of liver diseases such as HCC (9,10). Currently, inhibitors targeting immune checkpoint genes such as programmed cell death protein 1 (PD-1) and programmed cell death-ligand 1 (PD-L1) (e.g., atezolizumab and nivolumab) have emerged as the new standard for first- or second-line treatment of HCC by blocking immune escape pathways (11). Reduced recognition of tumor-associated antigens by immune cells may be linked to aberrant epigenetic regulation, dysregulated post-transcriptional modification, or altered antigen presentation and processing pathways (12). Multi-omics analyses and experiments suggest that the competitive endogenous RNA (ceRNA) mechanism may play a central role in HCC progression: aberrant microRNAs (miRNAs) (e.g., miR-125b-5p, miR-21-5p) and their target genes (NTF3, PSMD14, etc.) may synergistically contribute to HCC development via methylation-mediated interactions with the immune microenvironment (13). Although significant progress has been made in HCC immunotherapy, the mechanistic role of immune-related genes (IRGs) in tumorigenesis warrants further elucidation, and gene-derived diagnostic biomarkers may not only predict disease progression and therapeutic responses with precision but also serve as novel therapeutic targets. Recent investigations have screened liver cancer-associated genes extensively and uncovered biomarkers exhibiting considerable diagnostic potential (14). Nevertheless, the identification and characterization of immune-related diagnostic markers for HCC remain insufficiently defined.

In this study, we analyzed immune-related diagnostic biomarkers using bioinformatics methods and public database-derived HCC transcriptome data, assessing their diagnostic potential to identify novel research avenues for improving HCC diagnosis and personalized treatment strategies. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1085/rc).

Methods

Data source

The gene expression matrix and clinical information for HCC were obtained from The Cancer Genome Atlas (TCGA)-LIHC dataset via the TCGA database (https://portal.gdc.cancer.gov/) (access time: March 1st, 2023). The TCGA-LIHC dataset contained 369 tumor tissue samples and 50 normal tissue samples, with survival information available for 362 patients. The International Cancer Genome Consortium (ICGC)-LIRI dataset (access time: March 1st, 2023) was accessed from the ICGC database (https://daco.icgc.org/) as a validation set, which contained 243 HCC tumor tissue samples and 197 normal tissue samples. The IRGs were sourced from the immunology database and analysis portal (Immport) (https://www.immport.org/shared/home), an integrated repository portal for tumor-immune system interactions (TISIDB) (https://cis.hku.hk/TISIDB/data/), and the systems biology of the innate immune response (InnateDB) (https://www.innatedb.com/) databases, resulting in 6,182, 178, and 3,713 genes, respectively (available online: https://cdn.amegroups.cn/static/public/jgo-2025-1-1085-1.xlsx). By combining and removing duplicates from these genes, a total of 7,990 IRGs were identified.

Differential expression analysis

To systematically identify transcriptomic alterations between HCC and adjacent normal liver tissues, comprehensive differential expression analysis was conducted on the TCGA-LIHC dataset. The analytical approach utilized DESeq2 (v 3.44.3) statistical framework (15), which employs negative binomial distribution modeling for RNA-sequencing data normalization and variance estimation. Stringent selection criteria were applied: genes demonstrating |log₂fold change (FC)| >1 (indicating at least 2-fold expression variation) combined with adjusted P value <0.05 (following Benjamini-Hochberg correction) were considered significantly differentially expressed. Visual representation of the identified genes was accomplished through volcano plots created with ggplot2 (v 3.3.2) (16) for displaying fold-change versus statistical significance, and hierarchical clustering heatmaps generated using pheatmap (v 0.7.7) (17) to illustrate expression patterns across samples.
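
The selection criteria above can be illustrated with a minimal Python sketch. It is not the DESeq2 workflow used in the study (which was run in R); it only shows, under simplifying assumptions, how Benjamini-Hochberg adjustment and the |log₂FC| >1 and adjusted P<0.05 cut-offs combine. The function names and the toy gene entries are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """results: list of (gene, log2_fold_change, raw_p_value) tuples."""
    padj = benjamini_hochberg([r[2] for r in results])
    degs = {}
    for (gene, lfc, _), q in zip(results, padj):
        if abs(lfc) > lfc_cutoff and q < padj_cutoff:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


# Hypothetical toy input (placeholder names and values).
toy = [("GENE_A", 2.3, 0.0001), ("GENE_B", -1.8, 0.004),
       ("GENE_C", 0.4, 0.0002), ("GENE_D", 1.5, 0.30)]
degs = select_degs(toy)
print(degs)
```

In this toy input, GENE_C fails the fold-change cut-off and GENE_D fails the adjusted P cut-off, so only GENE_A and GENE_B are retained.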

Weighted gene co-expression network analysis (WGCNA)

Assessing immune responses within the tumor microenvironment is a crucial aspect of immunotherapy, and immune scores play an instrumental role in this process. To estimate the proportion of immune cells present in HCC samples, immune scores were calculated for tumor samples from the TCGA-LIHC dataset using the estimate (v 1.0.13) package (18). Subsequently, all samples within the TCGA-LIHC dataset were analyzed using the WGCNA (v 1.69) package (19) to identify the module genes most strongly correlated with immune scores. Initially, all samples were clustered, and outliers were excluded to ensure the accuracy of the analysis.

For co-expression network construction, soft thresholding parameters (power values) were determined based on scale-free topology criteria, requiring signed R² values above 0.85 while maintaining mean connectivity approaching zero. The gene co-expression network was subsequently assembled from the processed expression matrix, followed by implementation of hybrid dynamic tree cutting methodology to identify modules, with each module containing no fewer than 100 constituent genes. Following this, a further delineation of co-expression modules was achieved through the construction of hierarchical clustering trees. Immune scoring metrics served as clinical phenotypic characteristics for downstream analysis. To establish associations between immunological parameters and gene co-expression networks, Pearson correlation coefficient analysis was performed using the psych (v 2.4.3) statistical package (20). This approach generated a comprehensive correlation matrix that quantified the relationship strength between immune score values and individual co-expression modules. Modules demonstrating robust correlations were identified using stringent criteria: absolute correlation coefficient |cor| >0.8 combined with statistical significance P<0.05. Subsequently, co-expression modules exhibiting the strongest associations with immune scores were prioritized for further investigation, and all constituent genes within these highly correlated modules were classified as key modular genes for subsequent analyses.
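
The module-trait step described above can be sketched in Python as follows. This is a simplified illustration rather than the WGCNA/psych code used in the study: it computes only the Pearson coefficient between each module eigengene and the immune score and applies the |cor| >0.8 cut-off, omitting the P<0.05 significance test. The eigengene values and immune scores are hypothetical toy numbers; only the module-name style (e.g., MEblack) follows the article.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def key_modules(eigengenes, immune_scores, cutoff=0.8):
    """eigengenes: dict of module name -> per-sample eigengene values."""
    return sorted(
        name for name, values in eigengenes.items()
        if abs(pearson(values, immune_scores)) > cutoff
    )


# Hypothetical toy values for five samples.
immune_scores = [1, 2, 3, 4, 5]
eigengenes = {
    "MEblack": [0.10, 0.20, 0.35, 0.40, 0.50],
    "MEblue": [0.30, -0.10, 0.20, -0.20, 0.10],
}
selected = key_modules(eigengenes, immune_scores)
print(selected)
```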

Identification and functional analysis of candidate genes

Overlapping analysis among differentially expressed genes (DEGs), IRGs, and key module genes was performed via the ggvenn (v 0.1.10) tool (21) to pinpoint immunity-associated genes in HCC, designated as candidate genes. Following identification, functional annotation of these candidate genes was conducted using clusterProfiler (v 3.16.0) software (22), enabling Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis with a statistical significance threshold of adj. P<0.05. Furthermore, the RCircos (v 0.4.15) package (23) was employed to visualize the chromosomal distribution of the candidate genes and to gain detailed insight into their structure. In addition, to investigate the interactions between candidate genes at the protein level, the search tool for the retrieval of interacting genes/proteins (STRING) (https://www.string-db.org) was employed to construct a protein-protein interaction (PPI) network (confidence =0.40). The PPI network was visualized using Cytoscape software (v 3.8.2) (24).
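
The overlap logic behind the candidate genes amounts to a set union followed by a set intersection. The following Python sketch illustrates this under the assumption that each gene list is available as a collection of gene symbols; the function names and the placeholder symbols (G1 to G7) are hypothetical and do not represent the study data.

```python
def merge_irg_sources(*sources):
    """Combine IRG lists from several databases and drop duplicates."""
    merged = set()
    for source in sources:
        merged |= set(source)
    return merged


def candidate_genes(degs, irgs, module_genes):
    """Genes present in all three lists at once."""
    return set(degs) & set(irgs) & set(module_genes)


# Placeholder gene symbols, for illustration only.
immport = ["G1", "G2", "G3"]
tisidb = ["G2", "G4"]
innatedb = ["G3", "G5"]
irgs = merge_irg_sources(immport, tisidb, innatedb)

degs = ["G1", "G2", "G5", "G6"]
module_genes = ["G2", "G5", "G7"]
candidates = candidate_genes(degs, irgs, module_genes)
print(len(irgs), sorted(candidates))
```

Seven source entries collapse to five unique IRGs here, which mirrors how the three database lists in the study shrink once duplicates are removed.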

Identification and correlation analysis of hub genes

Candidate gene node scoring was accomplished via the cytoHubba plugin in Cytoscape, utilizing ten different algorithms: maximal clique centrality (MCC), density of maximum neighborhood component (DMNC), maximum neighborhood component (MNC), degree centrality, edge percolated component (EPC), bottleneck, eccentricity, closeness, radiality, and betweenness centrality measures. Following this, the top 60 genes, ranked according to the aforementioned algorithms, were subjected to further analysis. Subsequently, the UpSetR (v 1.4.0) package (25) was employed to illustrate the intersection of genes selected by each algorithm, and the genes that were jointly selected in multiple algorithms were defined as hub genes. In addition, Spearman correlation analyses (P<0.05) were conducted on the hub genes using the psych package.
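
A minimal Python sketch of the consensus step follows, assuming each algorithm yields a per-gene score in which a higher value means a higher rank (an assumption about the exported cytoHubba scores, not something stated in the article). Ties are broken arbitrarily by sort order. The score tables and gene names are hypothetical.

```python
def top_n(scores, n):
    """Return the n highest-scoring genes from a {gene: score} table."""
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return set(ranked[:n])


def hub_genes(score_tables, n=60):
    """Genes that fall in the top n of every algorithm's ranking."""
    tops = [top_n(scores, n) for scores in score_tables.values()]
    return set.intersection(*tops)


# Hypothetical scores from three of the algorithms, with n reduced to 2.
score_tables = {
    "MCC": {"G1": 9, "G2": 7, "G3": 1},
    "Degree": {"G1": 5, "G2": 2, "G3": 4},
    "Closeness": {"G1": 0.9, "G2": 0.8, "G3": 0.1},
}
hubs = hub_genes(score_tables, n=2)
print(hubs)
```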

Drug prediction and survival analysis of hub genes

A search was conducted in the drug-gene interaction database (DGIdb) (https://www.dgidb.org) for drugs with the potential to target hub genes. Those drugs with two or more entries supported by the literature were subsequently selected for further consideration. Cytoscape software was employed to construct the visual representations of drug-hub gene networks afterward. Additionally, Kaplan-Meier (K-M) survival curves were plotted utilizing the survminer (v 0.4.9) package (26) to evaluate the disparity in overall survival (OS) between HCC patient groups exhibiting high and low expression levels of the hub gene (P<0.05).
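
For readers unfamiliar with the K-M estimator, the following Python sketch shows the product-limit calculation for a single patient group. It is not the survminer code used in the study, and it leaves out both the rule for splitting patients into high- and low-expression groups and the log-rank test behind the P value. The follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    Returns [(time, survival_probability)] at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients (time units are arbitrary).
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function separately on the high- and low-expression groups would give the two curves that are compared in a K-M plot.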

Discernment of diagnostic biomarkers through machine learning

Three different machine learning approaches were applied to discover hub genes in the TCGA-LIHC cohort for identifying HCC immune-related diagnostic biomarkers. The first method involved the least absolute shrinkage and selection operator (LASSO) regression implemented through glmnet (v 4.0-2) software (27), to screen LASSO-feature genes. This analysis incorporated 10-fold cross-validation procedures for optimal feature selection. Model performance assessment utilized receiver operating characteristic (ROC) curve analysis via pROC (v 1.16.2) package (28), with area under the curve (AUC) calculations demonstrating satisfactory discrimination capability (AUC >0.7).
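
The AUC criterion used here can be illustrated with a short Python sketch based on the rank interpretation of the AUC: the probability that a randomly chosen HCC sample receives a higher model score than a randomly chosen control, with ties counted as one half. This is a simplified stand-in for the pROC calculation, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 = HCC, 0 = control.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3), auc > 0.7)
```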

In the TCGA-LIHC dataset, the support vector machine-recursive feature elimination (SVM-RFE) algorithm, utilizing the e1071 (v 1.7-3) package (29), was employed to compute features (hub genes) ranking weights. The least weighted features were then deleted, and the iteration process was implemented to select the optimal number of features. To identify the SVM-RFE feature genes, the minimum 10-fold CV error value was employed as a selection criterion.

Moreover, the random forest (RF)-RFE algorithm, embedded within the caret (v 4.1.3) package (30), was employed to identify the most predictive features. The optimal number of variables was determined by the root mean square error (RMSE), where a lower RMSE value indicated higher predictive accuracy for the RF model. The variables that yielded the lowest RMSE value were identified as the RF-feature genes. Following this, the RF model was constructed utilizing the randomForest (v 4.6-14) package (31), with the error rate stabilizing once the number of trees reached a specified threshold. The importance of the RF-feature genes was then ranked according to two principal metrics of variable importance: the mean increase in mean square error (unscaled %IncMSE) and the total decrease in node impurities (IncNodePurity). Additionally, ROC curves were generated using the pROC package, and AUC values were subsequently computed for the RF model (AUC >0.7).

The ICGC-LIRI dataset was employed to independently validate the LASSO, SVM-RFE, and RF models, thereby emphasizing their universality and reliability, particularly through ROC analysis (AUC >0.7). Additionally, potential diagnostic biomarkers were determined through overlapping analysis of feature genes derived from three distinct machine learning approaches, implemented via the ggvenn package.

Assessment of diagnostic biomarkers

The diagnostic performance of the identified biomarkers for HCC was further investigated through logistic regression analysis. A logit model was constructed using the diagnostic biomarkers. Subsequently, the ROC curves for the logit model and each diagnostic biomarker were plotted using the pROC package in the TCGA-LIHC and ICGC-LIRI datasets and in our 20 collected clinical samples, and the corresponding AUC values were calculated (AUC >0.7). To further assess the effectiveness of our model, we selected the diagnostic genes for HCC identified in a previous study (14) and, taking three genes as a group, constructed logistic regression models in the TCGA-LIHC dataset for comparison with our model.
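
As an illustration of how a logit model turns several biomarker expression values into a single diagnostic score, consider the Python sketch below. The coefficients, intercept, and expression values are entirely hypothetical placeholders chosen for easy arithmetic; they are not the fitted values from the study, and their signs carry no biological meaning.

```python
import math


def logit_probability(expression, coefficients, intercept):
    """Predicted probability from a logistic model: 1 / (1 + exp(-z))."""
    z = intercept + sum(
        coefficients[gene] * expression[gene] for gene in coefficients
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (NOT estimates from the article).
coefficients = {"FCGR2A": 0.5, "MARCO": -0.5, "TNFRSF4": 1.0}
intercept = 0.0

# Hypothetical expression values for two samples.
sample_a = {"FCGR2A": 2.0, "MARCO": 2.0, "TNFRSF4": 0.0}
sample_b = {"FCGR2A": 2.0, "MARCO": 2.0, "TNFRSF4": 3.0}

p_a = logit_probability(sample_a, coefficients, intercept)
p_b = logit_probability(sample_b, coefficients, intercept)
print(p_a, round(p_b, 3))
```

The predicted probabilities for all samples would then serve as the scores from which a ROC curve and AUC are computed.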

Expression level analysis of diagnostic biomarkers

To gain deeper insights into the expression patterns of the diagnostic biomarkers in HCC, the Wilcoxon test was applied to both the TCGA-LIHC and ICGC-LIRI datasets to compare biomarker expression levels between the HCC and normal groups (P<0.05). Furthermore, the mRNA expression levels of diagnostic biomarkers in the HCC and normal groups were evaluated using reverse transcription quantitative polymerase chain reaction (RT-qPCR). A total of 20 samples were obtained from patients with HCC at The Third People’s Hospital of Yunnan Province, comprising 10 cancer samples (HCC group) and 10 paracancer samples (control group). This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Ethical approval was obtained from the Ethics Committee of the Third People’s Hospital of Yunnan Province [(2025) L-170], and informed consent was obtained from all participants. Total RNA was isolated from frozen HCC and control specimens using TRIzol reagent (Ambion, 15596-018CN) following the manufacturer’s protocols. RNA quantity and purity were assessed using a NanoPhotometer N50, measuring 1 µL aliquots to determine optimal concentrations for downstream applications. cDNA synthesis was performed using Servicebio’s SweScript First Strand cDNA Synthesis Kit per the manufacturer’s guidelines. For quantitative PCR, cDNA templates were diluted 5–20-fold with nuclease-free ddH2O; each reaction combined 3 µL diluted cDNA, 5 µL 2×Universal Blue SYBR Green qPCR Master Mix, and 1 µL each of forward and reverse primers (10 µM). Amplification was conducted on a CFX96 real-time PCR system (BIO-RAD, XLFZ006) for 40 cycles following pre-denaturation, with cycling parameters detailed in Table S1. Primer sequences for target biomarkers are listed in Table S2, with GAPDH as the internal control. Relative expression was calculated using the 2^−ΔΔCT method. Statistical visualization of mRNA expression differences between the HCC and control groups was generated using GraphPad Prism 5 software.
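
The 2^−ΔΔCT calculation can be made concrete with a small Python sketch. GAPDH is the internal control, as in the protocol above; the Ct values and the function name are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method."""
    # dCt normalizes the target gene to the reference gene (e.g., GAPDH).
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # ddCt compares the normalized sample with the normalized control.
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: tumor target 24, tumor GAPDH 18,
# paracancer target 26, paracancer GAPDH 18.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)
```

With these toy values, ΔCt is 6 in the tumor sample and 8 in the control, so ΔΔCt is −2 and the target is expressed fourfold higher in the tumor sample.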

Statistical analysis

R software (v 4.2.2) was utilized for all statistical computations, with group comparisons assessed through Wilcoxon testing methodology. Statistical significance was established at P<0.05 threshold level. For RT-qPCR data analysis, Ct value comparisons were performed via unpaired, independent-sample t-test analysis implemented in GraphPad Prism 5 platform.

Results

Acquisition of 8,800 DEGs and 2,337 key module genes

Differential expression profiling identified 8,800 DEGs when comparing HCC specimens with normal tissue samples. Among these DEGs, 6,906 showed increased expression while 1,894 demonstrated decreased expression in HCC samples relative to controls (Figure 1A,1B). The results of the immune scores were presented in detail in https://cdn.amegroups.cn/static/public/jgo-2025-1-1085-2.xlsx. Following the assessment, the sample clustering tree demonstrated the absence of outliers (Figure 1C), confirming the suitability of all samples for incorporation into the WGCNA network. A soft threshold of 8 was identified, along with a signed R² value exceeding 0.85 and a mean connectivity of approximately 0 (Figure 1D). A hierarchical clustering analysis revealed 15 distinct co-expression modules, as shown in Figure 1E. Additionally, the correlation analysis of the modules against immune scores revealed two significantly correlated modules: MEblack (cor =0.91, P<0.001) and MEred (cor =0.89, P<0.001). These two modules collectively contained 2,337 genes, which were designated as key module genes (Figure 1F).

Figure 1 Identification of differentially expressed genes and key module genes. (A) Volcano plot of differentially expressed genes. Red represents genes upregulated in the HCC group, while blue represents genes downregulated in the HCC group. (B) Heatmap of differentially expressed genes. Color indicates the gene expression level; the darker the color, the higher the expression level (red indicates high expression, while blue indicates low expression). (C) Sample dendrogram and trait heatmap. The upper part shows the clustering results, and the lower part represents traits. Color indicates the content; the darker the color, the higher the immune score. (D) Scale-free soft threshold distribution. Both x-axes represent the weight parameter (power value). For the left panel, the y-axis denotes the scale-free fit index; a higher squared correlation coefficient indicates that the network is closer to a scale-free distribution. For the right panel, the y-axis represents the mean value of the adjacency function of all genes in the corresponding gene module. (E) Module clustering dendrogram. Different colors represent different modules. (F) Correlation heatmap between modules and immune score traits. HCC, hepatocellular carcinoma.

Identification of 87 candidate genes and exploration of their functions

Then, 8,800 DEGs, 2,337 key module genes, and 7,990 IRGs were subjected to intersection analysis, resulting in the identification of 87 candidate genes (Figure 2A). Following this, enrichment assays were carried out to gain an initial understanding of the signaling pathways implicated by the candidate genes. The candidate genes were significantly enriched in 707 GO entries, comprising 644 biological processes (BPs), 13 cellular components (CCs) and 50 molecular functions (MFs). Among the discerned BPs, those linked to candidate genes exhibited prominent functions in the processes of “myeloid leukocyte migration”. Concomitantly, CCs were predominantly enriched in “external side of plasma membrane”, while MFs were primarily concentrated within “immune receptor activity” (Figure 2B). The results indicated a primary involvement of candidate genes in immune responses, with particular emphasis on leukocyte migration, chemotaxis, immune receptor signalling, and cytokine interactions. Furthermore, a KEGG enrichment analysis of the candidate genes revealed the enrichment of 32 pathways, like “cytokine-cytokine receptor interaction” and “viral protein interaction with cytokine and cytokine receptor” (Figure 2C). The enrichment of these pathways revealed the potential regulatory function of the candidate genes in immune responses, particularly in the context of the interplay between viral invasion and host defences, as well as cytokine-mediated immune modulation. Additional genomic mapping analysis was performed to examine the chromosome localization of candidate genes, showing their distribution across all chromosomes except chromosomes 10, 13, 14, 21, and Y (Figure 2D). This suggested that these candidate genes, which displayed a widespread distribution within the human genome, might engage in a range of biological processes and functions. Moreover, a PPI network comprising 81 distinct nodes (candidate genes) and 607 edges was constructed (Figure 2E). 
This revealed interactions among multiple genes; for instance, MARCO interacted with FCN1, TREM2, and C1QC.

Figure 2 Identification of hub genes. (A) Acquisition of candidate genes. Red represents immune-related genes; green represents differentially expressed genes; purple represents key module genes. (B) GO enrichment analysis. The x-axis represents gene ratio, and the y-axis represents GO terms. (C) KEGG enrichment analysis. The x-axis represents gene ratio, and the y-axis represents KEGG-enriched terms. (D) Chromosomal localization analysis of candidate genes. (E) Protein-protein interaction network: Lines represent the interaction relationships between them, where the thickness of the lines represents the interaction strength. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; IRGs, immune-related genes; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function.

Identification and correlation analysis of 33 hub genes

Following this, the 81 candidate genes in the PPI network were analyzed with the 10 algorithms of the “cytoHubba” plug-in. All 10 algorithms converged on 33 genes, which were therefore designated as hub genes (Figure 3A). Furthermore, a significant positive correlation was observed among the majority of hub genes (P<0.05) (Figure 3B, available online: https://cdn.amegroups.cn/static/public/jgo-2025-1-1085-3.xlsx), implying their involvement in regulating shared biological processes or collaborative functions. In addition, corresponding drugs were predicted for 10 hub genes. Specifically, 5, 1, 3, 6, 7, 4, 1, 1, 1, and 35 drugs were predicted for CD274, CTLA4, CSF1R, FCGR2A, FCGR3A, IL2RB, IL10, MMP9, PDCD1, and PTGS2, respectively (Figure 3C). The sources of these drugs are detailed in https://cdn.amegroups.cn/static/public/jgo-2025-1-1085-4.xlsx. Six of these drugs, such as adalimumab and cetuximab, co-targeted FCGR2A and FCGR3A. In addition, K-M curves demonstrated a markedly higher survival probability (P<0.05) among patients with HCC exhibiting low expression of six hub genes (CCR1, CXCL5, FCGR2A, MMP9, TNFRSF4, and TREM2) (Figure 3D-3I). These results indicate that the expression levels of these genes are closely associated with the survival of patients with HCC.

Figure 3 Identification and correlation analysis of 33 hub genes. (A) Identification of 33 hub genes. (B) Correlation analysis of 33 hub genes. Darker colors indicate a stronger correlation, with red representing positive correlation and blue representing negative correlation. (C) Drug-hub gene network diagram. Red ellipses denote hub genes; green diamonds denote drugs. (D) CCR1 prediction probability for OS in HCC patients. (E) CXCL5 prediction probability for OS in HCC patients. (F) FCGR2A prediction probability for OS in HCC patients. (G) MMP9 prediction probability for OS in HCC patients. (H) TNFRSF4 prediction probability for OS in HCC patients. (I) TREM2 prediction probability for OS in HCC patients. HCC, hepatocellular carcinoma; OS, overall survival.

Acquisition of three diagnostic biomarkers: FCGR2A, MARCO, TNFRSF4

The 33 hub genes were first entered into the LASSO algorithm. At lambda.min = 0.002, 13 LASSO-feature genes were retained (C5AR1, S100A12, PTGS2, CXCR2, MARCO, CD274, CCL3, IRF8, CSF1R, CCL20, LY96, FCGR2A, and TNFRSF4) (Figure 4A,4B); their regression coefficients are listed in Table 1. ROC curve analysis showed that the LASSO model had an AUC of 0.996 (Figure 4C). The hub genes were also ranked by feature weight using the SVM-RFE method (Table 2). The minimum 10-fold cross-validation error was 0.0317, corresponding to an optimal feature number of 4 (Figure 4D), which yielded 4 SVM-RFE signature genes: MARCO, TNFRSF4, SIGLEC1, and FCGR2A. Similarly, the hub genes were entered into the RF-RFE algorithm. The RMSE reached its minimum when the number of variables was set to 6, indicating the best predictive accuracy of the RF model (Figure 4E). The 6 RF-feature genes were CXCR2, FCGR2A, MARCO, PTGS2, S100A12, and TNFRSF4. An RF model was then constructed from these 6 genes, and its error rate stabilised at approximately 100 trees (Figure 4F). The importance of the RF-feature genes is shown in Figure 4G. Among the three machine learning algorithms, the RF model had the highest AUC, reaching 1.000 (Figure 4H), which indicated excellent discriminative performance. The generalisability of the 3 models was confirmed in the ICGC-LIRI dataset, where the AUC values of the LASSO, SVM-RFE, and RF models all exceeded 0.7 (Figure S1). Finally, three diagnostic biomarkers were identified as the intersection of the feature genes from the three machine learning algorithms: FCGR2A, MARCO, and TNFRSF4 (Figure 4I).
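The overlap step can be reproduced directly from the gene lists reported above. The following is a minimal Python sketch, not the authors' code; the variable names are illustrative, and only the gene symbols come from the text.

```python
# Feature genes reported in the text for each algorithm.
lasso_genes = {
    "C5AR1", "S100A12", "PTGS2", "CXCR2", "MARCO", "CD274", "CCL3",
    "IRF8", "CSF1R", "CCL20", "LY96", "FCGR2A", "TNFRSF4",
}
svm_rfe_genes = {"MARCO", "TNFRSF4", "SIGLEC1", "FCGR2A"}
rf_genes = {"CXCR2", "FCGR2A", "MARCO", "PTGS2", "S100A12", "TNFRSF4"}

# Genes selected by all three algorithms (the centre of the Venn diagram).
shared_genes = sorted(lasso_genes & svm_rfe_genes & rf_genes)

# SVM-RFE genes that are not shared by both of the other two methods.
svm_rfe_not_shared = sorted(svm_rfe_genes - (lasso_genes & rf_genes))

print(shared_genes)
print(svm_rfe_not_shared)
```

Run on these lists, the sketch returns FCGR2A, MARCO, and TNFRSF4 as the shared genes, and shows that SIGLEC1 is the only SVM-RFE gene dropped by the intersection.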

Figure 4 Acquisition of 3 diagnostic biomarkers including FCGR2A, MARCO, TNFRSF4. (A) LASSO logistic coefficient penalty plot. The x-axis is log(Lambda), and the y-axis represents cross-validation error. As the penalty coefficient Lambda changes, the coefficients of most variables are finally shrunk to 0. The optimal Lambda value is selected when the 10-fold cross-validation error is minimized, and 13 variables with non-zero coefficients are selected at lambda.min = 0.002. (B) LASSO regression cross-validation parameter selection plot, used to determine the optimal regularization parameter λ. (C) LASSO model ROC curve. (D) Plot of the relationship between generalization error and number of features. The x-axis represents the number of feature genes, and the y-axis represents the generalization error under 10-fold cross-validation. The line shows how the generalization error changes with the number of feature genes. (E) Random forest combined with caret screening process. (F) Random forest error rate. (G) Importance ranking in the random forest model. (H) Random forest ROC curve. (I) Venn diagram of intersections among three machine learning methods. AUC, area under the curve; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic.

Table 1

The regression coefficients for LASSO-feature genes

Feature genes	Regression coefficients
C5AR1	1.27E−03
S100A12	−2.53E−02
PTGS2	−2.74E−03
CXCR2	−1.67E−03
MARCO	−1.51E−03
CD274	−1.28E−03
CCL3	−5.99E−04
IRF8	−4.06E−04
CSF1R	8.09E−05
CCL20	1.08E−04
LY96	2.02E−04
FCGR2A	5.77E−03
TNFRSF4	2.49E−02

LASSO, least absolute shrinkage and selection operator.
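Only 13 of the 33 hub genes keep a non-zero coefficient in Table 1 because the LASSO penalty shrinks small coefficients to exactly zero. The sketch below illustrates this with the soft-thresholding operator that underlies common coordinate-descent LASSO solvers. It is an illustration under stated assumptions, not the authors' fitting procedure: the gene names and unpenalized coefficients are hypothetical, and only the value 0.002 is taken from the text.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = {"gene_a": 0.030, "gene_b": -0.004, "gene_c": 0.001}
lam = 0.002  # same value as the lambda.min reported in the text

# Apply the penalty to every coefficient.
shrunk = {gene: soft_threshold(beta, lam) for gene, beta in unpenalized.items()}

# Genes whose coefficient survives the penalty are the "feature genes".
selected = [gene for gene, beta in shrunk.items() if beta != 0.0]

print(shrunk)
print(selected)
```

In this toy example the coefficient of gene_c is smaller in magnitude than the penalty, so it is set to zero and the gene is dropped, while the other two coefficients are reduced in magnitude but retained.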

Table 2

The SVM-RFE model feature gene weight ranking

Feature name	Feature ID	Average rank
MARCO	22	4.000000
TNFRSF4	31	5.000000
SIGLEC1	29	7.666667
FCGR2A	14	8.333333
LY96	21	9.333333
CSF1R	10	11.000000
TREM2	33	13.333333
S100A12	27	13.666667
C1QB	1	14.000000
CCL2	4	14.666667

SVM-RFE, support vector machine-recursive feature elimination.

The logit model demonstrated strong performance in assessing the diagnosis of HCC

Next, the diagnostic value of the three biomarkers for HCC was evaluated in the TCGA-LIHC dataset, the ICGC-LIRI dataset, and our clinical cohort. For the models built on the public datasets, the logit regression formula is defined as: logit(P) = log[P/(1−P)] = β₀ + β₁×FCGR2A + β₂×MARCO + β₃×TNFRSF4, where P is the predicted probability of being classified as HCC, β₀ is the intercept, and β₁, β₂, and β₃ are the regression coefficients of the biomarkers. The independent variables are the normalized gene expression levels, and the dependent variable is binary, coded as 1 for HCC and 0 for control. The AUC values of the logit model were 0.993 in the TCGA-LIHC dataset and 0.947 in the ICGC-LIRI dataset, which were higher than those of FCGR2A, MARCO, and TNFRSF4 individually (Figure 5A,5B). This indicated that the logit model discriminated HCC from control samples better than any single biomarker. Among the individual biomarkers, MARCO had AUC values of 0.977 in the TCGA-LIHC dataset and 0.927 in the ICGC-LIRI dataset, which illustrated its superior diagnostic potential compared with the other two. In the clinical cohort, complete separation was a potential concern because of the small sample size (n=20). Therefore, Firth penalized likelihood estimation was applied to reduce small-sample bias, using the penalized log-likelihood L*(β) = L(β) + 0.5 × log|I(β)|, where L(β) denotes the original log-likelihood function and I(β) the Fisher information matrix. ROC analysis based on the mRNA expression levels of the diagnostic biomarkers showed that the AUC of the logit regression model was 1, and MARCO alone also had an AUC of 1 (Figure 5C). However, given the small size of the clinical validation set (n=20), this result should be interpreted with caution because it may reflect overfitting. Further validation in expanded clinical cohorts is warranted to confirm the stability and generalizability of these diagnostic biomarkers. The parameters of the logit regression model in the three datasets are presented in Table S3. In all three datasets, high MARCO expression was consistently associated with lower odds of HCC (OR <1, P<0.001). For comparison, five diagnostic genes reported in the literature (CCDC107, CXCL12, GIGYF1, GMNN, and IFFO1) were combined in groups of three, giving 10 gene combinations. Their AUC values in the TCGA-LIHC dataset are summarized in Table 3 (all AUC >0.7). Although all three-gene combinations achieved favorable diagnostic performance in the TCGA-LIHC cohort, their AUC values were lower than that of the diagnostic biomarker panel identified in the present study.
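The logit formula and the AUC reported above can be illustrated with a short, dependency-free Python sketch. This is not the authors' model: the intercept, the coefficients, and the six toy samples are hypothetical values chosen for illustration (the fitted parameters are in Table S3, which is not reproduced here), and the helper names are assumptions. The AUC is computed as the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
import math


def predict_probability(expr, intercept, coefs):
    """Logistic model: P = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefs[gene] * expr[gene] for gene in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


def roc_auc(scores, labels):
    """AUC as the share of case-control pairs ranked correctly (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical coefficients: positive for FCGR2A and TNFRSF4, negative for MARCO.
intercept = 0.0
coefs = {"FCGR2A": 0.8, "MARCO": -1.2, "TNFRSF4": 0.9}

# Toy normalized expression values with labels (1 = HCC, 0 = control).
samples = [
    ({"FCGR2A": 2.0, "MARCO": 0.5, "TNFRSF4": 1.5}, 1),
    ({"FCGR2A": 1.5, "MARCO": 1.0, "TNFRSF4": 1.0}, 1),
    ({"FCGR2A": 1.8, "MARCO": 2.6, "TNFRSF4": 0.9}, 1),
    ({"FCGR2A": 0.5, "MARCO": 2.5, "TNFRSF4": 0.4}, 0),
    ({"FCGR2A": 1.6, "MARCO": 1.2, "TNFRSF4": 1.1}, 0),
    ({"FCGR2A": 0.6, "MARCO": 2.0, "TNFRSF4": 0.5}, 0),
]

labels = [label for _, label in samples]
probabilities = [predict_probability(x, intercept, coefs) for x, _ in samples]
auc = roc_auc(probabilities, labels)

# With perfectly separated scores the AUC is exactly 1, whatever the sample size.
auc_separated = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])

print(round(auc, 3), auc_separated)
```

The second call shows why an AUC of 1 in a cohort of 20 samples is weak evidence on its own: any score that happens to separate the two groups completely yields an AUC of 1, and with few samples such separation can arise by chance.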

Figure 5 Biomarkers have good diagnostic value. (A) ROC curve in the TCGA-LIHC dataset. (B) ROC curve in the ICGC-LIRI dataset. (C) ROC curve in the clinical cohort. AUC, area under the curve; ICGC-LIRI, International Cancer Genome Consortium Liver Cancer; ROC, receiver operating characteristic; TCGA-LIHC, The Cancer Genome Atlas Liver Hepatocellular Carcinoma.

Table 3

AUC values of arbitrary three-gene combinations in the TCGA-HCC dataset

Combination No.	Gene combination	TCGA-HCC-AUC
1	CCDC107 + CXCL12 + GIGYF1	0.9866
2	CCDC107 + CXCL12 + GMNN	0.9909
3	CCDC107 + CXCL12 + IFFO1	0.9912
4	CCDC107 + GIGYF1 + GMNN	0.9705
5	CCDC107 + GIGYF1 + IFFO1	0.9495
6	CCDC107 + GMNN + IFFO1	0.9721
7	CXCL12 + GIGYF1 + GMNN	0.9899
8	CXCL12 + GIGYF1 + IFFO1	0.9886
9	CXCL12 + GMNN + IFFO1	0.9915
10	GIGYF1 + GMNN + IFFO1	0.9608

AUC, area under the curve; HCC, hepatocellular carcinoma; TCGA, The Cancer Genome Atlas.
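The 10 rows of Table 3 are all possible three-gene subsets of the five literature genes. A minimal Python sketch of that enumeration follows; the AUC values are copied from Table 3, the value 0.993 is the TCGA-LIHC AUC of the present panel reported in the text, and the variable names are illustrative.

```python
from itertools import combinations

literature_genes = ["CCDC107", "CXCL12", "GIGYF1", "GMNN", "IFFO1"]

# All unordered three-gene panels: C(5, 3) = 10, in the same order as Table 3.
panels = [" + ".join(combo) for combo in combinations(literature_genes, 3)]

# AUC values copied from Table 3, row by row.
table3_auc = [0.9866, 0.9909, 0.9912, 0.9705, 0.9495,
              0.9721, 0.9899, 0.9886, 0.9915, 0.9608]
auc_by_panel = dict(zip(panels, table3_auc))

# Best literature panel versus the three-gene panel of the present study.
best_panel = max(auc_by_panel, key=auc_by_panel.get)
present_study_auc = 0.993
all_lower = all(auc < present_study_auc for auc in table3_auc)

print(len(panels), best_panel, auc_by_panel[best_panel], all_lower)
```

The best literature combination (CXCL12 + GMNN + IFFO1, AUC 0.9915) falls just below 0.993, so the margin over the strongest comparator is small.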

Expression and validation of FCGR2A, MARCO, TNFRSF4

In TCGA-LIHC, FCGR2A and TNFRSF4 expression was significantly elevated in the HCC group (P<0.001), whereas MARCO expression was significantly reduced (P<0.001) (Figure 6A). In ICGC-LIRI, all three diagnostic biomarkers were also significantly differentially expressed, with the same direction of change as in TCGA-LIHC (Figure 6B).

Figure 6 Analysis of biomarker expression levels. (A,B) Biomarker expression levels in the TCGA-LIHC dataset and the ICGC-LIRI dataset, respectively. (C-E) Expression of FCGR2A, TNFRSF4, and MARCO in clinical samples. **, P<0.01; ***, P<0.001; ****, P<0.0001. FPKM, fragments per kilobase of transcript per million fragments mapped; HCC, hepatocellular carcinoma; ICGC-LIRI, International Cancer Genome Consortium Liver Cancer; TCGA-LIHC, The Cancer Genome Atlas Liver Hepatocellular Carcinoma.

RT-qPCR was performed to validate the expression patterns of the diagnostic biomarkers in HCC and control samples. After RNA isolation, all specimens had RNA concentrations within the acceptable range (Table S4). FCGR2A and TNFRSF4 transcript levels were significantly higher, and MARCO mRNA levels significantly lower, in HCC samples than in controls (P<0.01) (Figure 6C-6E). These experimental findings were consistent with the expression profiles of the three diagnostic biomarkers observed in the TCGA-LIHC and ICGC-LIRI datasets.

Discussion

HCC, which arises from hepatocytes, is the main form of primary liver cancer. The interplay between HCC and immune regulation influences tumor progression, patient prognosis, and treatment selection (9,11,32). Using bioinformatics, we identified three immune-related diagnostic biomarkers (MARCO, FCGR2A, TNFRSF4) for HCC and evaluated their clinical utility. The logit model achieved AUC values of 0.993 and 0.947 in the TCGA-LIHC and ICGC-LIRI datasets, respectively, suggesting that these biomarkers may be closely associated with HCC.

MARCO, a member of the class A scavenger receptor family, is predominantly expressed on macrophages and dendritic cells (33). This receptor, a 210 kDa type I transmembrane protein consisting of three 54 kDa subunits, shares tertiary structural similarity with the SR-AI subtype (34). MARCO contains a C-terminal collagen domain and a scavenger receptor cysteine-rich (SRCR) domain of 110 amino acid residues. This structure allows MARCO to bind various anionic ligands, such as modified low-density lipoprotein (34,35). MARCO⁺ tumor-associated macrophages (TAMs) have been shown to form a potent immunosuppressive subpopulation: they inhibit IFN-β secretion, weaken antigen presentation and CD8⁺ T cell function, and block activation of the cGAMP/ATP-STING pathway by clearing dying tumor cells, thereby creating an immunosuppressive microenvironment. Combination treatment targeting MARCO⁺ TAMs significantly inhibits liver cancer growth, making MARCO a potential novel target for immunotherapy (36). MARCO is differentially expressed across malignant tumors: high expression correlates with reduced OS in bladder urothelial carcinoma and other cancers, but with prolonged progression-free intervals in low-grade gliomas and cutaneous melanoma. MARCO expression correlates positively with immune checkpoint molecules, is associated with high microsatellite instability in colorectal adenocarcinoma, and is significantly associated with key signaling pathways, including TNF-α/NFκB. In cutaneous melanoma, the high-expression group shows a superior response to immune checkpoint inhibitor therapy, suggesting that MARCO is a promising biomarker candidate for immunotherapy (37). Studies confirm that MARCO is downregulated in HCC and that its overexpression correlates with a favorable prognosis. It exerts antitumor effects by inhibiting tumor cell migration, invasion, and proliferation while promoting apoptosis in liver cancer cells, as demonstrated in vitro and in vivo (38). Our RT-qPCR results confirmed that MARCO’s expression pattern was consistent with these findings.

FCGR2A, as a member of the immunoglobulin Fc receptor family, is expressed on immune-related cell surfaces. The protein functions through IgG antibody binding and is found on neutrophils, monocytes, macrophages, dendritic cells, B cells, and platelets (39,40). Research findings demonstrate a significant increase in FCGR2A expression within head and neck squamous cell carcinoma tissues, and this expression correlates with tumor differentiation grade, distant metastasis, and patient survival. This gene exerts key effects through modulation of immune cell infiltration in the tumor microenvironment (40). In esophageal squamous cell carcinoma, FCGR2A serves as a critical factor in preserving the function of protein interaction networks. Its overexpression correlates positively with poor prognosis and immune regulatory biomarkers (41,42).

Research has shown that FCGR2A binds to antibody-antigen complexes, regulating their abundance and modulating immune responses, including those associated with autoimmunity. Its expression is controlled by cis-regulatory elements and noncoding variants, involving a complex proximal region and five distal enhancers, with implications for immune-mediated diseases (39). Consistent with our findings, previous studies confirm FCGR2A overexpression in HCC, where its expression levels show significant correlation with clinical indicators like extrahepatic metastasis and tumor diameter, and strong association with poor patient prognosis. FCGR2A promotes M2 macrophage polarization and tumor cell proliferation through activation of the IL-4/JAK/STAT6 signaling pathway, while inhibiting natural killer cell and T cell activity, thereby providing a theoretical basis for targeted HCC treatment strategies (43).

TNFRSF4, also known as OX40 or CD134, is a member of the TNFR superfamily that critically regulates immune responses by modulating T cell activation. The receptor is not constitutively expressed on naive T cells but reaches peak expression 3–4 days after initial activation, and it can be rapidly reinduced on effector T cells. Notably, TNFRSF4 expression is triggered by T cell receptor signaling even without concurrent signals from costimulatory receptors such as CD28 (44). Consistent with our findings, previous studies confirm that TNFRSF4 expression is significantly higher in tumor tissues than in normal tissues. Its overexpression correlates with advanced clinical features, including pathological stage III/IV and R1/R2/RX residual tumors, and is associated with poor OS in HCC patients. TNFRSF4 expression combined with pathological omics metrics may serve as a prognostic marker for OS, with gender potentially modulating this prognostic association (45). TNFRSF4 expression in HCC varies significantly by age, gender, tumor grade, and disease stage, and it influences patient survival and prognosis. The expression level of TNFRSF4 and clinical risk scores correlate with the frequency of HCC gene mutations, suggesting their potential as prognostic markers (46).

A logistic regression model was used to evaluate the diagnostic value of the biomarker combination for HCC. Integrating FCGR2A, MARCO, and TNFRSF4 into the model yielded a higher AUC than any individual gene, demonstrating the diagnostic potential of this multi-gene biomarker panel for HCC. Compared with a study of RASSF1A methylation as an HCC biomarker (47), our three-gene model achieved an AUC of 0.993 in the TCGA cohort, higher than that reported for RASSF1A methylation (pooled AUC: 0.75). MARCO alone also achieved a higher AUC (0.977) than RASSF1A methylation. This suggests a potential advantage of gene expression markers over epigenetic markers for HCC diagnosis. These findings were corroborated in independently collected clinical samples. Given the modest sample size, however, the AUC values may reflect overfitting, and validation in expanded cohorts is warranted. Furthermore, while tumor suppressor methylation reflects epigenetic alterations, FCGR2A, MARCO, and TNFRSF4 are directly involved in regulating the immune microenvironment, which gives them implications for both pathology and targeted therapy.

This study identified three immune-related diagnostic biomarkers for HCC through bioinformatics analysis. We further demonstrated that a combined logit model incorporating FCGR2A, MARCO, and TNFRSF4 significantly outperformed individual biomarkers in AUC values, confirming its superior diagnostic value for HCC. While our bioinformatics approach provides novel insights into HCC biomarkers, future validation in larger clinical cohorts remains essential. Further exploration of these biomarkers’ biological mechanisms in HCC progression is warranted.

Conclusions

In this study, 8,800 DEGs were identified through differential expression analysis between HCC and normal samples. Using the ESTIMATE algorithm to calculate immune scores and WGCNA for module screening, 2,337 immune-related key module genes were identified. After intersection with 7,990 immune genes collected from multiple databases, 87 HCC-related candidate genes were obtained. Through PPI network analysis and three machine learning methods (LASSO, SVM-RFE, and RF), MARCO, FCGR2A, and TNFRSF4 were ultimately identified as diagnostic biomarkers. The logit model constructed from these genes improved predictive accuracy for HCC diagnosis compared with single-gene models, providing a basis for the clinical management of high-risk patients.

Acknowledgments

We sincerely thank all individuals and organizations who supported and assisted us throughout this research, with special thanks to Hang Jiang. Without their support, this research would not have been possible.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1085/rc

Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1085/dss

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1085/prf

Funding: This work was supported by the Healthcare Talent Development Program (No. 2023MY007).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-1-1085/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of the Third People’s Hospital of Yunnan Province [(2025) L-170] and informed consent was obtained from all individual participants.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
2. Smith EN, Bashir MR, et al. ACR Appropriateness Criteria® Staging and Follow-Up of Primary Liver Cancer. J Am Coll Radiol 2025;22:S699-712.
3. Yang JD, Hainaut P, Gores GJ, et al. A global view of hepatocellular carcinoma: trends, risk, prevention and management. Nat Rev Gastroenterol Hepatol 2019;16:589-604.
4. Vogel A, Meyer T, Sapisochin G, et al. Hepatocellular carcinoma. Lancet 2022;400:1345-62.
5. Wang Y, Deng B. Hepatocellular carcinoma: molecular mechanism, targeted therapy, and biomarkers. Cancer Metastasis Rev 2023;42:629-52.
6. Fitzmaurice C, Allen C, et al. Global, Regional, and National Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life-years for 32 Cancer Groups, 1990 to 2015: A Systematic Analysis for the Global Burden of Disease Study. JAMA Oncol 2017;3:524-48.
7. Yu H, Wen X, Cui M, et al. Preclinical exploration and current clinical applications of immunotherapeutic strategies for hepatocellular carcinoma. Front Immunol 2026;17:1769251.
8. Lu J, Luo Y, Rao D, et al. Myeloid-derived suppressor cells in cancer: therapeutic targets to overcome tumor immune evasion. Exp Hematol Oncol 2024;13:39.
9. Giraud J, Chalopin D, Blanc JF, et al. Hepatocellular Carcinoma Immune Landscape and the Potential of Immunotherapies. Front Immunol 2021;12:655697.
10. Jayant K, Habib N, Huang KW, et al. Immunological Basis of Genesis of Hepatocellular Carcinoma: Unique Challenges and Potential Opportunities through Immunomodulation. Vaccines (Basel) 2020;8:247.
11. Sangro B, Sarobe P, Hervás-Stubbs S, et al. Advances in immunotherapy for hepatocellular carcinoma. Nat Rev Gastroenterol Hepatol 2021;18:525-43.
12. Ma C, Cheng J, Gu J, et al. Epigenetic drugs in cancer therapy: mechanisms, immune modulation, and therapeutic applications. Mol Biomed 2025;6:132.
13. Lin XH, Li DP, Liu ZY, et al. Six immune-related promising biomarkers may promote hepatocellular carcinoma prognosis: a bioinformatics analysis and experimental validation. Cancer Cell Int 2023;23:52.
14. Liu Y, Zhang H, Xu Y, et al. Five Critical Gene-Based Biomarkers With Optimal Performance for Hepatocellular Carcinoma. Cancer Inform 2023;22:11769351231190477.
15. Liu T, Su X, Kong X, et al. Whole transcriptome sequencing identifies key lncRNAs, circRNAs, and mRNAs for exploring the pathogenesis and therapeutic target of mouse pneumoconiosis. Gene 2024;901:148169.
16. Pan YQ, Xiao Y, Long T, et al. Prognostic value of lncRNAs related to fatty acid metabolism in lung adenocarcinoma and their correlation with tumor microenvironment based on bioinformatics analysis. Front Oncol 2022;12:1022097.
17. Yuan SM, Chen X, Qu YQ, et al. C6 and KLRG2 are pyroptosis subtype-related prognostic biomarkers and correlated with tumor-infiltrating lymphocytes in lung adenocarcinoma. Sci Rep 2024;14:24861.
18. Yang RH, Liang B, Li JH, et al. Identification of a novel tumour microenvironment-based prognostic biomarker in skin cutaneous melanoma. J Cell Mol Med 2021;25:10990-1001.
19. Zhou Y, Wu W, Cai W, et al. Prognostic prediction using a gene signature developed based on exhausted T cells for liver cancer patients. Heliyon 2024;10:e28156.
20. Zheng J, Zhang T, Guo W, et al. Integrative Analysis of Multi-Omics Identified the Prognostic Biomarkers in Acute Myelogenous Leukemia. Front Oncol 2020;10:591937.
21. Salichos L, Thayavally R, Kloen P, et al. Human nonunion tissues display differential gene expression in comparison to physiological fracture callus. Bone 2024;183:117091.
22. Xu S, Hu E, Cai Y, et al. Using clusterProfiler to characterize multiomics data. Nat Protoc 2024;19:3292-320.
23. Liu Y, Zhao Y, Zhang S, et al. Developing a prognosis and chemotherapy evaluating model for colon adenocarcinoma based on mitotic catastrophe-related genes. Sci Rep 2024;14:1655.
24. Wang T, Zhang W, Fang C, et al. Research on the Regulatory Mechanism of Ginseng on the Tumor Microenvironment of Colorectal Cancer based on Network Pharmacology and Bioinformatics Validation. Curr Comput Aided Drug Des 2024;20:486-500.
25. Lei T, Luo N, Song C, et al. Comparative Genomics Reveals Three Genetic Groups of the Whitefly Obligate Endosymbiont Candidatus Portiera aleyrodidarum. Insects 2023;14:888.
26. Hu X, Zhang Y, Yu H, et al. The role of YAP1 in survival prediction, immune modulation, and drug response: A pan-cancer perspective. Front Immunol 2022;13:1012173.
27. Li X, Wang Y, Wu W, et al. A Novel Risk Score Model Based on Eleven Extracellular Matrix-Related Genes for Predicting Overall Survival of Glioma Patients. J Oncol 2022;2022:4966820.
28. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.
29. Huang H, Ge C, Dai Y, et al. Exploring Ferroptosis-Associated Gene Signatures as Diagnostic and Therapeutic Targets for Sepsis-Induced Cardiomyopathy. Cureus 2024;16:e60439.
30. Ding P, Du Y, Jiang X, et al. Establishment and analysis of a novel diagnostic model for systemic juvenile idiopathic arthritis based on machine learning. Pediatr Rheumatol Online J 2024;22:18.
31. Pellegrino E, Jacques C, Beaufils N, et al. Machine learning random forest for predicting oncosomatic variant NGS analysis. Sci Rep 2021;11:21820.
32. Oura K, Morishita A, Tani J, et al. Tumor Immune Microenvironment and Immunosuppressive Therapy in Hepatocellular Carcinoma: A Review. Int J Mol Sci 2021;22:5801.
33. Zhou G, Zhang L, Shao S. The application of MARCO for immune regulation and treatment. Mol Biol Rep 2024;51:246.
34. Bowdish DM, Loffredo MS, Mukhopadhyay S, et al. Macrophage receptors implicated in the "adaptive" form of innate immunity. Microbes Infect 2007;9:1680-7.
35. Kraal G, van der Laan LJ, Elomaa O, et al. The macrophage receptor MARCO. Microbes Infect 2000;2:313-6.
36. Ding L, Qian J, Yu X, et al. Blocking MARCO(+) tumor-associated macrophages improves anti-PD-L1 therapy of hepatocellular carcinoma by promoting the activation of STING-IFN type I pathway. Cancer Lett 2024;582:216568.
37. Dong Q, Zhang S, Zhang H, et al. MARCO is a potential prognostic and immunotherapy biomarker. Int Immunopharmacol 2023;116:109783.
38. Xiao Y, Chen B, Yang K, et al. Down-regulation of MARCO associates with tumor progression in hepatocellular carcinoma. Exp Cell Res 2019;383:111542.
39. Dahlqvist J, Fulco CP, Ray JP, et al. Systematic identification of genomic elements that regulate FCGR2A expression and harbor variants linked with autoimmune disease. Hum Mol Genet 2022;31:1946-61.
40. Dai Y, Chen W, Huang J, et al. FCGR2A Could Function as a Prognostic Marker and Correlate with Immune Infiltration in Head and Neck Squamous Cell Carcinoma. Biomed Res Int 2021;2021:8874578.
41. Lu S, Li N, Peng Z, et al. Fc fragment of immunoglobulin G receptor IIa (FCGR2A) as a new potential prognostic biomarker of esophageal squamous cell carcinoma. Chin Med J (Engl) 2021;135:482-4.
42. Lei K, Chen W, Wang A, et al. Multi-omics analysis of tumor necrosis factor superfamily 4 reveals its prognostic value with T cell exhaustion feature in cancer. Discov Oncol 2025;16:1045.
43. Li S, Shen W, Yu T, et al. FCGR2A contributes to M2 macrophages polarization in HCC through IL-4/JAK/STAT6 axis. Transl Oncol 2025;58:102429.
44. Bansal-Pakala P, Jember AG, Croft M. Signaling through OX40 (CD134) breaks peripheral T-cell tolerance. Nat Med 2001;7:907-12.
45. Yang H, Chen Y, Zhao A, et al. Construction of a diagnostic model based on random forest and artificial neural network for peri-implantitis. Hua Xi Kou Qiang Yi Xue Za Zhi 2024;42:214-26.
46. Wang D, Hu H, Ding H, et al. Elevated expression of TNFRSF4 impacts immune cell infiltration and gene mutation in hepatocellular carcinoma. Cancer Biomark 2023;36:147-59.
47. Dong X, He H, Zhang W, et al. Combination of serum RASSF1A methylation and AFP is a promising non-invasive biomarker for HCC patient with chronic HBV infection. Diagn Pathol 2015;10:133.

Cite this article as: Li W, Jiang H, Duan J, He J, Zhao L, Zhong G, Fan C. Mining and experimental validation of machine learning-based immune-related diagnostic biomarkers for hepatocellular carcinoma. J Gastrointest Oncol 2026;17(3):167. doi: 10.21037/jgo-2025-1-1085
