A hybrid molecular-imaging model for high-accuracy early colorectal cancer diagnosis

Yong Zhao; Weirong Zeng

doi:10.21037/jgo-2025-687

Original Article

Department of Clinical Laboratory, Wuhan Third Hospital, Wuhan, China

Contributions: (I) Conception and design: Y Zhao; (II) Administrative support: W Zeng; (III) Provision of study materials or patients: Y Zhao; (IV) Collection and assembly of data: Both authors; (V) Data analysis and interpretation: Y Zhao; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Yong Zhao, MM. Department of Clinical Laboratory, Wuhan Third Hospital, No. 241, Pengliuyang Road, Wuchang District, Wuhan 430061, China. Email: zhaobrave999@163.com.

Background: Early and accurate detection of colorectal cancer (CRC) is crucial for improving patient survival; however, current screening methods often suffer from high false-negative rates, hindering timely diagnosis and treatment. This study aims to develop an innovative dual-path strategy that integrates molecular biomarkers with artificial intelligence (AI)-driven imaging techniques to enhance CRC detection accuracy and overcome limitations in existing screening methods.

Methods: We utilized transcriptomic data from the Gene Expression Omnibus (GEO) database to identify nine key molecular biomarkers associated with CRC, including CDC25B and TEAD4, through differential expression analysis. Machine learning algorithms were employed to assess the diagnostic performance of these biomarkers. In parallel, an edge-aware Mamba-enhanced transformer network (EMT-Net) was developed for imaging segmentation, tested on the Computer Vision Center-Clinic Database (CVC-ClinicDB).

Results: The molecular biomarkers showed significant diagnostic potential, achieving an area under the receiver operating characteristic curve (AUROC) of 0.987 in independent validation. The EMT-Net model demonstrated superior segmentation performance compared to current state-of-the-art methods on the CVC-ClinicDB, showing improved accuracy and precision in CRC detection.

Conclusions: By combining molecular biomarker analysis with advanced imaging segmentation, our dual-path strategy offers complementary advantages: biological insights from molecular data and clinical precision from imaging techniques. This integrated approach shows exceptional cross-dataset robustness, with significant potential to enhance early CRC detection in clinical practice.

Keywords: Colorectal cancer (CRC); biomarkers; deep learning (DL); polyp segmentation; artificial intelligence (AI)

Submitted Aug 22, 2025. Accepted for publication Jan 04, 2026. Published online Feb 12, 2026.

Highlight box

Key findings

• A dual-path artificial intelligence framework integrating molecular biomarkers and endoscopic image segmentation was developed for early colorectal cancer (CRC) detection.

• Nine transcriptome-derived biomarkers demonstrated robust diagnostic performance across independent cohorts.

• The proposed edge-aware Mamba-enhanced transformer network achieved state-of-the-art polyp segmentation accuracy across multiple public colonoscopy datasets.

What is known and what is new?

• Existing CRC screening approaches based on either molecular assays or imaging alone suffer from limited sensitivity, robustness, or clinical interpretability.

• This study proposes a parallel yet synergistic dual-path strategy that integrates molecular risk profiling with edge-aware transformer-based polyp segmentation, enabling complementary functional and structural assessment.

What is the implication, and what should change now?

• This framework supports clinically actionable risk stratification by combining molecular and imaging evidence.

• It provides a scalable blueprint for intelligent CRC screening systems that can be embedded into real-world clinical workflows.

Introduction

Colorectal cancer (CRC) ranks as the third most common malignant tumor of the digestive system globally and the second leading cause of cancer-related mortality (1,2). Its incidence continues to rise among middle-aged and elderly populations while showing a gradual trend toward younger demographics, emerging as a major public health threat worldwide (3,4). Extensive research indicates that over 90% of CRC cases originate from adenomatous polyps (5), following the “adenoma-carcinoma” sequence through multistage progression from benign lesions to invasive tumors (5). This characteristic provides a theoretical foundation for early detection and intervention (6,7). Currently, colonoscopy remains the primary screening and diagnostic method for CRC, yet its accuracy heavily depends on operator expertise, with limitations in identifying morphologically complex or small lesions (8,9). Furthermore, manual interpretation in large-scale screening scenarios faces challenges of low efficiency and strong subjectivity. Consequently, developing more intelligent, efficient, and clinically generalizable early detection technologies has become a crucial research direction in CRC prevention and control (Figure S1).

However, current strategies for early CRC screening still suffer from several critical limitations. For instance, the diagnostic performance of colonoscopy is highly dependent on operator expertise, with limited sensitivity for detecting diminutive polyps (<5 mm), flat lesions, and low-contrast regions (10-12). Manual interpretation is also subject to operator-related variability, resulting in suboptimal inter-observer consistency (13). Although stool- or blood-based molecular assays are non-invasive, they generally lack spatial localization capability and therefore cannot accurately reflect local tissue architecture, lesion extent, or morphological characteristics (14,15). Moreover, both imaging-based techniques and molecular detection approaches exhibit insufficient robustness and transferability across platforms, devices, and populations, limiting their broader clinical applicability (16,17).

Although existing hybrid or multimodal models have attempted to integrate information from multiple data sources, most approaches remain limited to shallow feature concatenation or simple weighted fusion. Specifically, (I) they lack a true dual-pathway synergistic architecture and fail to establish a clear complementary relationship between functional layers (e.g., molecular signaling) and structural layers (e.g., tissue imaging) (18); (II) they do not provide a unified decision-level integration framework, with most studies unable to translate model outputs into explicit clinical management pathways at the “positive/non-positive” decision level (19); (III) the majority of models lack cross-dataset validation, resulting in limited generalizability (16,20); and (IV) molecular-level features are often insufficiently interpretable from a biological perspective, thereby limiting their utility for mechanistic inference and clinical translation (21,22).

In contrast to these limitations, the dual-pathway strategy developed in this study—integrating molecular biomarkers with image segmentation—provides complementary information from both functional and structural dimensions. The molecular pathway elucidates the biological mechanisms underlying disease initiation and progression, whereas the imaging pathway captures the spatial distribution and macroscopic structural characteristics of lesions. These two pathways are inherently complementary in terms of temporal sensitivity, detection window, and interpretability. Furthermore, we introduce an artificial intelligence (AI)-based decision controller that enables deep integration of the dual pathways at the decision level. By leveraging pathway-specific risk scores (P_mol and P_img), this framework establishes four clinically actionable management scenarios, thereby addressing a critical gap in existing hybrid models that lack executable decision logic. Collectively, this design substantially enhances the clinical deployability and translational innovation of the proposed model.
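The decision-level logic described above can be sketched as a two-threshold rule over the pathway-specific risk scores P_mol and P_img. This is a minimal illustration only: the function name `triage`, the default thresholds of 0.5, and the four scenario labels are hypothetical placeholders chosen for this sketch, not the rules or cut-offs used in this study.

```python
def triage(p_mol: float, p_img: float,
           t_mol: float = 0.5, t_img: float = 0.5) -> str:
    """Map two pathway risk scores to one of four scenarios.

    The thresholds and the scenario labels are illustrative assumptions.
    """
    for name, p in (("p_mol", p_mol), ("p_img", p_img)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")

    mol_pos = p_mol >= t_mol   # molecular pathway flags elevated risk
    img_pos = p_img >= t_img   # imaging pathway flags a suspected lesion

    if mol_pos and img_pos:
        return "concordant_positive"
    if mol_pos:
        return "molecular_only"
    if img_pos:
        return "imaging_only"
    return "concordant_negative"


print(triage(0.91, 0.12))  # molecular_only
```

Keeping the two scores separate until this final step mirrors the independent modeling of each pathway; only the 2×2 combination of the binarized scores produces the four scenarios.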

In recent years, deep learning (DL) technologies have achieved remarkable breakthroughs in medical image processing, demonstrating promising applications in polyp detection and segmentation within colonoscopy images (23,24). Models such as U-Net++, SegFormer, and Polyp-PVT have delivered outstanding performance on multiple public datasets, advancing AI-assisted CRC screening (25). However, existing models still face structural challenges in image feature modeling: convolutional neural networks (CNNs) excel at local feature extraction but exhibit limited global modeling capacity; Transformer architectures possess global perception capabilities yet tend to overlook detailed regions in medical images with relatively weaker edge recognition; emerging sequence modeling methods like Mamba show advantages in long-range dependency capture but remain inadequate in processing images with blurred boundaries or irregular morphology.

These limitations become particularly pronounced in CRC polyp recognition tasks, compromising model robustness and clinical adaptability in complex real-world environments. Therefore, developing intelligent segmentation models that integrate both global understanding and local fine-grained perception represents a key direction for advancing AI applications in colonoscopy image analysis (Figure S2).

With the rapid development of multi-omics technologies, gene expression analysis has demonstrated broad prospects in early CRC identification, molecular subtyping, and targeted therapy. Transcriptomic data provides critical support for systematically identifying CRC-related molecular features and elucidating underlying pathogenic mechanisms (26,27). In this study, systematic analysis of public transcriptomic data has identified a group of potential biomarkers closely associated with CRC. These genes participate in multiple known tumor signaling pathways, exhibiting strong molecular relevance and potential biological interpretability. The systematic identification of biomarkers not only expands molecular-level understanding of CRC but also provides substantial support for the design and optimization of structural recognition models. The supplementary molecular dimension enhances the biological plausibility of the entire research framework, effectively promoting the refinement and extension of intelligent image analysis methods for early CRC detection, thereby establishing an important foundation for constructing integrated structure-function intelligent recognition systems (Figure S3).

Building upon this foundation, our study has developed a parallel analytical strategy that synergistically combines structural and molecular recognition approaches to identify key features of CRC through both endoscopic imaging and transcriptomic data analysis. At the structural level, the imaging pathway establishes a DL-based segmentation model using authentic endoscopic images, achieving automated detection and precise localization of colonic polyps. At the molecular level, the identified CRC-associated biomarkers provide critical guidance for feature selection, model optimization, and mechanistic interpretation.

This dual-path strategy fully capitalizes on the complementary nature of imaging and molecular data. While maintaining independent modeling frameworks for each pathway, the approach significantly enhances the overall system’s reliability, interpretability, and clinical utility. The parallel advancement along both structural and molecular dimensions offers robust methodological support for developing multimodal intelligent recognition systems, with strong potential to accelerate the clinical translation of AI technologies in endoscopic diagnostics (Figure S4).

This study aims to explore a parallel AI strategy integrating intelligent image segmentation and molecular feature identification, striving to enhance the accuracy and clinical applicability of early CRC detection while ensuring technological advancement. The imaging module delivers refined polyp recognition in real colonoscopy scenarios, markedly improving lesion detection accuracy and boundary delineation capabilities. Simultaneously, the molecular profiling provides theoretical underpinnings for model design while strengthening biological interpretability.

The coordinated dual-path approach achieves performance optimization across both technical and cognitive domains. Beyond enhancing segmentation model performance, it generates clinically actionable auxiliary information for physicians, potentially improving screening efficiency while reducing diagnostic errors. The comprehensive research framework demonstrates excellent generalizability and clinical adaptability, establishing a solid foundation for developing systematic, intelligent CRC screening platforms (Figure S5).

Although current CRC screening workflows provide relatively clear management pathways for positive results, standardized and intelligent decision rules for risk stratification of borderline or non-positive findings remain lacking. In this context, the dual-pathway strategy proposed in this study not only enables independent discrimination along molecular and imaging dimensions, but also demonstrates the potential to function as an AI-based decision controller. This design allows screening outcomes to be automatically triaged into subsequent diagnostic, therapeutic, or surveillance pathways, thereby establishing an executable and clinically actionable workflow logic. We present this article in accordance with the TRIPOD reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-687/rc).

Methods

Transcriptomic data acquisition and preprocessing

This study obtained four CRC microarray datasets from Gene Expression Omnibus (GEO): GSE10950 (24 CRC vs. 24 normal), GSE25070 (26 vs. 26), GSE74602 (30 vs. 30), and GSE142279 (20 vs. 20), comprising 200 samples (100 tumors and 100 matched normal tissues). To mitigate model bias and overfitting caused by class imbalance, we included all samples with sufficient sample size and a near-balanced case/control ratio from the four datasets as the training candidate set, while the remaining samples were independently reserved for subsequent testing. Within the training candidate set, we first merged the four datasets based on public gene annotations and randomly split them into a training subset (140 samples) and a validation subset (60 samples) at a 7:3 ratio. Subsequently, we applied the ComBat algorithm from the R package sva to jointly correct for platform and batch effects in both the training and validation subsets, ensuring comparability for downstream analyses. This study exclusively used publicly available, de-identified data from open-access databases. No human participants were directly involved, and therefore ethical approval was not required. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
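The 7:3 partition described above can be illustrated with a small class-stratified split. This is a sketch under stated assumptions: the study performed its analyses in R and Python without specifying the splitting routine, so the function name `stratified_split`, the seed, and the per-class shuffling are hypothetical choices for illustration.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, val_idx), splitting each class separately
    so that both subsets keep the original tumor/normal ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_train = round(len(members) * train_frac)
        train_idx.extend(members[:n_train])
        val_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(val_idx)


# 100 tumors and 100 matched normals, as in the merged cohort.
labels = ["tumor"] * 100 + ["normal"] * 100
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 140 60
```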

Using the corrected training set, we conducted parallel differential expression analysis and weighted gene co-expression network analysis (WGCNA). Candidate genes from both analyses were intersected and mapped to the STRING protein-protein interaction (PPI) network, and hub metrics such as degree centrality and betweenness centrality were calculated to further refine the gene list. Subsequently, using the complete training + validation dataset, the hub genes were refined via least absolute shrinkage and selection operator (LASSO) regression (feature selection) and receiver operating characteristic (ROC) evaluation, ultimately identifying a set of high-efficiency diagnostic markers with area under the receiver operating characteristic curve (AUROC) ≥0.92 through cross-validation.

Finally, the selected biomarkers were validated on independent test sets (GSE21815, GSE106582) using random forest (RF), support vector machine (SVM), artificial neural network (ANN) and gradient boosting machine (GBM) models, evaluating AUROC, sensitivity and specificity. Additionally, CIBERSORT analysis was performed to analyze the correlation between these genes and immune cell subsets, thereby enhancing biological interpretability. This comprehensive pipeline—spanning data acquisition, batch effect correction, differential and co-expression analysis, network centrality screening, machine learning-based feature refinement, and multi-tier validation—forms a structured, closed-loop biomarker discovery framework, ensuring the robustness and generalizability of both biomarkers and the predictive model.

Construction of public colonoscopy image datasets and task design

We evaluated our method on four public polyp datasets: Computer Vision Center-Clinic Database (CVC-ClinicDB), Kvasir Polyp Segmentation Dataset (Kvasir-SEG), Computer Vision Center Colon Database (CVC-ColonDB), and ETIS-Larib Polyp Database (ETIS-Larib). To ensure fair comparison, we followed the experimental protocol of PraNet, constructing a training set of 1,450 images extracted from Kvasir-SEG and CVC-ClinicDB, with testing performed across all four datasets. CVC-ClinicDB and Kvasir-SEG served as in-domain datasets, while CVC-ColonDB and ETIS-Larib constituted out-of-domain datasets. The dataset distribution is shown in Figure S6. Kvasir-SEG represents the largest dataset, containing 1,000 images of varying sizes collected and annotated by experienced clinicians. The dataset exhibits considerable diversity in polyp morphology and size, with image resolutions ranging from 332×487 to 1,920×1,072 pixels. It includes 700 large polyps (>160×160 pixels) and 48 small polyps (<64×64 pixels). CVC-ClinicDB comprises 612 images acquired from 31 colorectal sequences, featuring 31 distinct polyp types with corresponding polyp and background segmentation masks. All frames maintain a consistent resolution of 384×288 pixels. We conducted in-domain testing on these two datasets to validate segmentation performance. CVC-ColonDB contains 380 images collected from 15 different colonoscopy sequences, while ETIS-Larib includes 196 images extracted from 34 endoscopic videos. Our cross-domain testing using these datasets demonstrated the model’s accurate prediction capability and generalization performance on unseen data. Notably, ETIS-Larib images have a high resolution (1,225×966 pixels), considerably larger than the fixed 384×288 pixels of CVC-ClinicDB. The substantial variation in polyp morphology and size, along with frequent occurrences of multiple small polyps in single images, renders ETIS-Larib particularly challenging for segmentation tasks.

Batch effect correction

The merged training datasets were processed using the ComBat method from the R sva package for batch effect correction, with principal component analysis (PCA) applied to evaluate the correction effectiveness. Post-correction results showed more concentrated sample distributions and significantly reduced inter-batch variations.

Differential expression analysis and co-expression network construction

Differential gene screening was performed using the limma R package on the corrected data, with selection criteria set at |log₂ fold change (FC)| >1 and adjusted P value <0.05 to identify genes showing significant differential expression between CRC and normal tissues.

WGCNA was conducted using the CEMiTool R package to automatically identify functional modules and calculate their activity differences across sample groups, supplemented by fgsea for module functional annotation to further clarify biological significance.

PPI network construction and diagnostic gene screening

The intersection genes from differential expression and co-expression analyses were input into the STRING database to construct a PPI network. Network node centrality parameters were analyzed to screen for potential key genes, followed by LASSO logistic regression (LR) for dimensionality reduction modeling to obtain candidate diagnostic genes. Nine genes demonstrated stable and excellent diagnostic performance in both training and validation sets.

Machine learning model construction and performance validation

Six classification models were built based on candidate gene expression values: RF, SVM, ANN, GBM, extreme gradient boosting (XGBoost), and LR. Performance evaluation was conducted on independent validation datasets (GSE21815 and GSE106582), with all models showing high classification accuracy in the test sets, demonstrating good generalization capability.

Immune infiltration analysis and correlation study

The CIBERSORT algorithm was used to estimate the relative proportions of 22 tumor-infiltrating immune cell types in the samples. Spearman correlation analysis between the nine screened diagnostic genes and immune cell subsets showed significant correlations (P<0.05) with multiple immune cell types, including activated/resting mast cells, M0/M1/M2 macrophages, CD8⁺ T cells, and B cells, suggesting their potential involvement in CRC development through regulation of the immune microenvironment.

Statistical analysis

All statistical analyses were performed using R software (version ≥4.2.0) and Python (version ≥3.8). Unless otherwise specified, all statistical tests were two-sided, and a P value <0.05 was considered statistically significant.

For differential expression analysis, linear modeling was conducted using the limma package to compare CRC tissues with normal tissues. Multiple hypothesis testing was adjusted using the Benjamini-Hochberg method to control the false discovery rate (FDR). Differentially expressed genes (DEGs) were defined as those with an absolute log₂ FC >1 and an adjusted P value (FDR) <0.05.
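The DEG definition above can be made concrete with a short sketch of the Benjamini-Hochberg adjustment followed by the two thresholds. It does not reproduce limma's moderated statistics; it assumes per-gene log₂ FC values and raw P values are already available. The function names and the toy gene identifiers and numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cut=1.0, fdr_cut=0.05):
    """Keep genes with |log2 FC| > fc_cut and BH-adjusted P < fdr_cut."""
    fdr = benjamini_hochberg(p_values)
    return [g for g, fc, q in zip(genes, log2_fc, fdr)
            if abs(fc) > fc_cut and q < fdr_cut]


# Toy input (made-up values, not study data).
genes = ["g1", "g2", "g3", "g4"]
log2_fc = [2.1, -1.5, 0.4, 1.2]
p_values = [0.005, 0.01, 0.03, 0.2]
degs = select_degs(genes, log2_fc, p_values)
print(degs)  # ['g1', 'g2']
```

Here g3 fails the fold-change cut-off and g4 fails the FDR cut-off, so only g1 and g2 are retained.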

In WGCNA, gene co-expression modules were constructed using the CEMiTool package. Gene set enrichment analysis (GSEA) was applied to calculate normalized enrichment scores (NES) for each module in tumor and normal groups. Module significance was assessed using permutation testing and adjusted for multiple comparisons using FDR correction. Modules with larger absolute NES values and lower adjusted P values were considered to be strongly associated with CRC phenotypes.

PPI network analysis was performed based on the STRING database using a high-confidence interaction threshold (confidence score ≥0.98). Topological parameters, including degree centrality and betweenness centrality, were calculated to identify potential hub genes.
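To illustrate the two topological parameters named above, the following sketch filters scored edges at the 0.98 confidence threshold and computes degree centrality and unnormalized betweenness centrality (Brandes' algorithm) on the resulting unweighted graph. The node names and edge scores are hypothetical, and the code stands in for whatever network tool was actually used with the STRING export.

```python
from collections import deque


def build_graph(scored_edges, min_score=0.98):
    """Adjacency sets containing only edges with score >= min_score."""
    adj = {}
    for a, b, score in scored_edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= min_score:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    denom = max(len(adj) - 1, 1)
    return {v: len(nbrs) / denom for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


# Hypothetical scored edges; the A-D edge falls below the threshold.
edges = [("A", "B", 0.99), ("B", "C", 0.985), ("C", "D", 0.99), ("A", "D", 0.60)]
graph = build_graph(edges)
deg = degree_centrality(graph)
btw = betweenness_centrality(graph)
```

In this toy path A-B-C-D, the inner nodes B and C have the highest degree and betweenness, which is the sense in which such nodes are treated as hub candidates.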

Feature selection for candidate diagnostic genes was conducted using LASSO LR. The optimal regularization parameter (λ) was determined via 10-fold cross-validation. Model performance was evaluated using ROC curve analysis, with the AUROC, accuracy, sensitivity, specificity, and F1-score calculated.
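The evaluation metrics listed above can be written out explicitly. The sketch below computes AUROC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), plus threshold-based metrics. The LASSO fitting itself is not reproduced; the function names, the 0.5 threshold, and the toy scores are assumptions for illustration.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity and F1 at a fixed threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


# Toy scores (made-up values): 1 = CRC, 0 = normal.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))  # 0.75
metrics = confusion_metrics(scores, labels)
```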

Multigene combination models, including RF, SVM, ANN, GBM, XGBoost, and LR, were tested on independent external validation datasets (GSE21815 and GSE106582). Differences in AUROC values between models were compared using the DeLong test. Performance improvement of multigene models over single-gene models was quantified using the difference in AUROC (ΔAUROC) to assess synergistic effects.

The relative proportions of tumor-infiltrating immune cells were estimated using the CIBERSORT algorithm. Differences in immune cell abundances between CRC and normal tissues were evaluated using the Wilcoxon rank-sum test. Associations between the expression levels of core diagnostic genes and immune cell proportions were assessed using Spearman rank correlation analysis. Correlation coefficients with an absolute value greater than 0.3 and a P value <0.05 were considered to indicate potential biological relevance. Multiple comparisons were adjusted using the FDR method.
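The Spearman coefficient used for the gene-immune cell associations is the Pearson correlation of the ranked values, with tied observations sharing their average rank. The sketch below computes only the coefficient; the P value and FDR adjustment are omitted, and the function names and toy vectors are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy vectors (made-up): gene expression vs. an immune cell fraction.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
fraction = [0.02, 0.04, 0.05, 0.09, 0.20]
rho = spearman_rho(expression, fraction)
relevant = abs(rho) > 0.3   # coefficient criterion from the text
```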

For colonoscopy image segmentation tasks, model performance across multiple public datasets was quantitatively evaluated using seven metrics, including mean Dice coefficient (mDice), mean intersection over union (mIoU), mean absolute error (MAE), weighted F-measure (wF-measure), mean E-measure (mE-measure), maximum E-measure (maxE-measure), and S-measure (S_measure). All metrics were computed on a per-image basis in the test sets, and mean values were reported.
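Three of the seven metrics have simple closed forms and can be sketched directly. The example below computes per-image Dice, IoU, and MAE on binary masks stored as nested lists, then averages over images as described above. The function names, the convention of scoring two empty masks as 1.0, and the use of binarized predictions (MAE is often computed on probability maps instead) are assumptions of this sketch.

```python
def dice_iou_mae(pred, truth):
    """Per-image Dice, IoU and MAE for two equally sized 0/1 masks."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t) or not p:
        raise ValueError("masks must be non-empty and the same size")
    inter = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    union = total - inter
    dice = 2 * inter / total if total else 1.0   # both masks empty
    iou = inter / union if union else 1.0
    mae = sum(abs(a - b) for a, b in zip(p, t)) / len(p)
    return dice, iou, mae


def mean_metrics(pairs):
    """Average per-image metrics to obtain (mDice, mIoU, MAE)."""
    per_image = [dice_iou_mae(pred, truth) for pred, truth in pairs]
    n = len(per_image)
    return tuple(sum(m[k] for m in per_image) / n for k in range(3))


# Toy 2x2 masks (made-up): the prediction over-segments by one pixel.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
dice, iou, mae = dice_iou_mae(pred, truth)
print(iou, mae)  # 0.5 0.25
```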

Statistical differences in segmentation performance between models were assessed using the paired Wilcoxon signed-rank test to evaluate the statistical superiority of edge-aware Mamba-enhanced transformer network (EMT-Net) over comparator models.

Results

Multi-algorithm collaborative screening for robust CRC feature genes

To improve the accuracy and generalizability of early CRC identification and auxiliary diagnosis, this study developed an integrated strategy combining molecular and imaging analyses, enabling CRC to be recognized and discriminated from dual “molecular-imaging” perspectives. The strategy consists of two independent yet synergistic modules: a transcriptomics-based molecular biomarker screening pathway and an endoscopic image-based lesion segmentation modeling pathway (Figure 1A).

Figure 1 Schematic of dual-path CRC analysis strategy integrating molecular profiling and endoscopic imaging. (A) Framework of the multimodal AI-based CRC intelligent analysis system. The system consists of two parallel pathways: the molecular pathway (left) and the imaging pathway (right), which process GEO transcriptomic data and endoscopic images, respectively. The molecular pathway performs batch effect correction, DEG analysis, and WGCNA to identify candidate and core genes, combined with functional enrichment and immune infiltration analyses to promote biomarker identification and mechanistic interpretation. The imaging pathway employs DL segmentation models (e.g., EMT-Net) to perform multi-scale precise segmentation of polyp regions, with performance evaluated using metrics such as mDice and mIoU. The two pathways achieve information fusion through a collaborative mechanism, ultimately constructing an intelligent analysis framework for clinical early screening and auxiliary diagnosis. (B) Workflow of CRC diagnostic gene screening via multi-cohort integration and machine learning. This study integrates four GEO microarray datasets (GSE10950, GSE25070, GSE74602, and GSE142279), totaling 200 samples (100 CRC cases and 100 normal controls), divided into training and validation sets at a 7:3 ratio with batch effect correction. For the training set, both differential expression analysis and WGCNA are performed to screen overlapping genes, followed by extraction of key genes based on network centrality. Subsequently, LASSO regression and ROC analysis are used to identify diagnostic genes, whose robustness is validated in independent datasets (GSE21815 and GSE106582), combined with machine learning modeling, immune infiltration analysis, and pathway enrichment analysis to ensure biological reliability and clinical potential.
AI, artificial intelligence; CRC, colorectal cancer; DEG, differentially expressed gene; DL, deep learning; EMT-Net, edge-aware Mamba-enhanced transformer network; GEO, Gene Expression Omnibus; LASSO, least absolute shrinkage and selection operator; mDice, mean Dice coefficient; mIoU, mean intersection over union; ROC, receiver operating characteristic; WGCNA, weighted gene co-expression network analysis.

In the molecular analysis module, we designed a multi-cohort integration and machine learning-driven CRC diagnostic gene screening pipeline (Figure 1B). Specifically, the study integrated four balanced public microarray datasets (GSE10950, GSE25070, GSE74602, and GSE142279), comprising 200 samples (100 CRC cases and 100 normal controls). After sample integration, the data were randomly divided into training (n=140) and validation (n=60) sets at a 7:3 ratio, with matched tumor-to-normal group proportions to ensure reliable data foundations for subsequent analyses.

Batch effect correction enhances consistency of CRC sample expression data

To minimize systematic biases caused by technical variations (e.g., platform differences and experimental batches) and improve multi-cohort data consistency and comparability, we performed batch effect correction separately on the training and validation sets (Figure 2A-2D). PCA results before and after correction demonstrated significantly improved clustering patterns in the dimensionality-reduced space, with clearer inter-group differences and more concentrated intra-group distributions, indicating effective suppression of non-biological noise between batches.

Figure 2 PCA distribution plots of gene expression datasets before and after batch effect correction in training and validation sets. (A,B) PCA plots of training set (A) pre- and (B) post-batch correction, showing more concentrated data distribution and reduced inter-batch variations. (C,D) Validation set PCA plots (C) pre- and (D) post-correction, confirming improved data consistency and comparability after batch effect removal. PCA, principal component analysis.

This preprocessing step was crucial for ensuring the scientific validity of subsequent differential expression analysis and co-expression network construction. By eliminating technical background noise, we could more accurately uncover expression signals reflecting the biological essence of CRC differences, thereby establishing a solid foundation for screening high-confidence candidate diagnostic genes for CRC.

Differential expression and co-expression network analyses reveal key CRC gene modules

We performed differential gene expression analysis on the training set using thresholds of |log₂FC| >1 and adjusted P value <0.05 to identify DEGs. The distribution of DEGs was visualized through a volcano plot (Figure 3A), followed by functional enrichment analyses using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Figure 3B). KEGG pathway analysis demonstrated significant enrichment of these genes in the “cell cycle” pathway (Figure 3C).

Figure 3 Differential expression analysis and co-expression network identification revealed key CRC-related genes. (A) Volcano plot of DEGs in CRC training set (red: upregulated; blue: downregulated; gray: nonsignificant; thresholds: |logFC| >1, false discovery rate <0.05). (B) GO enrichment of DEGs covering cell cycle, mitosis, and chromosome segregation. (C) KEGG pathway analysis showing DEG enrichment in “cell cycle” and “cytokine-receptor interaction” pathways (bubble size: gene count; color: significance). (D) Five co-expression modules identified by CEMiTool, with module M1 significantly enriched in tumors. (E) Functional annotation of M1 highlighting cancer-related pathways (cell cycle/mitosis). (F) PPI network of 316 M1-DEG intersection genes (confidence =0.98; nodes: proteins; edges: functional associations), revealing key regulatory factors. BP, biological process; CC, cellular component; CRC, colorectal cancer; DEG, differentially expressed gene; FC, fold change; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function; NES, normalized enrichment score; PPI, protein-protein interaction.

For co-expression network analysis, we employed the R package CEMiTool to identify the most relevant co-expression modules between tumor and normal groups. GSEA was performed using default parameters in CEMiTool to evaluate module activity differences (Figure 3D). Enrichment scores (ES) were normalized by accounting for gene set size, yielding NES. FDRs and adjusted P values were calculated for each NES to control type I errors. This approach identified gene clusters with highly correlated expression patterns, from which we selected the most significant modules associated with malignant phenotypes using two criteria: P value and NES magnitude. Lower adjusted P values and higher absolute NES values indicated stronger module-tumor associations. Five co-expression modules were ultimately identified, with module M1 showing the strongest positive correlation with tumor samples. Pathway overrepresentation analysis revealed significant enrichment of “cell cycle/mitosis” signaling pathways in module M1 (Figure 3E). The 316 genes at the intersection of DEGs and module M1 were imported into the STRING database to construct a PPI network with a confidence threshold of 0.98 (Figure 3F).

Identification of highly discriminative CRC diagnostic genes via LASSO regression

The LASSO regression model was applied to identify candidate diagnostic genes, yielding 12 potential biomarkers (CDK4, TEAD4, MMP7, MMP3, GZMB, TRIB3, MMP1, CDC25B, CXCL10, CXCL1, UHRF1, and IQGAP3) at the optimal penalty parameter (λ=0.00239), obtained by minimizing the objective function. The relatively small coefficient values likely reflect the substantial number of correlated genes. Nine of these genes were evaluated for diagnostic performance and discriminated well between CRC and normal samples. In the training set, AUROC values were 0.9790 (CDC25B), 0.9735 (CDK4), 0.9584 (CXCL1), 0.8782 (CXCL10), 0.9600 (IQGAP3), 0.9622 (MMP1), 0.9529 (MMP3), 0.9749 (TEAD4), and 0.9582 (UHRF1). The validation set showed comparable performance, with five genes improving and four declining slightly: AUROC values were 0.9856 (CDC25B), 0.9611 (CDK4), 0.9533 (CXCL1), 0.8622 (CXCL10), 0.9722 (IQGAP3), 0.9711 (MMP1), 0.9600 (MMP3), 0.9911 (TEAD4), and 0.9433 (UHRF1) (Figure 4).
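For reference, the objective minimized in an L1-penalized classification model is commonly written as below. This is a sketch that assumes a binomial (logistic) LASSO, which the text does not state explicitly; here $y_i \in \{0,1\}$ marks normal versus CRC, $x_i$ is the expression vector of sample $i$, and $N$ is the number of training samples.

```latex
$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{N} \sum_{i=1}^{N}
\left[ y_i \left(\beta_0 + x_i^{\top}\beta\right)
- \log\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda \lVert \beta \rVert_1 \right\}
$$
```

Under this formulation, genes whose coefficients remain nonzero at the selected λ are the ones retained as candidates.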

Figure 4 Diagnostic performance of LASSO-identified candidate genes. LASSO regression (λ=0.00239) selected 12 candidate genes (CDK4, TEAD4, MMP7, MMP3, GZMB, TRIB3, MMP1, CDC25B, CXCL10, CXCL1, UHRF1, IQGAP3). Nine genes showed high diagnostic accuracy for CRC vs. normal. Training set AUROCs were CDC25B: 0.9790, CDK4: 0.9735, CXCL1: 0.9584, CXCL10: 0.8782, IQGAP3: 0.9600, MMP1: 0.9622, MMP3: 0.9529, TEAD4: 0.9749, UHRF1: 0.9582; validation set AUROCs were CDC25B: 0.9856, CDK4: 0.9611, CXCL1: 0.9533, CXCL10: 0.8622, IQGAP3: 0.9722, MMP1: 0.9711, MMP3: 0.9600, TEAD4: 0.9911, UHRF1: 0.9433. The results indicate that these genes have high accuracy and strong application potential for CRC diagnosis. AUC, area under the curve; AUROC, area under the receiver operating characteristic curve; CRC, colorectal cancer; LASSO, least absolute shrinkage and selection operator.

To more intuitively illustrate the individual diagnostic performance of each gene, we systematically summarized the AUROC values obtained from both the training and validation cohorts in Table 1 (single-gene diagnostic performance summary). Together with Figure 4, which depicts the corresponding ROC curve trends, the table provides complementary quantitative and visual evidence, thereby establishing a solid data foundation for subsequent multi-gene combinatorial analyses.

Table 1

Diagnostic efficacy of the nine core biomarkers in training and validation cohorts

Biomarker	AUROC (training)	AUROC (validation)
CDC25B	0.9790	0.9856
CDK4	0.9735	0.9611
CXCL1	0.9584	0.9533
CXCL10	0.8782	0.8622
IQGAP3	0.9600	0.9722
MMP1	0.9622	0.9711
MMP3	0.9529	0.9600
TEAD4	0.9749	0.9911
UHRF1	0.9582	0.9433

AUROC, area under the receiver operating characteristic curve.
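As a reminder of what a single-gene AUROC measures, the sketch below computes it as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample. The expression values and labels are hypothetical and are not taken from the study data.

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values of one gene (1 = tumor, 0 = normal).
expr = [2.1, 3.4, 5.0, 5.6, 6.2, 5.3, 7.1, 3.9]
is_tumor = [0, 0, 1, 1, 1, 0, 1, 0]
gene_auroc = auroc(expr, is_tumor)  # 15 of 16 tumor/normal pairs are ordered correctly
```

This pairwise definition is equivalent to the area under the ROC curve, but it scales quadratically with sample count, so a rank-based implementation would be preferable for large cohorts.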

After establishing the diagnostic performance of individual genes, we further evaluated the synergistic gain achieved by combining the nine core diagnostic genes into multi-gene models. To this end, six widely used machine learning algorithms (RF, SVM, ANN, GBM, XGBoost, and LR) were employed to construct multi-gene combinatorial models, which were systematically compared across two independent validation cohorts (GSE21815 and GSE106582). The best-performing multi-gene models exceeded the mean single-gene AUROC in both cohorts, although the strongest single markers remained competitive (e.g., TEAD4, validation AUROC =0.9911; Table 1). Notably, the XGBoost model achieved the highest diagnostic performance in both datasets (GSE21815: AUROC =0.985; GSE106582: AUROC =0.987).

To quantitatively assess the synergistic improvement conferred by gene combination, we calculated the mean AUROC of the nine individual genes (Mean AUROC of Single Markers) and directly compared it with the multi-gene ensemble model based on XGBoost. The results showed that, in the GSE21815 and GSE106582 cohorts, the multi-gene model outperformed the mean performance of individual genes by 0.030 and 0.031, respectively (Table 2), indicating a clear complementary effect among the genes that effectively enhances the overall accuracy and robustness of CRC diagnosis. This synergistic gain further supports the clinical potential of multi-marker combinatorial strategies for early CRC screening.

Table 2

Synergistic diagnostic gain achieved by combining nine core CRC biomarkers across independent validation cohorts

Dataset	Mean AUROC of 9 single markers	AUROC of multi-marker model (XGBoost)	Synergistic gain (ΔAUROC)
GSE21815	0.955	0.985	+0.030
GSE106582	0.956	0.987	+0.031

AUROC, area under the receiver operating characteristic curve; CRC, colorectal cancer; XGBoost, extreme gradient boosting; ΔAUROC, the difference in AUROC.
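The arithmetic behind the ΔAUROC column can be reproduced in a few lines. The sketch below copies the values from Tables 1 and 2; the column means of Table 1 round to 0.955 and 0.956, which matches the single-marker column of Table 2, although treating those means as the source of that column is an assumption rather than something the text states.

```python
from statistics import mean

# AUROC values copied from Table 1: gene -> (training, validation).
table1 = {
    "CDC25B": (0.9790, 0.9856),
    "CDK4": (0.9735, 0.9611),
    "CXCL1": (0.9584, 0.9533),
    "CXCL10": (0.8782, 0.8622),
    "IQGAP3": (0.9600, 0.9722),
    "MMP1": (0.9622, 0.9711),
    "MMP3": (0.9529, 0.9600),
    "TEAD4": (0.9749, 0.9911),
    "UHRF1": (0.9582, 0.9433),
}
mean_train = mean(t for t, _ in table1.values())
mean_valid = mean(v for _, v in table1.values())

# Values copied from Table 2: cohort -> (mean single-marker AUROC, XGBoost AUROC).
table2 = {"GSE21815": (0.955, 0.985), "GSE106582": (0.956, 0.987)}
gain = {cohort: round(multi - single, 3) for cohort, (single, multi) in table2.items()}
```

The gain is a difference against the mean of the single markers, so it does not imply that the combined model exceeds every individual gene.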

Given that XGBoost consistently achieved the highest diagnostic performance across both external validation cohorts (AUROC range, 0.985–0.987), it was selected as the representative multi-gene model for synergistic gain evaluation, ensuring that our conclusions are based on optimal model performance.

Robust identification of CRC samples by core diagnostic genes across multiple machine learning models

To systematically evaluate the discriminative power of the screened CRC diagnostic genes, we implemented six mainstream machine learning models: RF, SVM, ANN, GBM, XGBoost, and LR. All models utilized the candidate gene features as input and were evaluated on two independent external validation cohorts (GSE21815 and GSE106582) using performance metrics including AUROC, accuracy, sensitivity, specificity, and F1-score.
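The threshold-dependent metrics listed above all derive from the confusion matrix. The following is a minimal sketch with hypothetical labels and predictions (1 = CRC, 0 = normal); it is not the evaluation code used in the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1-score from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical example: 5 tumor and 3 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because accuracy is the prevalence-weighted average of sensitivity and specificity, it always lies between the two, which is a useful sanity check when reading a results table.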

In the GSE21815 dataset, XGBoost achieved the highest AUROC (0.985), with an accuracy of 0.981, sensitivity of 0.964, specificity of 0.932, and F1-score of 0.976. SVM ranked second by AUROC (0.982) with an F1-score of 0.988, indicating stable and precise classification. XGBoost also achieved the highest AUROC in the GSE106582 cohort (AUROC =0.987, F1-score =0.942), supporting its generalization across different data distributions.

Notably, although minor performance variations existed across datasets, RF, SVM, and XGBoost generally outperformed conventional LR and ANN models in both accuracy and sensitivity (Table 3); the one exception was SVM sensitivity in GSE106582 (0.857 vs. 0.885 for LR). These results collectively demonstrate that our nine core diagnostic genes (CDC25B, CDK4, CXCL1, CXCL10, IQGAP3, MMP1, MMP3, TEAD4, UHRF1) effectively discriminate CRC tumor samples from normal tissues across multiple machine learning frameworks, showing strong clinical diagnostic potential.

Table 3

Performance evaluation of six machine learning models for CRC diagnostic biomarkers in independent validation cohorts

ML method	Dataset	AUROC (95% CI)	Accuracy	Sensitivity	Specificity	F1-score
RF	GSE21815	0.956 (0.873–1.000)	0.978	1.000	0.666	0.988
RF	GSE106582	0.980 (0.962–0.998)	0.958	0.948	0.965	0.948
SVM	GSE21815	0.982 (0.948–1.000)	0.978	0.984	0.888	0.988
SVM	GSE106582	0.980 (0.962–0.997)	0.932	0.857	0.982	0.910
ANN	GSE21815	0.970 (0.920–1.000)	0.943	0.946	0.888	0.968
ANN	GSE106582	0.975 (0.956–0.995)	0.860	0.662	0.991	0.790
GBM	GSE21815	0.967 (0.909–1.000)	0.907	0.909	0.888	0.948
GBM	GSE106582	0.976 (0.951–1.000)	0.927	0.831	0.991	0.901
XGBoost	GSE21815	0.985 (0.950–1.000)	0.981	0.964	0.932	0.976
XGBoost	GSE106582	0.987 (0.968–0.998)	0.948	0.901	0.969	0.942
Logistic regression	GSE21815	0.938 (0.874–0.990)	0.905	0.892	0.870	0.900
Logistic regression	GSE106582	0.951 (0.928–0.985)	0.910	0.885	0.933	0.898

ANN, artificial neural network; AUROC, area under the receiver operating characteristic curve; CI, confidence interval; CRC, colorectal cancer; GBM, gradient boosting machine; ML, machine learning; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Association between CRC diagnostic genes and immune cell infiltration reveals potential immunoregulatory mechanisms

We analyzed the phenotypes of 22 immune cell types across 200 samples from 4 microarray datasets to investigate their relationships with CRC and normal groups. The correlation matrix revealed the strongest positive correlation between “activated mast cells” and “neutrophils”, and the strongest negative correlation between “activated mast cells” and “M2 macrophages” (Figure 5A). Notably, M2 macrophages, resting CD4⁺ memory T cells, plasma cells, memory B cells, naive B cells, regulatory T cells (Tregs), eosinophils, and resting mast cells were the most downregulated cell types in CRC. Conversely, activated mast cells, CD8⁺ T cells, activated CD4⁺ memory T cells, follicular helper T cells, activated dendritic cells, neutrophils, M0 macrophages, and M1 macrophages showed the most significant upregulation (P<0.05; Figure 5B). Correlation analysis between the 9 diagnostic genes and 22 immune cell subsets helped elucidate their potential roles in CRC pathogenesis through immunoregulation. Six genes (CDC25B, CDK4, IQGAP3, MMP1, TEAD4, and UHRF1) showed positive correlations with activated mast cells, neutrophils, activated CD4⁺ memory T cells, follicular helper T cells, M0 macrophages, and M1 macrophages, while exhibiting negative correlations with resting mast cells, M2 macrophages, plasma cells, memory B cells, CD8⁺ T cells, and resting CD4⁺ memory T cells (P<0.05; Figure 5C).
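The gene-to-immune-cell associations are rank correlations (Spearman, per the Figure 5 caption). The sketch below shows the computation on hypothetical values for one gene and one immune cell fraction; the variable names and numbers are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values: gene expression vs. an immune cell fraction.
gene_expr = [1.2, 2.5, 3.1, 4.8, 5.0]
cell_fraction = [0.02, 0.05, 0.04, 0.09, 0.11]
rho = spearman(gene_expr, cell_fraction)
```

Working on ranks makes the measure insensitive to the scale of the estimated cell fractions, which is why it suits deconvolution outputs.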

Figure 5 Immune cell infiltration characteristics vs. diagnostic genes. (A) Heatmap of correlations among 22 tumor-infiltrating immune cell types in CRC tissues, based on Spearman correlation analysis of 200 samples (from 4 GEO microarray datasets). Color intensity indicates the strength and direction of immune cell correlations. Analysis revealed the strongest positive correlation between activated mast cells and neutrophils, and the strongest negative correlation with M2 macrophages, suggesting their potential interactions in CRC immunoregulation. (B) Violin plots show significant downregulation in CRC for M2 macrophages, resting CD4⁺ memory T cells, plasma cells, memory/naive B cells, Tregs, eosinophils, and resting mast cells; upregulation for activated mast cells, CD8⁺ T cells, activated CD4⁺ memory T cells, follicular helper T cells, activated dendritic cells, neutrophils, and M0/M1 macrophages (P<0.05). (C) Bubble plot of Spearman correlations between six diagnostic genes (CDC25B, CDK4, IQGAP3, MMP1, TEAD4, UHRF1) and immune subsets (bubble size: correlation strength; red: P<0.05). Results showed these genes were positively correlated with activated immune cells (e.g., activated mast cells, neutrophils, activated CD4⁺ memory T cells), while negatively correlated with immunosuppressive or resting cell types (e.g., M2 macrophages, CD8⁺ T cells, plasma cells), suggesting their potential involvement in CRC progression through immune microenvironment regulation. CRC, colorectal cancer; GEO, Gene Expression Omnibus; NK, natural killer.

EMT-Net achieves precise polyp lesion segmentation in CRC imaging

CRC is a common malignant tumor of the digestive tract, with most cases arising from polyp lesions. Timely polyp detection and removal can significantly reduce CRC incidence. Accurate polyp segmentation provides essential diagnostic information for early CRC detection and treatment. However, polyps of the same type may vary in texture, color, and size, and some have coloration similar to that of the surrounding healthy tissue, resulting in poorly defined boundaries. To address these challenges in polyp localization and boundary delineation, we propose EMT-Net (Figure 6A). The model employs an edge-aware Mamba-enhanced transformer (EMFormer) for robust feature extraction and introduces a feature fusion module (FFM) that refines encoder feature processing by aggregating multi-scale features across encoder layers, thereby improving polyp localization accuracy.

Figure 6 EMT-Net achieves accurate polyp lesion segmentation in CRC images. (A) Overview of EMT-Net architecture. Given that most CRC cases originate from polyp lesions with variable morphology/color and blurred boundaries, we propose EMT-Net. The model incorporates EMFormer for robust feature extraction and an FFM for multi-scale feature aggregation, enhancing polyp localization and edge detection. (B) EMFormer architecture: (I) the backbone alternates between Edge-aware Token Mixing modules (ESO/EIA) and Mamba state-space modeling blocks for joint local edge extraction and global context modeling; (II) ESO integrates Sobel operators with multi-scale kernels to capture directional gradients; (III) EIA employs Laplacian operators for high-order edge detection; (IV) Mamba blocks establish long-range dependencies. This design synergistically combines edge guidance with sequence modeling. (C) FFM structure: FFM fuses deep semantic and shallow detail features via 1×1 conv (channel compression), with parallel processing paths (direct/upsampled branches for deep features; direct/downsampled branches for shallow features), ultimately generating enhanced boundary-aware feature maps. BN, batch normalization; CRC, colorectal cancer; EMT-Net, edge-aware Mamba-enhanced transformer network; EMFormer, edge-aware Mamba-enhanced transformer; FFM, feature fusion module; LN, layer normalization; MLP, multilayer perceptron; SSM, state space model; UP, upsampling.

Key model components enhance boundary modeling for complex CRC polyps

To address the limitations of existing polyp segmentation models in edge detail modeling, we propose a novel Transformer architecture, the EMFormer. This architecture integrates classical edge operators such as Sobel and Laplacian operators by incorporating these operators at different Token Mixing stages, explicitly enhancing the model’s perception capability of polyp edges and low-level textures. To better capture polyp details and boundaries, EMFormer embeds Mamba modules between its backbone structures, leveraging their efficient state space modeling capability to further capture long-range dependencies. This architecture not only maintains the global modeling advantages of Transformers but also deeply integrates local edge information of polyp regions, significantly improving the accuracy of polyp segmentation and localization. The design philosophy of EMFormer lies in the deep integration of edge awareness with sequence modeling, providing an innovative solution for polyp segmentation tasks (Figure 6B).
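To make the role of the classical edge operators concrete, the sketch below applies the standard 3×3 Sobel and Laplacian kernels to a toy image containing a vertical step edge. It illustrates the operators only; how EMFormer embeds them in its Token Mixing stages (kernel scales, learnable weights, normalization) is not specified here, and the helper `filter3x3` is a hypothetical name.

```python
# Standard 3x3 kernels: Sobel for first-order gradients, Laplacian for second-order.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def filter3x3(img, kernel):
    """Cross-correlate a 2D list with a 3x3 kernel, without padding."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            row.append(sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
                           for i in range(3) for j in range(3)))
        out.append(row)
    return out


# Toy 4x4 image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 1, 1] for _ in range(4)]
gx = filter3x3(image, SOBEL_X)     # strong horizontal gradient at the edge
gy = filter3x3(image, SOBEL_Y)     # no vertical gradient
lap = filter3x3(image, LAPLACIAN)  # sign change marks the edge location
```

The Sobel response is large on both sides of the step, while the Laplacian changes sign across it, which is why the two operators give complementary boundary cues.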

The FFM processes dual feature maps from different network depths: low-resolution deep features with rich semantics and high-resolution shallow features with detailed textures. Both feature streams first undergo channel compression and nonlinear mapping through a 1×1 convolution + BN + ReLU (“1Conv”). For the shallow features, one path directly passes through three 3×3 convolutions + BN + ReLU (“3Conv”) to generate detail branch outputs, while the other path first undergoes average pooling (“AvgPool”) downsampling before “3Conv” processing. For the deep features, one path directly processes through “3Conv” to obtain semantic branch outputs, while the other path first performs bilinear interpolation upsampling (“UP”) followed by “3Conv”. The direct branch output of shallow features {3Conv[CFP(F₃)]} is element-wise added to the upsampled deep feature path to produce high-resolution fused features. Similarly, the direct branch output of deep features is added to the downsampled shallow feature path to generate low-resolution fused features. The low-resolution fused features are then upsampled (“UP”) and processed through “3Conv” to create a detail restoration branch, which is finally element-wise added to the high-resolution fused features to output the final feature map (Figure 6C).
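The data flow of the FFM can be traced with a shape-level sketch. The code below is a simplification under several assumptions: feature maps are single-channel 2D lists, the deep map is exactly half the resolution of the shallow map, the learned “1Conv” and “3Conv” stacks are replaced by an identity placeholder, and nearest-neighbour upsampling stands in for the bilinear interpolation described in the text.

```python
def conv_placeholder(x):
    """Stand-in for the learned Conv + BN + ReLU stacks; identity here."""
    return [row[:] for row in x]


def avg_pool2(x):
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [[(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
             for c in range(0, len(x[0]), 2)]
            for r in range(0, len(x), 2)]


def upsample2(x):
    """2x nearest-neighbour upsampling (the text describes bilinear instead)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def add(a, b):
    """Element-wise addition of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def ffm(shallow, deep):
    s = conv_placeholder(shallow)                # "1Conv" on shallow features
    d = conv_placeholder(deep)                   # "1Conv" on deep features
    s_direct = conv_placeholder(s)               # shallow direct branch
    s_down = conv_placeholder(avg_pool2(s))      # shallow AvgPool branch
    d_direct = conv_placeholder(d)               # deep direct branch
    d_up = conv_placeholder(upsample2(d))        # deep UP branch
    high = add(s_direct, d_up)                   # high-resolution fused features
    low = add(d_direct, s_down)                  # low-resolution fused features
    restore = conv_placeholder(upsample2(low))   # detail restoration branch
    return add(high, restore)


shallow = [[1.0] * 4 for _ in range(4)]  # hypothetical 4x4 shallow map
deep = [[2.0] * 2 for _ in range(2)]     # hypothetical 2x2 deep map
fused = ffm(shallow, deep)               # output keeps the shallow resolution
```

With identity placeholders the output is simply the sum of the contributing paths, which makes it easy to verify that every branch reaches the final map at the shallow resolution.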

EMT-Net achieves optimal performance for CRC polyp segmentation across multiple datasets

Extensive experiments were conducted with EMT-Net on four datasets (CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS) to evaluate the model’s learning and generalization capabilities. We compared EMT-Net, both quantitatively and qualitatively, with a series of current medical image segmentation models: U-Net++ (28), SFA (29), PraNet (30), ACSNet (31), CCBANet (32), SANet (33), SegFormer (34), and Polyp-PVT (35). The quantitative evaluation employed seven key metrics: mDice, mIoU, MAE, weighted F-measure ($F_{β}^{ω}$), mean E-measure (mE_ξ), maximum E-measure (maxE_ξ), and S-measure (S_α). To ensure a fair comparison, qualitative experiments were also conducted for each model; the visualization results and several representative cases show that EMT-Net effectively addresses unclear polyp segmentation boundaries and inaccurate localization. Tables 4-7 summarize the results of the seven metrics across the four datasets. The proposed model achieved the best score on every metric for CVC-ClinicDB, Kvasir-SEG, and ETIS, and on all metrics except S_α for CVC-ColonDB, where SegFormer was marginally higher (0.875 vs. 0.872).
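Three of the seven metrics have simple closed forms. The sketch below computes Dice, IoU, and MAE for a single flattened binary mask; mDice and mIoU are then the averages of the per-image values over a dataset. The masks are hypothetical, and the structure-aware measures ($F_{β}^{ω}$, S_α, E_ξ) are not reproduced here.

```python
def dice(pred, gt):
    """Dice coefficient 2|P∩G| / (|P| + |G|) for flattened binary masks."""
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 1.0


def iou(pred, gt):
    """Intersection over union |P∩G| / |P∪G| for flattened binary masks."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) - inter
    return inter / union if union else 1.0


def mae(pred, gt):
    """Mean absolute error between predicted map and ground truth, per pixel."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


# Hypothetical flattened masks (1 = polyp pixel, 0 = background).
gt_mask = [0, 1, 1, 1, 0, 0]
pred_mask = [0, 1, 1, 0, 1, 0]
scores = (dice(pred_mask, gt_mask), iou(pred_mask, gt_mask), mae(pred_mask, gt_mask))
```

For a single image Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so differences in model ranking between mDice and mIoU arise only from averaging over images.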

Table 4

Polyp segmentation performance metrics of comparative models on CVC-ClinicDB dataset

Model	mDice	mIoU	$F_{β}^{ω}$	S_α	mE_ξ	maxE_ξ	MAE
U-Net++	0.886	0.830	0.881	0.921	0.953	0.957	0.013
SFA	0.700	0.607	0.647	0.793	0.840	0.885	0.042
PraNet	0.899	0.849	0.896	0.936	0.963	0.979	0.009
ACSNet	0.883	0.828	0.878	0.918	0.946	0.948	0.014
CCBANet	0.904	0.849	0.900	0.926	0.957	0.960	0.016
SANet	0.916	0.859	0.909	0.939	0.971	0.976	0.012
SegFormer	0.931	0.882	0.927	0.948	0.981	0.987	0.008
Polyp-PVT	0.937	0.889	0.936	0.949	0.985	0.989	0.006
Ours	0.951	0.898	0.949	0.967	0.986	0.992	0.005

CVC-ClinicDB, Computer Vision Center-Clinic Database; MAE, mean absolute error; mDice, mean Dice coefficient; mIoU, mean intersection over union.

Table 5

Polyp segmentation performance metrics of comparative models on Kvasir-SEG dataset

Model	mDice	mIoU	$F_{β}^{ω}$	S_α	mE_ξ	maxE_ξ	MAE
U-Net++	0.821	0.738	0.797	0.856	0.900	0.904	0.048
SFA	0.723	0.611	0.670	0.782	0.834	0.849	0.075
PraNet	0.898	0.840	0.885	0.915	0.944	0.948	0.030
ACSNet	0.884	0.820	0.870	0.894	0.934	0.937	0.035
CCBANet	0.894	0.834	0.884	0.905	0.941	0.944	0.030
SANet	0.904	0.847	0.892	0.915	0.949	0.953	0.028
SegFormer	0.915	0.864	0.903	0.924	0.952	0.957	0.026
Polyp-PVT	0.917	0.864	0.911	0.925	0.956	0.962	0.023
Ours	0.929	0.883	0.926	0.942	0.971	0.976	0.019

Kvasir-SEG, Kvasir Polyp Segmentation Dataset; MAE, mean absolute error; mDice, mean Dice coefficient; mIoU, mean intersection over union.

Table 6

Polyp segmentation performance metrics of comparative models on CVC-ColonDB dataset

Model	mDice	mIoU	$F_{β}^{ω}$	S_α	mE_ξ	maxE_ξ	MAE
U-Net++	0.618	0.538	0.602	0.764	0.790	0.838	0.046
SFA	0.469	0.347	0.379	0.634	0.675	0.764	0.094
PraNet	0.712	0.640	0.699	0.820	0.847	0.872	0.043
ACSNet	0.733	0.650	0.713	0.813	0.861	0.864	0.044
CCBANet	0.761	0.674	0.742	0.825	0.882	0.884	0.041
SANet	0.753	0.670	0.726	0.837	0.869	0.878	0.043
SegFormer	0.811	0.736	0.794	0.875	0.912	0.921	0.031
Polyp-PVT	0.808	0.727	0.795	0.865	0.913	0.919	0.031
Ours	0.823	0.745	0.803	0.872	0.920	0.924	0.029

CVC-ColonDB, Computer Vision Center Colon Database; MAE, mean absolute error; mDice, mean Dice coefficient; mIoU, mean intersection over union.

Table 7

Polyp segmentation performance metrics of comparative models on ETIS dataset

Model	mDice	mIoU	$F_{β}^{ω}$	S_α	mE_ξ	maxE_ξ	MAE
U-Net++	0.418	0.357	0.357	0.682	0.635	0.735	0.027
SFA	0.297	0.217	0.231	0.557	0.531	0.632	0.109
PraNet	0.628	0.567	0.600	0.794	0.808	0.841	0.031
ACSNet	0.710	0.626	0.667	0.820	0.865	0.867	0.019
CCBANet	0.685	0.605	0.637	0.795	0.795	0.812	0.038
SANet	0.750	0.654	0.685	0.849	0.881	0.897	0.015
SegFormer	0.783	0.704	0.740	0.871	0.899	0.910	0.015
Polyp-PVT	0.787	0.706	0.750	0.871	0.906	0.910	0.013
Ours	0.812	0.733	0.781	0.889	0.913	0.915	0.012

ETIS, ETIS-Larib Polyp Database; MAE, mean absolute error; mDice, mean Dice coefficient; mIoU, mean intersection over union.

To comprehensively evaluate the robustness and effectiveness of our proposed EMT-Net model for colorectal polyp segmentation tasks under complex conditions, we conducted systematic quantitative and qualitative comparisons across four public datasets: CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS. Figure 7A presents bar charts comparing EMT-Net’s performance against eight benchmark models across all seven metrics, showing EMT-Net’s superior performance. Figure 7B further illustrates the model’s balanced capabilities across different dimensions through radar charts, providing more intuitive validation of its segmentation performance on real clinical images.

Figure 7 Comprehensive EMT-Net evaluation across multiple CRC polyp datasets. (A) Bar plots comparing EMT-Net against eight models (U-Net++, SFA, PraNet, ACSNet, CCBANet, SANet, SegFormer, Polyp-PVT) on seven metrics across four datasets (CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, ETIS). EMT-Net achieves the best or near-best score on every dataset and metric, indicating strong segmentation capability and model stability. (B) Radar charts based on the seven performance metrics compare model performance across the four datasets. EMT-Net performs well in multiple evaluation dimensions, including boundary awareness, structural preservation, and error control, indicating cross-scenario and cross-dataset generalization and robustness. These results support its potential applicability as an automated CRC polyp segmentation tool for real-world clinical images. CRC, colorectal cancer; CVC-ClinicDB, Computer Vision Center-Clinic Database; CVC-ColonDB, Computer Vision Center Colon Database; EMT-Net, edge-aware Mamba-enhanced transformer network; ETIS, ETIS-Larib Polyp Database; Kvasir-SEG, Kvasir Polyp Segmentation Dataset.

Visual comparative validation demonstrates EMT-Net’s clinical potential for CRC imaging

Figure 8 presents comparative visualizations of segmentation results between EMT-Net and mainstream models (PraNet, U-Net++, SFA) on representative samples from four public polyp datasets (CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, ETIS). From left to right: original images (Img), ground truth masks (GT), our method (Ours), and three comparison methods. The results show EMT-Net’s superior boundary perception and structural preservation across diverse complex scenarios, with particular advantages in: (I) recovering continuous, complete lesion contours in blurred regions (e.g., illumination-variant polyps in CVC-ClinicDB); (II) detecting multiple micro-polyps with both high detection sensitivity and precise boundary delineation (e.g., scattered lesions in Kvasir-SEG); and (III) accurately identifying targets under low-contrast backgrounds or occlusions (e.g., background-similar polyps in ETIS). These results support EMT-Net’s generalization capability and robustness in handling varied clinical imaging inputs, providing a reliable segmentation foundation for computer-aided CRC screening.

Figure 8 Segmentation visual comparisons across four datasets. The figure demonstrates a comparative analysis of segmentation performance between EMT-Net and state-of-the-art models across four benchmark datasets (top to bottom: CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, ETIS; two representative cases per dataset; left to right: Img, GT, Ours, PraNet, U-Net++, SFA). EMT-Net shows marked advantages in handling blurred boundaries, multi-scale structures, and low-contrast regions, producing more complete morphologies and sharper edges than competitors, validating its superior visual perception and clinical potential. CVC-ClinicDB, Computer Vision Center-Clinic Database; CVC-ColonDB, Computer Vision Center Colon Database; EMT-Net, edge-aware Mamba-enhanced transformer network; ETIS, ETIS-Larib Polyp Database; GT, ground truth; Kvasir-SEG, Kvasir Polyp Segmentation Dataset.

Discussion

CRC represents a significant global public health challenge with persistently increasing incidence and high mortality rates (2,36), where early detection and precise intervention constitute key strategies for mortality reduction (37,38). This study focuses on early CRC identification and auxiliary diagnosis, pioneering a parallel yet synergistic dual-path intelligent recognition strategy that independently models molecular biomarkers and imaging structural features while balancing biological mechanism elucidation with clinical implementation, thereby establishing a comprehensive technical framework with high originality, technical advancement, and clinical generalizability.

At the molecular level, leveraging transcriptomic data from the public functional genomics database GEO, we constructed a diagnostic biomarker screening system encompassing four training datasets and two independent external validation cohorts. Through integrated batch effect correction, differential expression analysis, co-expression network analysis, and PPI network analysis, we ultimately identified nine core genes (e.g., CDC25B, TEAD4, MMP1, UHRF1) demonstrating stable expression across multiple CRC datasets with functional relevance. Systematic evaluation using six machine learning methods (RF, SVM, XGBoost, GBM, ANN, LR) confirmed exceptional classification performance (maximum AUROC =0.987) in both GSE21815 and GSE106582 validation sets, exhibiting robust reproducibility and cross-dataset generalizability. These results provide reliable algorithmic support for CRC molecular diagnosis while offering theoretical foundations and candidate molecules for pathogenesis research (Figure 9).

Figure 9 AI-assisted diagnostic strategy for CRC via dual-path integration of molecular identification and image segmentation. AI, artificial intelligence; ANN, artificial neural network; CRC, colorectal cancer; RF, random forest.

For imaging analysis, we developed the EMT-Net model incorporating state-space modeling and edge-guided mechanisms, combining Transformer’s global modeling capabilities with Mamba’s advantages in long-range dependency modeling to structurally address common polyp recognition challenges including boundary blurring, multi-scale variations, and complex background interference. The introduced edge enhancement modules and semantic fusion strategies significantly improved segmentation boundary precision and structural integrity. Comprehensive validation across four public colonoscopy image datasets (CVC-ClinicDB, Kvasir-SEG, ETIS, CVC-ColonDB) demonstrated EMT-Net’s state-of-the-art performance across seven key metrics (mDice, mIoU, $F_{β}^{ω}$, S_α, mE_ξ, maxE_ξ, and MAE), confirming its high adaptability and broad generalizability across heterogeneous imaging conditions. Visualization experiments further verified EMT-Net’s stable recognition performance for severely blurred boundaries or low-contrast images, providing solid foundations for clinical AI diagnostics in complex scenarios.

Notably, this study adopts a parallel modeling strategy in which molecular features and imaging structures are independently modeled, rather than forcibly fused at the feature level. This design preserves the analytical depth of each pathway while enabling complementary information to emerge at the result level: molecular features provide mechanistic and pathological interpretive cues, whereas imaging features offer spatial localization and authentic tissue-context support. Together, these pathways mutually reinforce the diagnostic rationale and establish a foundation for constructing an actionable intelligent decision system.

Although the two pathways operate independently during model construction, the “hybrid” nature of our approach does not lie in rigid feature- or architecture-level integration, but rather in deep fusion at the interpretation and decision levels. The molecular pathway delivers functional risk signals associated with disease initiation and progression, while the imaging pathway supplies morphological and spatial lesion information. These pathways form semantically complementary and mutually verifiable sources of evidence. At the model output stage, pathway-specific risks are integrated through an AI-based decision controller, enabling final multimodal combination and synergistic decision-making. This decision-level fusion strategy avoids the noise accumulation commonly associated with direct feature concatenation, better aligns with the clinical logic of integrating functional and structural evidence, and constitutes a central innovation of the proposed hybrid model.

Although several studies have attempted to incorporate multimodal information into assisted CRC diagnosis, most existing “hybrid” models still exhibit substantial limitations. First, current approaches predominantly rely on shallow feature concatenation or simple weighting strategies for information integration, lacking deep synergistic modeling mechanisms between functional and structural layers; consequently, the complementary value of multimodal data is not fully exploited (18). Second, the molecular biomarkers employed in many studies are often not selected through systematic biological screening pipelines or supported by cross-cohort validation, resulting in limited model interpretability and reproducibility (21,22). Third, many models are trained and evaluated on a single imaging dataset, with insufficient generalizability across platforms, devices, and populations. More critically, most existing hybrid models fail to establish executable decision-level integration frameworks that can translate multimodal recognition outputs into explicit risk stratification and clinical management recommendations, thereby constraining their applicability in real-world clinical settings (19,20).

In contrast, the dual-pathway strategy proposed in this study—integrating molecular biomarkers with image segmentation—achieves genuine complementary enhancement across functional and structural domains. The molecular pathway captures early molecular aberrations associated with disease initiation from a mechanistic perspective, offering high sensitivity and strong interpretability, while the imaging pathway leverages EMT-Net to extract lesion morphology and boundary information, thereby providing structural and spatial localization support. The synergistic integration of these pathways effectively reduces the risks of false-negative and false-positive outcomes associated with single-modality approaches and improves robustness under heterogeneous data conditions. Furthermore, an AI-based decision controller enables decision-level fusion by mapping molecular risk scores (P_mol) and imaging risk scores (P_img) into four clinically meaningful management pathways, thereby addressing a critical gap in existing hybrid models that lack executable decision chains. This multi-level synergistic, highly interpretable, and clinically deployable design constitutes the core innovation of the present study.

At present, the imaging analysis module is primarily focused on polyp segmentation, serving as a structural indicator of early carcinogenic risk rather than providing direct image-based diagnosis of pathologically confirmed early CRC. Owing to the limited availability of large-scale, pathology-annotated early cancer imaging datasets, image-based classification was not performed in this study. Future work will incorporate pathology-grounded gold-standard datasets to extend the current framework from polyp segmentation toward early carcinogenesis classification and predictive modeling.

Building upon the dual-pathway synergy described above, we further developed an AI-driven decision controller that can be seamlessly embedded into clinical workflows. Specifically, the molecular pathway generates an individualized molecular risk score (P_mol) based on nine core diagnostic genes, while the imaging pathway produces polyp segmentation outputs and an imaging-based risk score (P_img) using EMT-Net. Integration of these two risk scores enables the definition of four clinically meaningful management scenarios: (I) dual-positive (high P_mol + high P_img), in which patients are classified as high-risk and should be rapidly referred for intensified diagnostic and therapeutic procedures, such as targeted biopsy, supplementary imaging assessment, or endoscopic intervention; (II) imaging-positive only (high P_img, low P_mol), suggesting structural suspicion without overt molecular abnormalities, for which short-term surveillance or additional molecular/pathological testing is recommended; (III) molecular-positive only (high P_mol, low P_img), potentially representing an early prodromal stage in which molecular alterations precede detectable imaging changes, warranting shortened follow-up intervals and additional blood- or tissue-based molecular assessments; and (IV) dual-negative (low P_mol + low P_img), corresponding to a low-risk population suitable for routine screening or active monitoring.

Through this rule-based mapping from “positive/non-positive” screening outcomes to predefined management pathways, the proposed system extends beyond a conventional pattern-recognition model to function as an executable AI-based triage controller. By automatically routing screening results into appropriate clinical management trajectories, this framework enhances the systematic organization, interpretability, and operational feasibility of CRC early-screening workflows.
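
The four-scenario mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the names `triage`, `Scenario`, and `TriageResult` are hypothetical, and the cut-offs `t_mol` and `t_img` (both set to 0.5 here) are placeholders, because no thresholds for P_mol or P_img are specified in the text. The sketch also assumes that both risk scores lie in the range [0, 1].

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    """The four management scenarios (I-IV) described in the text."""
    DUAL_POSITIVE = "I"
    IMAGING_ONLY = "II"
    MOLECULAR_ONLY = "III"
    DUAL_NEGATIVE = "IV"


# Management pathways, paraphrased from the scenario descriptions above.
MANAGEMENT = {
    Scenario.DUAL_POSITIVE: "rapid referral: targeted biopsy, supplementary "
                            "imaging, or endoscopic intervention",
    Scenario.IMAGING_ONLY: "short-term surveillance or additional "
                           "molecular/pathological testing",
    Scenario.MOLECULAR_ONLY: "shortened follow-up and additional blood- or "
                             "tissue-based molecular assessment",
    Scenario.DUAL_NEGATIVE: "routine screening or active monitoring",
}

# Lookup table: (molecular score is high, imaging score is high) -> scenario.
RULES = {
    (True, True): Scenario.DUAL_POSITIVE,
    (False, True): Scenario.IMAGING_ONLY,
    (True, False): Scenario.MOLECULAR_ONLY,
    (False, False): Scenario.DUAL_NEGATIVE,
}


@dataclass(frozen=True)
class TriageResult:
    scenario: Scenario
    management: str


def triage(p_mol: float, p_img: float,
           t_mol: float = 0.5, t_img: float = 0.5) -> TriageResult:
    """Map two risk scores to one of the four scenarios.

    t_mol and t_img are hypothetical cut-offs (assumption: not given in
    the source); a score at or above its cut-off is treated as "high".
    """
    for name, value in (("p_mol", p_mol), ("p_img", p_img)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    scenario = RULES[(p_mol >= t_mol, p_img >= t_img)]
    return TriageResult(scenario, MANAGEMENT[scenario])


if __name__ == "__main__":
    result = triage(p_mol=0.82, p_img=0.31)
    print(result.scenario.name, "->", result.management)
```

Keeping the rules in a lookup table rather than in nested conditionals makes each management pathway explicit and easy to audit. In any real deployment, the two cut-offs would have to be calibrated on validation data rather than fixed at the illustrative values used here.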

Notably, rather than forcibly merging the molecular and imaging data into a single model, our study adopted a parallel architecture in which the two pathways are modeled independently and reinforce each other through complementary information: the molecularly identified core genes provide potential explanatory annotations and regional guidance for imaging models, while the structurally recognized regions offer functional context for biomarker localization. This “parallel-yet-complementary” approach is intended to keep the overall recognition system stable, adaptable, and scalable when handling complex heterogeneous data, and it offers a template for future, fully multimodal intelligent diagnostic systems.

Although EMT-Net demonstrated strong segmentation performance across multiple publicly available datasets, several limitations should be acknowledged. First, the model has not yet been validated in prospective real-world clinical settings. Its robustness under practical colonoscopy conditions, such as illumination variability, motion artifacts, lens contamination, inflammatory responses, or the presence of concomitant lesions, remains to be evaluated. Second, the model’s adaptability to highly heterogeneous polyp types and morphologies, including flat lesions, serrated polyps, and inflammatory pseudopolyps, has not been systematically assessed. Future studies will address these limitations through prospective, multicenter validation using more complex and clinically representative datasets to further enhance the model’s clinical applicability.

Conclusions

The study demonstrates strong clinical utility across methodological construction, performance validation, and implementation feasibility. The proposed molecular biomarker combination can serve as an auxiliary screening tool and be readily extended to multi-gene detection platforms, such as blood-based assays or tissue microarrays. In parallel, the imaging pathway based on EMT-Net can be modularly integrated into existing colonoscopy workstations to enable automated polyp detection, risk alerting, and intra-procedural navigation. Building upon this dual-pathway architecture, the proposed system offers a clear translational trajectory: standardized molecular assays can support scalable diagnostic panels at the molecular level, while the imaging module can be deployed within real-world endoscopic workflows to provide real-time decision support. Together, these components constitute a “molecular-structural” dual-engine framework for intelligent CRC screening, with substantial translational potential and technical scalability.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-687/rc

Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-687/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-687/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

Cite this article as: Zhao Y, Zeng W. A hybrid molecular-imaging model for high-accuracy early colorectal cancer diagnosis. J Gastrointest Oncol 2026;17(1):9. doi: 10.21037/jgo-2025-687
