Study on ensemble model with weight allocation based on improved dung beetle optimization algorithm for screening colorectal cancer using laboratory test indicators
Highlight box
Key findings
• We propose an improved sine algorithm-guided dung beetle optimizer with weighted voting (MSADBO-WV) to enhance the accuracy of colorectal cancer (CRC) screening.
What is known and what is new?
• Invasive methods like colonoscopy, while accurate, are resource-intensive and less accessible. Machine learning models have shown promise in early cancer detection, but traditional single classifiers and ensemble methods often struggle with high-dimensional data and overfitting.
• Our framework leverages routine laboratory test indicators to achieve high accuracy (98.42%±1.53%) in early CRC screening, outperforming conventional machine learning and ensemble methods. The study also identifies an optimal subset of 26 biomarkers, including carcinoembryonic antigen and platelet distribution width, which significantly differentiate CRC patients from healthy controls.
What is the implication, and what should change now?
• The MSADBO-WV framework offers a cost-effective, non-invasive alternative for early CRC screening, potentially increasing accessibility and adherence to screening protocols. Healthcare systems should consider integrating such artificial intelligence-driven tools into preliminary screening programs to reduce reliance on invasive procedures. Future research should focus on validating this method in diverse populations and optimizing its computational efficiency for large-scale clinical deployment. Policymakers and clinicians must prioritize the adoption of advanced machine learning techniques to bridge gaps in early cancer detection and improve patient outcomes.
Introduction
Colorectal cancer (CRC) is the third most common malignant tumor worldwide (1-3). Although surgery and chemotherapy remain the primary treatments for CRC, they are associated with side effects and high recurrence rates, and the prognosis for patients with advanced CRC is generally poor (4). Early treatment is therefore critical, as it significantly improves survival. Public health organizations have focused on promoting CRC screening and disseminating relevant knowledge (5-7). However, because early-stage CRC is typically asymptomatic, the disease is often diagnosed at an advanced stage. Furthermore, the pathogenesis of CRC remains insufficiently understood, making it a particularly challenging condition to address. Nevertheless, compared with other cancers, CRC offers certain advantages in terms of prevention. Given the absence of noticeable symptoms during early development, effective early detection techniques are crucial.
Currently, multiple techniques and methods are available for early CRC diagnosis, categorized broadly into non-invasive and invasive approaches (8-10). Non-invasive methods include fecal occult blood testing (FOBT) (11), fecal immunochemical testing (FIT) (12), and stool DNA testing (13). FOBT and FIT are widely used as initial screening tools to detect trace amounts of blood in stool, indicating potential malignancies. However, these methods exhibit limitations in sensitivity and specificity, often leading to false-negative or false-positive results. Stool DNA testing has advanced in recent years by identifying cancer-associated genetic mutations or methylation markers, thereby improving the detection rate of early malignancies. Nevertheless, the high costs associated with such methods have hindered their widespread clinical application.
Invasive diagnostic methods, including colonoscopy and computed tomography (CT) colonography, are currently considered the gold standards for confirming CRC diagnosis and enabling therapeutic interventions (14,15). While these techniques are indispensable for definitive diagnosis, their invasiveness, cost, and operational requirements highlight the need for accessible preliminary screening tools. The proposed improved sine algorithm-guided dung beetle optimizer with weighted voting (MSADBO-WV) method leverages routine physical examination data to provide a non-invasive, scalable option for initial risk assessment, complementing rather than replacing existing diagnostic protocols (16). CT colonography is regarded as a relatively safe imaging method with a low radiation risk, but its limited accuracy may require follow-up colonoscopy for definitive diagnosis, just as positive findings from the MSADBO-WV method would require subsequent clinical confirmation through standard diagnostic procedures.
Early screening and treatment are pivotal for effective disease management and are significantly more cost-effective than treating advanced stages. Despite this, traditional screening methods face notable shortcomings, with many eligible individuals remaining unscreened. Moreover, current CRC screening guidelines have been summarized by recent reports, emphasizing the need for future blood-based screening initiatives (17,18).
With the rapid development of artificial intelligence and machine learning, increasing attention has been directed toward utilizing machine learning algorithms for early CRC diagnosis (19-23). Machine learning can process large multidimensional datasets, such as genetic data, pathological images, and patient health records, to identify early cancer characteristics and achieve efficient, accurate diagnoses. The significant achievements of machine learning algorithms in image recognition (24), biomarker detection (25), and risk prediction for cancer have been demonstrated by recent studies.
As data volume and complexity continue to grow, traditional single machine learning models have shown limitations in addressing complex classification problems. To improve model accuracy and stability, various ensemble learning methods, including bagging and boosting, have been developed (26-29). These approaches combine multiple models and adopt iterative optimization strategies to reduce variance and bias. However, traditional ensemble strategies may remain sensitive to noise or prone to overfitting when handling high-dimensional and heterogeneous data, thereby failing to fully capture complex patterns. Advanced ensemble strategies, such as weighted averaging and voting, have therefore gained increasing attention (30-32). Weighted averaging assigns different weights to the predictions of different models, so that better-performing models contribute more to the final decision, which improves overall predictive accuracy and robustness. Voting methods aggregate classification outcomes from multiple models, with soft voting strategies capturing inter-class probability distributions more precisely, thereby enhancing generalization. These advanced strategies not only leverage model diversity but also improve the recognition of complex features and latent patterns. Consequently, advanced ensemble methods such as weighted voting have become a significant research trend in machine learning, demonstrating strong adaptability and substantial application potential.
Highly accurate early diagnostic methods are essential for effective prevention and control of CRC. However, studies employing routine laboratory test indicators for early CRC diagnosis remain scarce. This study collected routine examination data, including blood tests, liver and kidney function assessments, glucose levels, lipid profiles, tumor markers, and stool analyses, to compare the classification accuracy of various classifiers between early CRC patients and healthy controls (HC). Feature selection algorithms were applied to identify significant features distinguishing early CRC patients from HC for further analysis. These selected features were then input into ensemble learning classifiers to determine the optimal feature subset with the highest classification accuracy. This study aimed to contribute to early CRC screening and prevention strategies through the proposed analytical framework.
Methods
Data
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the ethics board of Jinhua Municipal Central Hospital (No. 1.11-2.0). Due to the retrospective nature of the study, the ethics review committee waived the requirement for informed consent. All methods were performed in accordance with the relevant guidelines and regulations. Blood data from 197 patients diagnosed with early-stage CRC between 2018 and 2023 at the Colorectal and Anal Surgery Department of the Jinhua Municipal Central Hospital were selected for the CRC group. The CRC patients had a median age of 70 years (range, 37–90 years), with 121 males and 76 females. The HC group comprised 188 records from physical examinations conducted in 2024 by the Health Management Center of the same hospital. The ages of the individuals in the HC group ranged from 21 to 82 years, including 92 males and 96 females. Notably, the CRC group had a higher number of male patients than female patients, and the majority were over 50 years old. The incidence of CRC was observed to be lower among young and middle-aged individuals compared to the elderly population. A total of 45 laboratory test indicators were utilized, encompassing routine blood tests, liver function, renal function, blood glucose, lipid profiles, tumor markers, and routine stool tests; their full names and abbreviations are detailed in Table 1. While some indicators in Table 1 (particularly tumor markers and stool tests) are not routinely included in general health checkups, our method demonstrates superior performance by integrating these with more common blood-based markers. The framework achieves higher accuracy (98.16%) compared to conventional screening methods, while maintaining non-invasiveness and cost-effectiveness.
Table 1
| Test types | Items | Abbreviations |
|---|---|---|
| Routine blood test | White blood cell count | WBCC |
| | Neutrophil percentage | NP |
| | Percentage of lymphocytes | POL |
| | Percentage of monocytes | POM |
| | Percentage of eosinophils | POE |
| | Percentage of basophils | POB |
| | Neutrophil count | NC |
| | Lymphocyte count | LC |
| | Monocyte count | MC |
| | Eosinophil count | ESC |
| | Basophil count | BC |
| | Erythrocyte count | ETC |
| | Hemoglobin concentration | HC |
| | Specific volume of red blood cells | SVORBC |
| | Mean hematocrit | MH |
| | Mean red cell hemoglobin | MRCH |
| | Mean erythrocyte hemoglobin concentration | MEHC |
| | Erythrocyte volume distribution width | EVDW |
| | Total blood platelet count | TBPC |
| | Mean platelet volume | MPV |
| | Hematocrit (blood platelet count) | BPC |
| | Platelet volume distribution width | PVDW |
| Liver function test | Total bilirubin | TB |
| | Glutamine aminotransferase (an amino acid) | GATT |
| | Glutamic transaminase (an amino acid) | GST |
| | Glutamate/glutamate | GTM |
| | γ-glutamyltransferase | YG |
| | Alkaline phosphatase | AP |
| Renal function test | Creatinine | CREATININE |
| | Urea nitrogen | UN |
| | Urea nitrogen: creatinine | UN:C |
| | Uric acid | UA |
| Blood glucose test | Glucose | GLUCOSE |
| Lipid test | Total cholesterol | TC |
| | Triglyceride | Triglyceride |
| | High-density lipoprotein | HDL |
| | Low-density lipoprotein | LDL |
| Tumor marker test | Alpha fetoprotein | AFP |
| | Carcinoembryonic antigen | CEA |
| | Glycoantigen 19-9 | G 19-9 |
| Routine stool test | Color | CO |
| | Character | CHA |
| | Microscopic examination of erythrocytes | MEE |
| | Microscopic examination of white blood cells | MEWBC |
| | Occult blood test | OBT |
Weighted voting strategy
The weighted voting strategy is implemented by integrating multiple traditional machine learning models and base ensemble models to construct an initial model set. During the training and testing phases, the accuracy of each model on the test set is calculated and used as the baseline for initial weight assignment. Subsequently, an optimized weight α is introduced for each model, with the final prediction probability determined by the product of the model’s accuracy and the optimized weight α. The weighted prediction probabilities Pweighted of all models are then summed to determine the final label prediction probability.
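The weighted voting rule described above can be illustrated with a minimal sketch. It assumes that each model's class probabilities are scaled by the product of its baseline accuracy and its weight α, and that the scaled probabilities are summed over models before taking the most probable class. The function name `weighted_vote` and all numbers in the toy example are hypothetical and are not taken from the study.

```python
def weighted_vote(probas, accuracies, alphas):
    """Combine per-model class probabilities into one label per sample.

    probas:     list over models; each entry is a list of per-sample
                probability rows, e.g. [[p_class0, p_class1], ...]
    accuracies: baseline accuracy of each model (initial weights)
    alphas:     optimized weight alpha for each model
    """
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    labels = []
    for s in range(n_samples):
        # P_weighted per class: sum over models of accuracy * alpha * p
        scores = [0.0] * n_classes
        for model_p, acc, alpha in zip(probas, accuracies, alphas):
            for c in range(n_classes):
                scores[c] += acc * alpha * model_p[s][c]
        labels.append(max(range(n_classes), key=lambda c: scores[c]))
    return labels


# Toy example (hypothetical values): three models, two samples, two classes.
probas = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.2, 0.8], [0.3, 0.7]],
]
accuracies = [0.97, 0.96, 0.90]
alphas = [0.5, 0.3, 0.2]
predicted = weighted_vote(probas, accuracies, alphas)
```

In this toy case the first sample is assigned class 0 and the second class 1, because the most accurate and most heavily weighted model dominates the sum.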
The MSADBO has been proposed to address the limitations of the dung beetle optimization algorithm (DBO) (33-35). DBO is inspired by the foraging behavior of dung beetles. It simulates the processes of rolling dung balls and using natural landmarks for navigation to explore and optimize within a multidimensional search space. While the algorithm demonstrates strong optimization capability and fast convergence, it suffers from an imbalance between global exploration and local exploitation, leading to a propensity to fall into local optima and weak global exploration ability.
The MSADBO, whose flowchart is shown in Figure 1, introduces three key improvements: Bernoulli map chaotic sequence distribution, integration with the improved sine algorithm (MSA), and incorporation of adaptive Gaussian-Cauchy hybrid mutation perturbation. Specifically, the Bernoulli map is used to initialize the positions of dung beetle individuals. First, the Bernoulli mapping relationship projects the generated values into the chaotic variable space. Then, these chaotic values are linearly transformed to map them into the algorithm’s initial search space. The Bernoulli map is mathematically expressed as:
$$z_{k+1} = \begin{cases} \dfrac{z_k}{1-\beta}, & 0 \le z_k \le 1-\beta \\[2ex] \dfrac{z_k-(1-\beta)}{\beta}, & 1-\beta < z_k \le 1 \end{cases}$$
Here, z_k denotes the k-th chaotic value and β represents the mapping parameter, where β ∈ (0, 1).
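The chaotic initialization can be sketched as follows, assuming the standard piecewise Bernoulli shift map on [0, 1] and a linear rescaling to the search bounds. The value β = 0.4, the random seed, and the function names are illustrative assumptions; the text only states that β ∈ (0, 1).

```python
import random


def bernoulli_map(z, beta):
    """One step of the Bernoulli shift map on [0, 1]."""
    if z <= 1.0 - beta:
        return z / (1.0 - beta)
    return (z - (1.0 - beta)) / beta


def init_population(n_individuals, dim, lower, upper, beta=0.4, seed=0):
    """Chaotic initialization: iterate the map, then rescale to the bounds.

    beta=0.4 and the seeded start value are assumptions for illustration.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_individuals):
        z = rng.random()  # random starting point of the chaotic sequence
        position = []
        for _ in range(dim):
            z = bernoulli_map(z, beta)
            # linear transform from [0, 1] into [lower, upper]
            position.append(lower + z * (upper - lower))
        population.append(position)
    return population


population = init_population(n_individuals=5, dim=3, lower=0.0, upper=1.0)
```

Each individual here follows its own chaotic sequence; sharing one sequence across the whole population would be an equally plausible reading of the description.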
The MSA strategy leverages the sine function in mathematics for iterative optimization, exhibiting strong global exploration capabilities. Additionally, an adaptive variable inertia weight coefficient ωt is introduced during the position update process, enabling the algorithm to thoroughly search local regions and achieve a balanced trade-off between global exploration and local exploitation. The inertia weight decreases linearly with the number of iterations. The position update formula is expressed as follows:
$$x_i(t+1) = \omega_t\, x_i(t) + r_1 \sin(r_2)\, \lvert r_3\, p_i(t) - x_i(t) \rvert$$
Here, t represents the current iteration number, ωt denotes the inertia weight, and xi(t) refers to the i-th position component of individual x during the t-th iteration. Similarly, pi(t) is the i-th component of the best individual’s position variable at the t-th iteration. The parameter r1 is a non-linear decreasing function, while r2 is a random number in the interval [0, 2π], and r3 is a random number in the interval [−2, 2]. The expression for r1 is given as follows:
Here, ωmax and ωmin represent the maximum and minimum values of ωt, respectively. t denotes the current iteration number, and Tmax indicates the maximum number of iterations.
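A minimal sketch of one sine-guided update step is given below. It assumes an update of the form ω_t·x_i + r1·sin(r2)·|r3·p_i − x_i|, with ω_t decreasing linearly from ωmax to ωmin. The defaults ωmax = 0.9 and ωmin = 0.4, the per-component sampling of r2 and r3, and the function names are assumptions for illustration; r1 is passed in by the caller because its expression is not reproduced here.

```python
import math
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Inertia weight that decreases linearly from w_max to w_min.

    The default bounds are assumed values, not taken from the study.
    """
    return w_max - (w_max - w_min) * t / t_max


def sine_update(x, p_best, t, t_max, r1, rng):
    """Assumed update: w_t * x_i + r1 * sin(r2) * |r3 * p_i - x_i|."""
    w_t = inertia_weight(t, t_max)
    new_x = []
    for x_i, p_i in zip(x, p_best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # random angle in [0, 2*pi]
        r3 = rng.uniform(-2.0, 2.0)           # random scale in [-2, 2]
        new_x.append(w_t * x_i + r1 * math.sin(r2) * abs(r3 * p_i - x_i))
    return new_x


rng = random.Random(0)
moved = sine_update([1.0, 2.0], [0.5, 0.5], t=10, t_max=100, r1=0.5, rng=rng)
```

With r1 set to zero the sine term vanishes and the step reduces to a pure inertia-weighted shrinkage of the current position, which is a convenient sanity check.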
To further improve the DBO algorithm’s ability to balance global exploration and local exploitation, a sine-guided mechanism has been introduced. The MSA strategy is embedded into the DBO algorithm as a replacement for the dung beetle’s tangent-based dance strategy. Specifically, during the rolling stage, sine-based operations are applied to guide the position updates of the entire dung beetle population. The improved formula is expressed as follows:
Here, δ= rand(1), ST ∈ (0.5, 1]. In the improved position update formula, when δ < ST, it indicates that the dung beetle rolls with a specific target, operating in a normal global exploration phase. Conversely, when δ ≥ ST, it signifies that the dung beetle lacks a clear rolling target but instead performs search movements guided by a sine function.
Cauchy mutation and Gaussian mutation are two commonly used mutation operators in intelligent optimization algorithms, each with its advantages and limitations. Cauchy mutation covers a broader search range than Gaussian mutation, but its excessively large step size can cause it to overshoot the optimal value, producing suboptimal offspring. In contrast, Gaussian mutation demonstrates strong search capabilities within a small range due to its higher probability of generating small mutation values. To leverage the strengths of both operators, an adaptive Gaussian-Cauchy hybrid mutation strategy has been proposed. The specific formula is expressed as follows:
Here, Xb(t) represents the optimal position of individual X during the t-th iteration, and Hb(t) denotes the position of Xb(t) after undergoing Gaussian-Cauchy hybrid perturbation. Gauss(σ) refers to the Gaussian mutation operator, and Cauchy(σ) represents the Cauchy mutation operator. The weight coefficients of the mutation operators, μ1 and μ2, are adjusted linearly in one dimension during the iterations. This adjustment aims to ensure balanced and smooth perturbations in each iteration.
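The hybrid perturbation can be sketched under explicit assumptions. The form Hb = Xb·(1 + μ1·Gauss(σ) + μ2·Cauchy(σ)) and the linear schedules μ1 = t/Tmax and μ2 = 1 − t/Tmax are a common choice in the literature and are assumed here; they are not confirmed by the text above. The function name and σ = 1 are likewise hypothetical.

```python
import math
import random


def hybrid_mutation(x_best, t, t_max, sigma=1.0, rng=None):
    """Perturb the best position with a Gaussian-Cauchy mixture.

    Assumed form: H_b = X_b * (1 + mu1 * Gauss(sigma) + mu2 * Cauchy(sigma)).
    """
    rng = rng or random.Random()
    mu1 = t / t_max        # assumed: Gaussian weight grows linearly
    mu2 = 1.0 - t / t_max  # assumed: Cauchy weight shrinks linearly
    mutated = []
    for x in x_best:
        gauss = rng.gauss(0.0, sigma)
        # Cauchy sample via the inverse-CDF (tangent) transform
        cauchy = sigma * math.tan(math.pi * (rng.random() - 0.5))
        mutated.append(x * (1.0 + mu1 * gauss + mu2 * cauchy))
    return mutated


candidate = hybrid_mutation([0.2, 0.7, 0.5], t=30, t_max=100,
                            rng=random.Random(1))
```

Under these schedules the heavy-tailed Cauchy term dominates early iterations (wide jumps) and the Gaussian term dominates late iterations (fine local search), which matches the trade-off described for the two operators.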
MSADBO-WV
A novel weighted voting ensemble learning strategy, referred to as the MSADBO-WV, is proposed in this study. This approach incorporates the iterative optimization of MSADBO to dynamically adjust the weights α of each model within the weighted voting framework. By maximizing the overall classification accuracy, the strategy achieves optimal ensemble performance. The flowchart of the method is shown in Figure 2.
Feature selection
To eliminate redundant features and prevent overfitting, this study employed a feature selection method combining the random forest (RF) model with the MSADBO-WV strategy. The RF model was used to evaluate each feature in the dataset and determine its importance score based on its contribution to the model’s predictive performance (36,37). Subsequently, all features were ranked in descending order of importance scores to identify those with the greatest impact on the model’s classification performance. On this basis, classification accuracy was calculated during each round of feature selection. By incrementally increasing the number of selected features, accuracy was evaluated for different feature subsets, and the relationship between the number of features and classification performance was explored. This incremental feature addition strategy facilitated the identification of redundant features and their impact on the model while enabling the determination of the optimal feature subset.
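The incremental selection loop described above can be sketched as follows. The importance scores would come from a fitted random forest and the evaluator from the MSADBO-WV ensemble; here both are replaced by hypothetical stand-ins (`importances`, `toy_evaluate`) so that the sketch stays self-contained.

```python
def select_top_k(importances, evaluate):
    """Return (best_k, best_score, ranked_indices).

    importances: one importance score per feature
    evaluate:    callable taking a list of feature indices and returning
                 the classification accuracy obtained with that subset
    """
    # Rank feature indices by descending importance.
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    best_k, best_score = 0, float("-inf")
    # Grow the subset one feature at a time and track the best accuracy.
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict comparison keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score, ranked


def toy_evaluate(subset):
    """Hypothetical evaluator: features 0, 2 and 3 help, others add noise."""
    useful = {0, 2, 3}
    chosen = set(subset)
    return 0.90 + 0.02 * len(chosen & useful) - 0.005 * len(chosen - useful)


importances = [0.30, 0.05, 0.40, 0.20, 0.05]  # hypothetical scores
best_k, best_score, ranked = select_top_k(importances, toy_evaluate)
```

In the toy data the accuracy peaks once the three informative features are included and declines as redundant ones are added, mirroring the pattern the procedure is meant to detect.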
Statistical analysis
Python was used as the core environment for ensemble learning model fitting and evaluation. All statistical analyses were performed using one-way analysis of variance (ANOVA) in Python. The significance level was set at P<0.05.
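As an illustration of the test used for group comparisons, the one-way ANOVA F statistic can be computed directly from its definition (between-group mean square divided by within-group mean square). The sample values below are hypothetical; in practice a library routine such as `scipy.stats.f_oneway` would also return the corresponding P value.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA for two or more groups of values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical indicator values for two groups (e.g. CRC vs. HC).
f_stat = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With only two groups, as in the CRC versus HC comparison, the one-way ANOVA is equivalent to a two-sample t-test with pooled variance (F equals t squared).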
Evaluation methods and metrics
In this study, the classification accuracy was evaluated using the five-fold cross-validation method, a widely used technique for assessing the performance of machine learning models. The dataset was evenly divided into five subsets, and five independent training and validation processes were conducted. During each iteration, four subsets were combined as the training set, while the remaining subset was used as the validation set. After each iteration, the model was evaluated using the validation set, with performance measured using metrics such as accuracy, precision, recall, and F1 score. The results from all five iterations were averaged, and the standard deviation (SD) was calculated to provide a comprehensive assessment of the model’s performance. Five-fold cross-validation effectively reduces the randomness introduced by specific data splits through repeated training and validation, allowing for a more precise estimation of the model’s generalization ability. Additionally, it helps identify issues of overfitting or underfitting.
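The five-fold procedure can be sketched with the standard library alone. The shuffling seed, the absence of stratification, and the use of the sample SD are assumptions made for illustration, since the text does not specify them; 385 is the total number of records (197 CRC and 188 HC).

```python
import random
import statistics


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffled, not stratified
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, validation


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


splits = list(five_fold_indices(385))
# Hypothetical per-fold accuracies, only to show the mean ± SD summary.
mean_acc, sd_acc = summarize([0.96, 0.98, 1.00])
```

Each record appears in exactly one validation fold, so every sample contributes once to the reported mean ± SD.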
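To comprehensively evaluate the model’s performance, accuracy, precision, recall, and F1 score were employed as the primary evaluation metrics. The performance assessment was based on a comparison between the predicted labels and the true labels, determining the counts of true positives (TPs), false negatives (FNs), true negatives (TNs), and false positives (FPs). Specifically:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$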
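The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The counts in the example are hypothetical (one validation fold of 77 records) and the function name is illustrative; the sketch does not guard against empty denominators.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for one validation fold.
accuracy, precision, recall, f1 = classification_metrics(tp=38, fn=2,
                                                         tn=36, fp=1)
```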
Results
DF exhibited the best performance across all metrics
The results presented in Table 2 demonstrated the classification performance of various machine learning models in the context of early CRC screening. Evaluation metrics, including accuracy, precision, F1 score, and recall, were reported as mean ± SD. Deep forest (DF) exhibited the best performance across all metrics, achieving an accuracy of 97.89%±1.78%, precision of 97.96%±1.74%, and F1 score and recall of 97.89%±1.79% and 97.89%±1.78%, respectively. These findings indicated the high robustness and classification capability of the DF model. RF and bagging followed closely, showing performance metrics comparable to DF, which validated their effectiveness in handling complex medical datasets.
Table 2
| Model | Accuracy (%) | Precision (%) | F1 score (%) | Recall (%) |
|---|---|---|---|---|
| AdaBoost | 97.37±1.86 | 97.46±1.79 | 97.37±1.86 | 97.37±1.86 |
| Bagging | 97.37±1.86 | 97.46±1.79 | 97.37±1.86 | 97.37±1.86 |
| ExtraTrees | 96.58±2.14 | 96.62±2.12 | 96.58±2.14 | 96.58±2.14 |
| SGD | 67.37±6.83 | 77.46±3.05 | 63.72±10.65 | 67.37±6.83 |
| RF | 97.63±1.75 | 97.69±1.70 | 97.63±1.75 | 97.63±1.75 |
| XGB | 96.84±1.78 | 96.91±1.76 | 96.84±1.78 | 96.84±1.78 |
| DF | 97.89±1.78 | 97.96±1.74 | 97.89±1.79 | 97.89±1.78 |
| CatBoost | 96.58±2.44 | 96.65±2.37 | 96.58±2.44 | 96.58±2.44 |
| GradientBoost | 96.58±2.71 | 96.62±2.67 | 96.58±2.71 | 96.58±2.71 |
| SVC | 67.63±5.80 | 73.28±5.01 | 65.27±7.36 | 67.63±5.80 |
| MLP classifier | 89.74±2.55 | 89.97±2.48 | 89.73±2.56 | 89.74±2.55 |
Data are presented as mean ± SD. DF, deep forest; MLP, multi-layer perceptron; RF, random forest; SD, standard deviation; SGD, stochastic gradient descent; SVC, support vector classifier; XGB, XGBoost.
Exceptional performance of ensemble learning models
Ensemble learning models such as AdaBoost, bagging, XGBoost (XGB), and GradientBoost also performed exceptionally well, with all achieving accuracy rates exceeding 96%. In particular, AdaBoost and XGB consistently demonstrated stable and high performance across multiple metrics, highlighting their suitability for early CRC screening tasks. In contrast, stochastic gradient descent (SGD) and support vector classifier (SVC) performed poorly, with accuracy scores of 67.37%±6.83% and 67.63%±5.80%, respectively. These results suggested that these models were unable to effectively manage the complexity of CRC datasets, likely due to limitations in feature extraction or data distribution.
Models like ExtraTrees and CatBoost showed moderate performance, with accuracy rates exceeding 96% but falling slightly behind RF and DF. The multi-layer perceptron classifier (MLPClassifier) exhibited relatively average performance, with an accuracy of 89.74%±2.55% and other metrics close to 90%.
The results in Table 2 highlighted the significant advantages of ensemble learning models, particularly deep forest (DF), RF, bagging, and AdaBoost, in early CRC screening. In comparison, simpler models such as SGD and SVC were less effective and struggled to meet the analytical demands of such complex medical datasets. This finding emphasized the importance of employing robust ensemble learning algorithms when dealing with high-dimensional and intricate features.
Superior performance of MSADBO-WV
Table 3 presented the classification performance of multiple ensemble learning models for early CRC screening, with evaluation metrics reported as mean ± SD. MSADBO-WV achieved the best performance across all metrics, with an accuracy of 98.16%±1.34%, precision of 98.20%±1.32%, and F1 score and recall of 98.16%±1.34%. These results demonstrated the ability of MSADBO-WV to effectively integrate various ensemble strategies, significantly improving classification performance. This made it particularly suitable for tasks requiring high accuracy in early CRC screening.
Table 3
| Model | Accuracy (%) | Precision (%) | F1 score (%) | Recall (%) |
|---|---|---|---|---|
| Hard voting | 97.63±1.53 | 97.70±1.47 | 97.63±1.53 | 97.63±1.53 |
| Soft voting | 97.37±1.86 | 97.51±1.72 | 97.37±1.86 | 97.37±1.86 |
| Simple averaging | 97.89±1.34 | 97.96±1.32 | 97.89±1.34 | 97.89±1.34 |
| Weighted averaging | 97.37±1.86 | 97.46±1.86 | 97.37±1.86 | 97.37±1.86 |
| MSADBO-WV | 98.16±1.34 | 98.20±1.32 | 98.16±1.34 | 98.16±1.34 |
Data are presented as mean ± SD. MSADBO-WV, improved sine algorithm-guided dung beetle optimizer with weighted voting; SD, standard deviation.
The performance of simple averaging closely followed that of MSADBO-WV, with an accuracy of 97.89%±1.34% and other metrics exceeding 97%. This finding confirmed that combining model outputs enhanced performance, although simple averaging remained inferior to the dynamic weight optimization employed by MSADBO-WV. Hard voting achieved comparable performance, with an accuracy of 97.63%±1.53%. This suggested that, while these methods improved performance by combining results from multiple models, the absence of dynamic weight adjustment limited their effectiveness. Soft voting and weighted averaging demonstrated the weakest performance, each with an accuracy of 97.37%±1.86%.
The results in Table 3 underscored the superior performance of MSADBO-WV compared to traditional ensemble learning strategies, particularly in terms of precision and recall. This demonstrated that dynamic optimization of model weights allowed a better balance between global exploration and local exploitation, further enhancing classification performance. This approach was shown to have significant application value in early CRC screening.
Selected features
To improve classification accuracy and identify the most effective features for classification, feature selection was performed on 45 features in the dataset. Feature importance was calculated using the RF method, and accuracy was evaluated using the MSADBO-WV approach. The results, as shown in Figure 3, indicated that when the number of selected features increased to 26, the model achieved its highest accuracy. The four evaluation metrics (accuracy, precision, F1 score, and recall) reached values of 98.42%±1.53%, 98.46%±1.51%, 98.42%±1.53%, and 98.42%±1.53%, respectively. Compared to the initial classification results, accuracy improved by 0.26 percentage points (from 98.16% to 98.42%). A bar chart comparing classification performance before and after feature selection is presented in Figure 4. Consequently, the top 26 features based on importance ranking were selected as the optimal feature subset.
Figure 5 illustrated the comparison of mean values and the significance levels (P values) of different features in the optimal feature subset between CRC patients and HC. The analysis covered multiple key features, with their mean values and intergroup differences provided. P values below 0.05 were observed for all features, with several features exhibiting extremely low P values, reaching the 10⁻¹⁰ level [e.g., platelet volume distribution width (PVDW), occult blood test (OBT), carcinoembryonic antigen (CEA)]. These findings demonstrated statistically significant differences between the CRC and HC groups (38-40). For example, the P value for CEA was 2.79×10⁻¹⁰, indicating its strong discriminative capability and importance as a diagnostic marker. Certain biomarkers, such as PVDW, CEA, high-density lipoprotein (HDL), and total cholesterol (TC), displayed significantly higher mean values in the CRC group, suggesting their potential association with the pathogenesis or pathological changes of CRC. Conversely, features like alkaline phosphatase (AP) and creatinine showed slightly higher mean values in the HC group, indicating their prominence in a healthy state.
Blood-related indicators, including PVDW, mean platelet volume (MPV), and HDL, appeared to be associated with tumor-related metabolic or inflammatory responses. Tumor markers such as CEA and carbohydrate antigen 19-9 (CA19-9) were significantly elevated in the CRC group (41,42), consistent with their established clinical roles in cancer screening. Metabolic and organ function indicators, such as AP and creatinine, reflected potential metabolic abnormalities or liver and kidney function changes in CRC patients.
The results shown in Figure 5 confirmed that most features in the optimal subset exhibited significant differences between the CRC and HC groups, further validating their potential value in early CRC screening. This significance analysis provided scientific evidence for subsequent diagnostic model optimization and biomarker selection, laying a foundation for advancing early cancer detection.
Discussion
This study was conducted to systematically evaluate the performance of various machine learning models and ensemble learning strategies within the framework of early CRC screening. The findings demonstrated the superiority of advanced ensemble techniques and feature selection methods in this context.
The results revealed that ensemble learning models significantly outperformed traditional single classifiers in early CRC screening. Among the models evaluated, DF, RF, and bagging consistently achieved high classification accuracy, precision, recall, and F1 scores. This performance underscored their robustness and effectiveness in managing complex medical datasets. Notably, the MSADBO-WV model achieved the highest accuracy (98.16%±1.34%) and exhibited substantial improvements across all key metrics. This outcome validated the model’s ability to dynamically adjust weights and achieve a balance between global exploration and local exploitation, making it particularly suitable for critical applications such as early CRC screening.
The superior performance of MSADBO-WV compared to other ensemble methods, such as weighted averaging and simple averaging, highlighted the importance of dynamic weight adjustment in ensemble learning. In contrast, traditional voting methods like hard voting and soft voting were found to be less precise, likely due to their reliance on static weighting mechanisms. These findings emphasized that advanced ensemble approaches capable of adapting to the complexities of the dataset could significantly enhance classification accuracy.
Feature selection was also shown to play a critical role in improving model performance. The importance of 45 features was ranked using RF, and classification accuracy was iteratively evaluated with MSADBO-WV. The results indicated that the optimal subset consisted of the top 26 features, which improved accuracy to 98.42%±1.53%, an increase of 0.26 percentage points over the initial results. This enhancement demonstrated the effectiveness of reducing feature redundancy and focusing on the most relevant predictors, not only to improve accuracy but also to reduce computational overhead.
The statistical significance of the selected features further reinforced their importance. As shown in Figure 5, a comparison of mean values between CRC and HC groups revealed that most features exhibited significant differences (P<0.05), with some features reaching P values as low as 10⁻¹⁰ [e.g., OBT, erythrocyte count (ETC), specific volume of red blood cells (SVORBC), percentage of lymphocytes (POL), CEA, PVDW, hemoglobin concentration, and neutrophil percentage (NP)]. This highlighted the critical role of features such as CEA (a widely recognized tumor marker), PVDW, OBT, and MPV in distinguishing CRC patients from HC. Additionally, metabolic and organ function indicators such as AP and creatinine provided valuable insights into the pathophysiological changes associated with CRC, suggesting their potential as complementary diagnostic markers.
These findings carry significant implications for early CRC screening. The exceptional performance of MSADBO-WV suggests that advanced ensemble learning strategies should be prioritized in the development of diagnostic tools for complex diseases such as CRC. Furthermore, identifying a robust subset of predictive features has the potential to enable cost-effective and scalable screening protocols centered on key biomarkers.
However, the study is not without limitations. The reliance on cross-validation within a single dataset may not fully capture the variability or generalizability of the models across different populations or datasets. Future research should focus on validating these findings using external datasets and exploring the clinical applicability of the proposed methods. Additionally, while MSADBO-WV demonstrated remarkable performance, its computational complexity may limit real-time implementation, necessitating further optimization for large-scale deployment. While we demonstrated high accuracy for early-stage CRC detection, longitudinal data on resection rates and survival outcomes are lacking. The cost-benefit ratio versus established screening modalities requires further evaluation.
Conclusions
In conclusion, this study highlights the pivotal role of advanced ensemble learning strategies and feature selection in enhancing early CRC screening. The MSADBO-WV method, combined with the optimal 26-feature subset, achieved the highest classification accuracy among the methods compared (98.42%±1.53%), demonstrating its potential as an effective tool for early cancer detection. These findings provide a foundation for future research aimed at integrating machine learning into clinical workflows to advance the early detection and management of CRC.
Acknowledgments
None.
Footnote
Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-316/dss
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-316/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2025-316/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the ethics board of Jinhua Municipal Central Hospital (No. 1.11-2.0). Owing to the retrospective nature of the study, the ethics review committee waived the requirement for informed consent.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Tang WZ, Tang Y, Liu TH. The influence of multivitamin intake on survival patterns in patients with colorectal cancer. Cancer 2024;130:4364-5.
- Martínez ME, Roesch S, Largaespada V, et al. A pragmatic randomized trial of mailed fecal immunochemical testing to increase colorectal cancer screening among low-income and minoritized populations. Cancer 2024;130:3170-9.
- Sterpetti AV, Gabriele R, Iannone I, et al. National organized screening programs for breast, colorectal, and cervical cancer reduce socioeconomic disparities, but it is not enough. Cancer 2024;130:2395-7.
- Ma YK, Qu L, Chen N, et al. Effect of multimodal opioid-sparing anesthesia on intestinal function and prognosis of elderly patients with hypertension after colorectal cancer surgery. BMC Surg 2024;24:341.
- Kumegawa K, Maruyama R, Yamamoto E, et al. A genomic screen for long noncoding RNA genes epigenetically silenced by aberrant DNA methylation in colorectal cancer. Sci Rep 2016;6:26699.
- Guittet L, Quipourt V, Aparicio T, et al. Should we screen for colorectal cancer in people aged 75 and over? A systematic review - collaborative work of the French geriatric oncology society (SOFOG) and the French federation of digestive oncology (FFCD). BMC Cancer 2023;23:17.
- Novotny SA, Rodrigo Amador VA, Seguí Orejuela J, et al. Prognostic Study of Colorectal Cancer: Differences between Screen-Detected and Symptom-Diagnosed Patients. Cancers (Basel) 2024;16:3363.
- He C, Huang Q, Zhong S, et al. Screening and identifying of biomarkers in early colorectal cancer and adenoma based on genome-wide methylation profiles. World J Surg Oncol 2023;21:312.
- Chang A, Prabhala S, Daneshkhah A, et al. Early screening of colorectal cancer using feature engineering with artificial intelligence-enhanced analysis of nanoscale chromatin modifications. Sci Rep 2024;14:7808.
- Wang Y, Wu ZL, Wang YG, et al. Early colorectal cancer screening-no time to lose. World J Gastroenterol 2024;30:2959-63.
- Noursina A, Safari F. Enhancing Colorectal Cancer Screening Specificity by Integrating Fecal MicroRNA Signature With Fecal Immunochemical Test and Fecal Occult Blood Test. Gastroenterology 2024;166:358.
- Syrjänen K, Eskelinen M, Meklin J, et al. Colorectal Cancer Screening by Fecal Immunochemical Tests (FIT): Considerations on Sampling and Markers (Hb and Hb/Hp Complex) of Fecal Occult Blood (FOB). Anticancer Res 2024;44:1513-23.
- Foo DCC, Ng L, Law WL. Identification and evaluation of a reliable, non-invasive diagnostic method using serum microRNA expression to diagnose high-risk colorectal cancer candidates. J Clin Oncol 2023;41:e15519.
- Somlo DR, Goiffon RJ, Richter J, et al. 1259 Impact of a program offering CT colonography as an alternative to colonoscopy for colorectal cancer screening in heart transplant evaluation. Gastroenterology 2023;164:S-262-3.
- Jolliffe S, McGivney F, Chin M, et al. EP.TU.313 Colorectal cancer follow up: Can CT Colonography replace Colonoscopy? Br J Surg 2021;108:znab311.041.
- Martín-López JE, Beltrán-Calvo C, Rodríguez-López R, et al. Comparison of the accuracy of CT colonography and colonoscopy in the diagnosis of colorectal cancer. Colorectal Dis 2014;16:O82-9.
- Forbes SP, Yay Donderici E, Zhang N, et al. Population health outcomes of blood-based screening for colorectal cancer in comparison to current screening modalities: insights from a discrete-event simulation model incorporating longitudinal adherence. J Med Econ 2024;27:991-1002.
- Rolfo C, Russo A. The Next Frontier for Colorectal Cancer Screening: Blood-Based Tests. Cancer Res 2024;84:3128-9.
- Zhu M, Han Y, Qiu Y, et al. Early colorectal cancer detection: a serum analysis platform combining SERS and machine learning. Anal Methods 2024;16:8179-87.
- Wang Z, Sun Z, Lv H, et al. Machine learning-based model for CD4+ conventional T cell genes to predict survival and immune responses in colorectal cancer. Sci Rep 2024;14:24426.
- Zhong Y, Chen X, Wu S, et al. Deciphering colorectal cancer radioresistance and immune microrenvironment: unraveling the role of EIF5A through single-cell RNA sequencing and machine learning. Front Immunol 2024;15:1466226.
- Woźniacki A, Książek W, Mrowczyk P. A Novel Approach for Predicting the Survival of Colorectal Cancer Patients Using Machine Learning Techniques and Advanced Parameter Optimization Methods. Cancers (Basel) 2024;16:3205.
- Zheng S, He H, Zheng J, et al. Machine learning-based screening and validation of liver metastasis-specific genes in colorectal cancer. Sci Rep 2024;14:17679.
- Wu CW, Huang TY, Liou YC, et al. Recognition of Glaucomatous Fundus Images Using Machine Learning Methods Based on Optic Nerve Head Topographic Features. J Glaucoma 2024;33:601-6.
- Tran ATT, Hassan K, Tung TT, et al. Graphene and metal-organic framework hybrids for high-performance sensors for lung cancer biomarker detection supported by machine learning augmentation. Nanoscale 2024;16:9084-95.
- Singhal R, Kashef R. A weighted stacking ensemble model with sampling for fake reviews detection. IEEE Trans Comput Soc Syst 2023;11:2578-94.
- Khan AA, Chaudhari O, Chandra R. A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Syst Appl 2024;244:122778.
- Liu B. Robust sequential online prediction with dynamic ensemble of multiple models: A review. Neurocomputing 2023;552:126553.
- Yao J, Zheng Y, Jiang H. An ensemble model for fake online review detection based on data resampling, feature pruning, and parameter optimization. IEEE Access 2021;9:16914-27.
- Ma L, Yao W, Dai X, et al. A New Evidence Weight Combination and Probability Allocation Method in Multi-Sensor Data Fusion. Sensors (Basel) 2023;23:722.
- Zhang K, Shen J, Han H, et al. Study of the Allocation of Regional Flood Drainage Rights in Watershed Based on Entropy Weight TOPSIS Model: A Case Study of the Jiangsu Section of the Huaihe River, China. Int J Environ Res Public Health 2020;17:5020.
- Dickinson SL, Golzarri-Arroyo L, Brown AW, et al. Change in study randomization allocation needs to be included in statistical analysis: comment on 'Randomized controlled trial of weight loss versus usual care on telomere length in women with breast cancer: the lifestyle, exercise, and nutrition (LEAN) study'. Breast Cancer Res Treat 2019;175:263-4.
- Zhu X, Ni C, Chen G, et al. Optimization of Tungsten Heavy Alloy Cutting Parameters Based on RSM and Reinforcement Dung Beetle Algorithm. Sensors (Basel) 2023;23:5616.
- Xiong M, Zheng S, Liu W, et al. A rate of penetration (ROP) prediction method based on improved dung beetle optimization algorithm and BiLSTM-SA. Sci Rep 2024;14:25856.
- Mai C, Zhang L, Chao X, et al. A novel MPPT technology based on dung beetle optimization algorithm for PV systems under complex partial shade conditions. Sci Rep 2024;14:6471.
- Sun X, Chai J. Random forest feature selection for partial label learning. Neurocomputing 2023;561:126870.
- Li G, Wang C, Zhang D, et al. An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring. Sensors (Basel) 2021;21:5654.
- Zhu X, Cao Y, Lu P, et al. Evaluation of platelet indices as diagnostic biomarkers for colorectal cancer. Sci Rep 2018;8:11814.
- Sheng S, Bai X, Wang Y, et al. Effect of elevated CEA levels on the outcome of colorectal cancer patients with different histopathologic types: A SEER population-based study. Biomol Biomed 2025;25:1396-407.
- Sun Q, Long L. Diagnostic performances of methylated septin9 gene, CEA, CA19-9 and platelet-to-lymphocyte ratio in colorectal cancer. BMC Cancer 2024;24:906.
- Sato R, Oikawa M, Kakita T, et al. Prognostic value of carcinoembryonic antigen (CEA) and CA 19-9 levels in patients with obstructive colorectal cancer treated with a self-expandable metallic stent and curative surgery. Surg Today 2025;55:618-26.
- Gatihi M, Amayo A, Kisia N. Prevalence of serum CEA and CA 19-9 elevation in patients with colorectal cancers at The Nairobi Hospital, Kenya. Clin Chim Acta 2024;558:118829.




