Machine learning-based gastric cancer risk prediction in an asymptomatic screening population: a retrospective cohort study
Highlight box
Key findings
• Survival-based machine learning (ML) models using large-scale health check-up data from an asymptomatic screening population achieved moderate discrimination for gastric cancer risk prediction, with the XGBoost Survival model demonstrating a concordance index of 0.78.
• Helicobacter pylori infection, chronic atrophic gastritis, and intestinal metaplasia were identified as the most influential predictors, and model interpretation using SHapley Additive exPlanations (SHAP) showed patterns consistent with established clinical knowledge.
What is known and what is new?
• Gastric cancer risk prediction models have previously been developed in clinical cohorts, often using conventional statistical approaches or classification-based ML methods that do not account for time-to-event data.
• This study develops survival-based ML models using a large asymptomatic health screening cohort and provides explainable risk prediction through SHAP analysis, demonstrating the feasibility of individualized risk stratification in a general screening population.
What is the implication, and what should change now?
• Survival-based ML approaches may support future development of risk-adapted gastric cancer surveillance strategies in screening populations.
• Further studies with external validation and calibration assessment are required before clinical implementation and integration into personalized endoscopic screening policies.
Introduction
Gastric cancer is the fifth most common cause of cancer deaths worldwide (1). Endoscopic screening has been shown to reduce gastric cancer mortality (2-4). However, frequent endoscopic cancer screening can be costly, inconvenient, and even harmful for some patients (5,6). Major potential harms of endoscopic screening include infections, adverse effects, false-positive results, and overdiagnosis (7). Guidelines that establish an appropriate screening interval by weighing the benefits and harms of endoscopic screening are therefore necessary. Because most countries lack clear, evidence-based guidelines for endoscopic screening intervals, objective risk stratification and assessment based on endoscopic findings would greatly help clinicians develop personalized follow-up strategies. In addition, a deeper understanding of the relationship between risk factors and cancer risk for each patient is expected to help clinicians and patients develop more personalized preventive measures.
A significant body of epidemiological studies and clinical trials has identified various risk factors for gastric cancer, ranging from gastrointestinal pathology, such as Helicobacter pylori (H. pylori) infection, chronic atrophic gastritis (CAG), and intestinal metaplasia (IM) (8-14), to patients’ lifestyle (smoking, drinking, high salt intake) (15-17), family history (18,19), and pre-existing health conditions such as obesity, diabetes, and metabolic syndrome [e.g., hemoglobin A1c (HbA1c), glucose, etc.] (12,20-22). However, these studies do not provide clear, quantitative guidance on how to weigh individual risk factors when evaluating the overall, time-dependent risk of a given patient.
Recently, several machine learning (ML) models have been developed for gastric cancer risk prediction (23-25). While promising, these models suffer from significant methodological and practical limitations that restrict their clinical utility. For instance, most prior studies employ classification-based probabilistic outcomes, which only predict cancer as a binary event within a fixed period, thereby ignoring the continuous nature of disease progression over time and failing to appropriately handle data censoring (26). Furthermore, many models are trained on small or high-risk cohorts (e.g., patients with chronic gastritis), which introduces significant selection bias and limits their generalizability to the broader asymptomatic screening population (25). Finally, their general lack of transparency and explainability hinders clinician trust and effective clinical adoption. We acknowledge recent progress in deep learning for health prediction but maintain that interpretability is paramount for individual risk stratification in this context.
In this paper, we address these gaps by developing survival-based ML models that predict a patient’s time-dependent risk of developing gastric cancer using comprehensive medical check-up data from a general population cohort. We compare the performance of several survival prediction models, as well as different feature sets, using Harrell’s concordance index (c-index). Crucially, we use SHapley Additive exPlanations (SHAP) analysis to provide detailed, individualized explanations for predictions, thereby clarifying the clinical impact of specific features and supporting the transparency needed for personalized clinical decision-making and risk counseling. We present this article in accordance with the STROBE reporting checklist (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0245/rc).
Methods
Data
This study was conducted as a retrospective, observational, single-center study. Our data was extracted from the Electronic Health Record database of Seoul National University Hospital Gangnam Center (data access date 17/03/2021–23/03/2021), which is one of the largest medical check-up centers in South Korea. Patients in our study were selected from 129,223 predominantly healthy individuals who visited the center for a regular annual checkup, which included endoscopic procedures as a standard component, between 2007 and 2020. For each patient, the prediction was made based on the medical check-up result of their first visit, and we considered their subsequent visits as follow-ups.
The collected data are rich and reliable, combining objective structured data with clinical findings. The center maintains rigorous quality control, including continuous education and intervention with feedback, to ensure a high level of agreement in endoscopic diagnoses. During each medical check-up, patients undergo various clinical tests, such as a physical examination, blood tests, and gastroscopy, on the day of the visit. On the same day, patients are also asked to complete a self-administered structured questionnaire about smoking history, alcohol intake, eating habits, family history, and past medical history. Smoking history was categorized as non-smoker, past smoker, or current smoker. Drinking was categorized as non-drinker, past drinker, or current drinker. Salt intake was evaluated on a three-level scale (not salty, usual, salty) based on the responses to two questions (whether the diet is generally salty and whether the patient often eats salt or soy sauce). Diabetes mellitus was recorded as present when the fasting blood sugar level was ≥126 mg/dL or the patient was taking diabetes medication. Blood tests measured various markers, including glucose, HbA1c, triglycerides, and gamma-glutamyltransferase (γGT).
CAG was diagnosed if a grossly pale-yellowish color with transparent blood vessels was observed during endoscopy. IM was diagnosed when whitish plaque-like elevations or mucosal nodularity were seen in the antrum and/or body. Additional biopsies were performed if IM was unclear from endoscopy or if a neoplastic lesion was suspected. Histologic IM was assessed based on the updated Sydney system.
H. pylori infection was assessed based on serum H. pylori immunoglobulin G (IgG) as well as H. pylori polymerase chain reaction, rapid urease test, and urea breath test as needed. Infection was considered present if one or more of these tests were positive or if bacteria were confirmed in biopsy tissue by Giemsa staining.
The outcome of this study is the follow-up period of each patient, measured in days from the first visit date to either the last visit date during the study period (i.e., 2007–2020) or the date of the first gastric cancer diagnosis, whichever comes first. For survival analysis, the outcome is labeled as a pair: (time to event, event status). Time to event is this follow-up period in days, and the event status is a binary indicator (1 for gastric cancer diagnosis, 0 for censored/no diagnosis). The incidence of gastric cancer was determined from the result of a stomach biopsy carried out during the endoscopy procedure when gastric cancer was suspected. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study protocol was approved by the Ethics Committee of Seoul National University Hospital (Institutional Review Board No. H-2011-057-1171) and individual consent for this retrospective analysis was waived. We extracted only anonymized data in which patient identification numbers had been removed to prevent individual identification.
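The (time to event, event status) labeling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual extraction code: the function name `survival_label` and its arguments are hypothetical, and the dates are invented for the example.

```python
from datetime import date

def survival_label(first_visit, last_visit, diagnosis_date=None):
    """Return (time_to_event_days, event_status) for one patient.

    Hypothetical helper: event_status is 1 when gastric cancer was diagnosed
    on or before the last visit, and 0 when the patient is censored.
    """
    if diagnosis_date is not None and diagnosis_date <= last_visit:
        end, event = diagnosis_date, 1   # follow-up ends at diagnosis
    else:
        end, event = last_visit, 0       # follow-up ends at the last visit
    return (end - first_visit).days, event

# Invented example dates (not from the study data).
censored = survival_label(date(2010, 3, 1), date(2015, 3, 1))
diagnosed = survival_label(date(2010, 3, 1), date(2015, 3, 1), date(2013, 3, 1))
```

Both patients contribute to model fitting: the censored patient is known to be cancer-free for the full follow-up, while the diagnosed patient contributes an event at the diagnosis date.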
Cohort selection
The cohort selection process consists of several steps (Figure 1). First, from all patients who visited Seoul National University Hospital Gangnam Center for a regular annual checkup between 2007 and 2020 (n=129,223), we filtered out those who had fewer than two hospital visits. Then, we excluded patients with a history of gastric cancer diagnosis before their first visit and those with the following conditions: total gastrectomy, squamous cell carcinoma, neuroendocrine tumor, gastrointestinal stromal tumor, and lymphoma/mucosa-associated lymphoid tissue lymphoma. Lastly, we excluded patients younger than 18 or older than 90 years at their first visit.
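The exclusion steps above amount to a sequence of simple filters. The sketch below assumes a hypothetical record layout (the keys `n_visits`, `prior_gastric_cancer`, `conditions`, and `age_at_first_visit` are illustrative and do not come from the source database), and the example records are invented.

```python
# Conditions listed in the cohort selection step; the string labels are illustrative.
EXCLUDED_CONDITIONS = {
    "total gastrectomy",
    "squamous cell carcinoma",
    "neuroendocrine tumor",
    "gastrointestinal stromal tumor",
    "lymphoma",
    "mucosa-associated lymphoid tissue lymphoma",
}

def is_eligible(patient):
    """Apply the three exclusion steps to one patient record (a dict)."""
    if patient["n_visits"] < 2:                       # step 1: at least two visits
        return False
    if patient["prior_gastric_cancer"]:               # step 2a: no prior gastric cancer
        return False
    if EXCLUDED_CONDITIONS & set(patient["conditions"]):  # step 2b: excluded conditions
        return False
    return 18 <= patient["age_at_first_visit"] <= 90  # step 3: age range at first visit

# Invented records for illustration only.
patients = [
    {"id": "p1", "n_visits": 3, "prior_gastric_cancer": False, "conditions": [], "age_at_first_visit": 46},
    {"id": "p2", "n_visits": 1, "prior_gastric_cancer": False, "conditions": [], "age_at_first_visit": 52},
    {"id": "p3", "n_visits": 4, "prior_gastric_cancer": True, "conditions": [], "age_at_first_visit": 61},
    {"id": "p4", "n_visits": 2, "prior_gastric_cancer": False, "conditions": ["lymphoma"], "age_at_first_visit": 60},
    {"id": "p5", "n_visits": 5, "prior_gastric_cancer": False, "conditions": [], "age_at_first_visit": 17},
]
cohort_ids = [p["id"] for p in patients if is_eligible(p)]
```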
The final cohort consists of 61,206 patients: 248 patients who were diagnosed with gastric cancer (GC group) during the study period [2007–2020] after their first visit, and the remaining 60,958 patients (censored group), who remained without a gastric cancer diagnosis until their last observed visit. Because at least one follow-up visit was required, the cohort includes more patients with first visits in earlier years; the share of first visits declines gradually over time, from 23% in 2007 and 14% in 2008. More than 90% of patients had their first visit before 2017. The average follow-up periods of the GC and censored groups are 5.1 and 5.7 years, respectively.
Model development
Feature generation and selection
Various types of medical data, such as clinical notes from endoscopy and stomach biopsy, were combined with other structured clinical data, such as patient questionnaires and lab measurements, and were engineered to generate features.
Feature selection was guided by two approaches: one based on algorithms and one based on clinical expertise. For the algorithm-based approach, we first dropped any feature with more than 90% missing values. Next, we ran a univariate Cox analysis to calculate the hazard ratio of each feature and filtered out the features whose P value exceeded a threshold (e.g., P>0.01). We then checked the pairwise correlation among the selected features; when the correlation of a pair was greater than 0.9, the feature with the smaller absolute hazard ratio was dropped. Separately, clinical experts independently evaluated the clinical relevance of features and identified 13 features that have been shown to be important factors in prior clinical research. The union of both sets defined the final dataset. This feature selection process was designed as a clinically guided dimensionality reduction step informed by prior literature and expert knowledge, rather than as a data-driven optimization aimed at maximizing predictive performance.
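The algorithm-based filter and the union with the expert-selected features can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the per-feature statistics (missing fraction, univariate Cox hazard ratio, P value) are taken as precomputed inputs, all numbers are invented, the function name `select_features` is hypothetical, and the correlation check uses the absolute correlation, which is one possible reading of the text.

```python
def select_features(stats, correlations, expert_features,
                    max_missing=0.9, p_threshold=0.01, max_corr=0.9):
    """Hypothetical sketch of the two-track feature selection."""
    # Steps 1-2: drop features with too many missing values or a large P value.
    kept = {name for name, s in stats.items()
            if s["missing"] <= max_missing and s["p_value"] <= p_threshold}
    # Step 3: for each highly correlated pair, drop the smaller hazard ratio.
    for (a, b), r in correlations.items():
        if a in kept and b in kept and abs(r) > max_corr:
            weaker = a if abs(stats[a]["hazard_ratio"]) < abs(stats[b]["hazard_ratio"]) else b
            kept.discard(weaker)
    # Step 4: union with the features chosen by clinical experts.
    return kept | set(expert_features)

# Invented statistics for illustration only.
stats = {
    "age":         {"missing": 0.00, "hazard_ratio": 1.06, "p_value": 0.0001},
    "glucose":     {"missing": 0.05, "hazard_ratio": 1.01, "p_value": 0.001},
    "hba1c":       {"missing": 0.10, "hazard_ratio": 1.40, "p_value": 0.001},
    "rare_marker": {"missing": 0.95, "hazard_ratio": 2.00, "p_value": 0.001},
    "noise":       {"missing": 0.00, "hazard_ratio": 1.00, "p_value": 0.5},
}
correlations = {("glucose", "hba1c"): 0.93}
selected = select_features(stats, correlations, ["family_history", "age"])
```

In this toy input, `rare_marker` fails the missingness filter, `noise` fails the P value filter, `glucose` is pruned as the weaker member of a correlated pair, and `family_history` enters through the expert list alone.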
Model training, testing, and validation
We trained survival-based models, including both statistical and ML algorithms. Survival analysis is a family of statistical methods developed to predict the time to an event, such as time to death or time to the first diagnosis of a disease (more details in Appendix 1). Its key advantage is the ability to handle data censoring, a form of missing data problem in which the event of interest has not occurred or has not been observed during the given timeframe for various reasons, such as patients missing their yearly checkup, dropping out, or not yet having developed the event (e.g., gastric cancer).
Traditionally, the Cox proportional hazards (CPH) regression model has been the standard method for analyzing censored data. The CPH model is a semiparametric algorithm that computes the hazard (i.e., risk) of an event as the product of a population baseline hazard and the exponential of a linear function of a patient’s features. The accelerated failure time (AFT) model is another statistical model, but it is fully parametric. Yet these statistical models come with inherent assumptions about functional forms and often fall short in capturing more complex, non-linear relationships.
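For reference, the two statistical models can be written in their standard textbook forms. The notation here is an assumption introduced for illustration and does not appear in the original text: $\mathbf{x}$ is a patient's feature vector, $\boldsymbol{\beta}$ the coefficient vector, $h_0(t)$ the baseline hazard, $T$ the time to event, $\sigma$ a scale parameter, and $\varepsilon$ an error term with a specified distribution.

```latex
% Cox proportional hazards: baseline hazard scaled by exp of a linear predictor
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)

% Accelerated failure time: linear model for the log of the event time
\log T = \boldsymbol{\beta}^{\top}\mathbf{x} + \sigma\,\varepsilon
```

In the CPH form, $h_0(t)$ is left unspecified, which is what makes the model semiparametric. The AFT form is fully parametric because the distribution of $\varepsilon$ must be chosen.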
As high-dimensional, large-size clinical datasets have become more accessible, researchers started adopting non-parametric ML algorithms for developing disease prediction models to take advantage of their flexibility and capture non-linear relationships. Nevertheless, the introduction of ML models has raised concern, largely because most studies use classification ML algorithms which are not well-suited for censored datasets. Classification ML algorithms predict disease occurrence as a binary outcome within a specific time period of either 5 or 10 years. These algorithms require a follow-up data duration that matches the chosen prediction window for each patient in the study cohort, unless the event is observed, to ensure non-occurrence of the event. However, this requirement introduces substantial concerns regarding prediction biases, particularly in the longer prediction windows, because a large number of patients may be excluded due to dropouts, a situation frequently encountered in observational studies (more discussion in Appendix 1). With this concern, new ML algorithms supporting survival analysis have been developed to explicitly handle censored data. These survival ML algorithms inherit the merit of traditional survival analysis that models the hazard function. These new algorithms offer more flexible alternatives for analyzing high-dimensional data while ensuring appropriate attention to data censorship. For survival-based ML models, we used the following algorithms: Random Survival Forest (RSF), Gradient Boosted Survival (GBS), and Survival Support Vector Machine (SSVM) from the scikit-survival library as well as Extreme Gradient Boosting Survival (XGBS) from the XGBoost library (27,28). In addition to these survival ML models, we also developed CPH and AFT models for benchmarking. 
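The exclusion problem with fixed-window classification labels can be seen in a toy example. The sketch below is illustrative only: the function name `fixed_window_label` is hypothetical and the five (time in days, event status) pairs are invented.

```python
def fixed_window_label(time_days, event, window_days):
    """Binary label for a classifier with a fixed prediction window.

    Returns 1 if the event occurred within the window, 0 if the patient was
    observed event-free for the whole window, and None if the patient was
    censored before the window ended (the label is unknown).
    """
    if event == 1 and time_days <= window_days:
        return 1
    if time_days >= window_days:
        return 0
    return None

# Invented (time_to_event_days, event_status) pairs.
cohort = [(400, 1), (2500, 0), (900, 0), (1300, 0), (2100, 1)]
window = 5 * 365  # a 5-year prediction window

labels = [fixed_window_label(t, e, window) for t, e in cohort]
usable = [label for label in labels if label is not None]
```

Here two of the five patients were censored before 5 years and would have to be dropped by a classification model, whereas a survival model uses all five (time, event) pairs as they are.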
The models from the scikit-survival library (RSF, GBS, SSVM) and the statistical models (CPH, AFT) require a complete dataset with no missing values; thus, we applied mean imputation for numeric features and zero imputation for categorical features for these models. No imputation was performed for the XGBS model, as it can handle missing values by design. We opted for this simple, transparent approach to maintain a clear focus on the raw data’s predictive value. Accordingly, comparisons across models should be interpreted as method-dependent and reflective of pragmatic model behavior under real-world health check-up data conditions, rather than as results of a strictly controlled benchmarking experiment. Performance was evaluated with the c-index using a 5-repeat, 3-fold cross-validation strategy. We intentionally selected a small number of folds (k=3) to maximize the size of the held-out fold in each iteration, which is crucial given the small absolute number of gastric cancer events in our dataset. This 3-fold partition also helps ensure a more even distribution of cancer patients across splits. The entire 3-fold cross-validation procedure for model training and testing was then repeated 5 times to address the variability and stochastic nature of ML algorithms, particularly with a smaller k, improving the robustness and stability of our performance estimates. The final performance is the average of the 15 c-index values obtained from the 5 repetitions of 3-fold cross-validation (29,30). No resampling techniques or class-weighted loss functions were applied to explicitly address outcome imbalance; therefore, model evaluation focused on discrimination rather than sensitivity or absolute risk estimation.
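The evaluation scheme can be sketched in plain Python. This is a simplified illustration, not the authors' code: `harrell_c_index` ignores tied event times, `repeated_stratified_folds` stratifies by event status (an assumption, since the text only says that events are spread evenly across splits), and both function names are hypothetical.

```python
import random

def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the higher risk has the earlier event."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable  # raises ZeroDivisionError if no pair is comparable

def repeated_stratified_folds(events, n_splits=3, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for n_repeats x n_splits folds."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for status in (1, 0):  # deal events and censored cases out separately
            idx = [i for i, e in enumerate(events) if e == status]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for m, f in enumerate(folds) if m != k for i in f)
            yield train, test

# Toy data: 6 events among 36 patients.
toy_events = [1] * 6 + [0] * 30
splits = list(repeated_stratified_folds(toy_events))
perfect = harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1])
```

In a full pipeline, a model would be fitted on each training split and scored on the matching test split, and the 15 resulting c-index values would be averaged.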
Explainability
To better understand how the risk scores are determined, we conducted SHAP analysis (31,32). Rooted in cooperative game theory, SHAP assigns a contribution value to each feature based on the model’s output (in our case, the log partial hazard from the survival model). It computes the contribution of each feature to the final prediction by considering all possible combinations of features (coalitions). Specifically, SHAP values quantify the extent to which each feature pushes the prediction away from the baseline expectation. By applying SHAP to the output of our survival model, we can transparently explain why a specific patient’s predicted hazard is higher or lower than the cohort average. This methodology provides both a global understanding of model behavior (via summary and dependence plots) and local, individualized explanations (via waterfall plots), which are essential for translating the model’s output into actionable clinical insights (see Appendix 1 for more details).
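The coalition-based idea can be made concrete with an exact Shapley computation on a toy model. The sketch rests on several assumptions: `toy_log_hazard` and its coefficients are invented and are not the fitted XGBS model, "absent" features are replaced by a single baseline patient's values (a simplification of how SHAP handles missing features), and the brute-force enumeration is only practical for a handful of features, unlike the tree-specific algorithms used in practice.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    names = list(x)
    n = len(names)
    values = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without = {m: (x[m] if m in present else baseline[m]) for m in names}
                with_f = dict(without)
                with_f[name] = x[name]  # add the feature of interest to the coalition
                total += weight * (predict(with_f) - predict(without))
        values[name] = total
    return values

def toy_log_hazard(f):
    """Invented log partial hazard with a CAG x H. pylori interaction term."""
    return (0.04 * (f["age"] - 45) + 0.9 * f["cag"] + 0.5 * f["h_pylori"]
            + 0.3 * f["cag"] * f["h_pylori"])

patient = {"age": 55, "cag": 1, "h_pylori": 1}
baseline = {"age": 45, "cag": 0, "h_pylori": 0}
phi = shapley_values(toy_log_hazard, patient, baseline)
```

The contributions sum to the difference between the patient's output and the baseline output, and the interaction term is split evenly between CAG and H. pylori. That additive decomposition is what a waterfall plot displays for a single patient.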
Results
Cohort characteristics
The cohort was selected based on the inclusion and exclusion criteria; Table 1 shows key summary statistics that characterize the patients included in the cohort. The average age is 46.0 years in the censored group and 53.2 years in the GC group. In the censored group, 46.3% of patients are female, whereas only 23.4% of patients in the GC group are female.
Table 1
| Characteristics | Censored group (n=60,958) | GC group (n=248) | All (n=61,206) | P value† |
|---|---|---|---|---|
| Age (years) | 46.0±10.7 | 53.2±8.9 | 46.0±10.7 | <0.001 |
| Gender | | | | <0.001 |
| Male | 32,712 (53.7) | 190 (76.6) | 32,902 (53.8) | |
| Female | 28,246 (46.3) | 58 (23.4) | 28,304 (46.2) | |
| BMI (kg/m²) | 23.2±3.2 | 24.3±2.8 | 23.2±3.2 | <0.001 |
| Waist circumference (cm) | 83.5±8.8 | 87.0±7.5 | 83.5±8.8 | <0.001 |
| γGT (IU/L) | 33.4±36.5 | 42.4±36.0 | 33.5±36.5 | <0.001 |
| HbA1c (%) | 5.7±0.6 | 5.9±0.8 | 5.7±0.6 | <0.001 |
| Glucose (mg/dL) | 96.2±16.9 | 102.3±23.1 | 96.2±16.9 | <0.001 |
| Triglycerides (mg/dL) | 110.2±76.0 | 130.6±67.9 | 110.3±75.9 | <0.001 |
| H. pylori infection | 24,534 (40.2) | 165 (66.5) | 24,699 (40.4) | <0.001 |
| Endoscopic findings | | | | |
| Chronic atrophic gastritis | 17,850 (29.3) | 167 (67.3) | 18,017 (29.4) | <0.001 |
| Intestinal metaplasia | 6,698 (11.0) | 93 (37.5) | 6,791 (11.1) | <0.001 |
| Comorbidities | | | | |
| Diabetes | 3,533 (5.8) | 20 (8.1) | 3,553 (5.8) | 0.13 |
| Hypertension | 12,940 (21.2) | 81 (32.7) | 13,021 (21.3) | <0.001 |
| Hyperlipidemia | 10,641 (17.5) | 44 (17.7) | 10,685 (17.5) | 0.91 |
| Metabolic syndrome | 8,412 (13.8) | 51 (20.6) | 8,463 (13.8) | 0.002 |
| Family history | 6,083 (10.0) | 32 (12.9) | 6,115 (10.0) | 0.13 |
| Smoking | | | | <0.001 |
| Never | 27,723 (45.5) | 56 (22.6) | 27,779 (45.4) | |
| Past | 12,501 (20.5) | 67 (27.0) | 12,568 (20.5) | |
| Current | 9,572 (15.7) | 52 (21.0) | 9,624 (15.7) | |
| Drinking | | | | 0.01 |
| Never | 15,058 (24.7) | 37 (14.9) | 15,095 (24.7) | |
| Past | 1,671 (2.7) | 7 (2.8) | 1,678 (2.7) | |
| Current | 32,980 (54.1) | 129 (52.0) | 33,109 (54.1) | |
| Salt intake | | | | 0.009 |
| Low | 7,116 (11.7) | 15 (6.0) | 7,131 (11.7) | |
| Medium | 19,278 (31.6) | 53 (21.4) | 19,331 (31.6) | |
| High | 6,544 (10.7) | 30 (12.1) | 6,574 (10.7) | |
| Follow-up period (years) | 5.70±3.78 | 5.12±3.20 | 5.70±3.77 | 0.01 |
The numbers represent mean ± standard deviation for continuous variables. For a categorical variable, the number represents the count (%) of the relevant category. †, t-test or Chi-squared test. BMI, body mass index; GC, gastric cancer; H. pylori, Helicobacter pylori; HbA1c, hemoglobin A1c; γGT, gamma-glutamyltransferase.
Gastric cancer model performance & explainability
Model performance
We constructed seven models by incrementally adding features that are clinically known to be related to gastric cancer (Table 2). Model C uses the feature set from Taninaga et al. (23) except post-gastrectomy due to high missingness in our dataset.
Table 2
| Name | Description | Feature list | No. of features |
|---|---|---|---|
| Model A | Basic demographic model + H. pylori infection | Age, gender, BMI, H. pylori infection status | 4 |
| Model B | Evaluates the effect of chronic atrophic gastritis | Model A + chronic atrophic gastritis | 5 |
| Model C | Key features from Taninaga et al. [2019] (23)† | Model A + chronic atrophic gastritis, gastric ulcer, duodenal ulcer, gastroesophageal reflux disease + white blood cell counts, neutrophil ratio, lymphocyte ratio, eosinophil ratio, monocyte ratio, basophil ratio, platelet count, hemoglobin, mean corpuscular volume, HbA1c | 18 |
| Model D | Includes demographic, lifestyle, and clinically relevant lab tests | Model A + waist circumference, family history, smoking, drinking, salt intake + HbA1c, γGT, diabetes, glucose, TG | 14 |
| Model E | Evaluates the effect of chronic atrophic gastritis | Model D + chronic atrophic gastritis | 15 |
| Model F | Evaluates the effect of intestinal metaplasia | Model D + intestinal metaplasia | 15 |
| Model G | Full feature model | Model D + chronic atrophic gastritis, intestinal metaplasia | 16 |
†, Taninaga et al. [2019]’s model includes post-gastrectomy, but it was not included due to the rarity of these cases and the potential for high missingness. BMI, body mass index; H. pylori, Helicobacter pylori; HbA1c, hemoglobin A1c; TG, triglycerides; γGT, gamma-glutamyltransferase.
Figure 2 summarizes the comparative performance of the gastric cancer risk prediction models assessed using Harrell’s c-index. Model performance across different feature sets demonstrates a gradual improvement in discrimination as additional clinically relevant features are incorporated. In particular, the model including both CAG and IM (Model G in Table 2) achieved the highest average c-index of approximately 0.78 across cross-validation folds, supporting the clinical value of integrating comprehensive endoscopic findings. Based on these results, all subsequent analyses were conducted using the feature set defined in Model G.
Figure 2 also compares model performance across different survival modeling algorithms. ML-based survival models, including Extreme Gradient Boosting Survival (XGBS), RSF, and SSVM, demonstrated comparable or slightly higher discrimination than conventional statistical survival models, such as CPH and AFT models. However, the observed differences largely overlapped within the 95% confidence intervals, indicating similar overall performance across modeling approaches.
Model explanation
We computed the SHAP values of the XGBS model (Model G in Figure 2) for a given random partition of the data (more details in Appendix 1). Figure 3 shows the summary plot of SHAP values, a visualization of the overall feature importance of the model. The features are sorted by their importance to the model prediction. The summary plot shows that the model prioritizes well-established pathological risk factors. Two well-known risk factors, the presence of CAG and age, contribute most to the predicted risk of gastric cancer onset, followed by H. pylori infection status and IM.
Figure 4 presents SHAP dependence plots with feature distributions for key variables selected in the gastric cancer risk prediction model. Each dependence plot displays the relationship between an individual feature and the model’s predictions, with blue dots representing individual observations and light gray histograms indicating the distribution of feature values in the dataset. These plots allow the identification of both the direction and the magnitude of feature effects, while also revealing complex non-linear risk patterns captured by the XGBS model. For example, the contribution of age to gastric cancer risk shows a pronounced non-linear pattern: risk remains relatively low in individuals in their 20s and 30s, but increases rapidly from the late 30s through the 50s, closely mirroring the age-specific incidence pattern observed in clinical practice.
Figure 5 further illustrates feature interaction effects on model predictions using SHAP interaction plots. By visualizing the SHAP value of one feature across age and encoding the interaction strength of a second feature using color gradients, these plots reveal conditional risk profiles and how the model accounts for feature interdependence. For instance, H. pylori infection is associated with higher gastric cancer risk at younger ages, particularly before the 50s, whereas this association appears attenuated or reversed in older individuals. This finding suggests that the model dynamically modulates the risk contribution of H. pylori according to age and cumulative pathological burden.
Finally, we applied SHAP analysis to individual patients to explain individual predictions. We randomly selected one patient from each of the two risk groups (lower and higher), defined by predicted risk scores, and generated their waterfall plots (Figure 6). This individual-level explanation illustrates the potential clinical utility of the model. The lower-risk example shows that the patient’s risk is lower than that of the average person in our cohort because she does not have any risk factors or chronic conditions. The higher-risk example is a male patient in his 50s, a group generally considered to be at higher risk. He also has CAG and H. pylori infection, which push his overall risk up further. The waterfall plot shows that the positive contributions of CAG and H. pylori are partially counterbalanced by the neutral or negative risk contributions of healthy metrics (e.g., normal blood tests) represented by other features. In addition, the model treats CAG as a more significant risk factor than H. pylori infection for this patient, based on the interactions with the rest of the features.
Discussion
We developed survival-based ML models for gastric cancer risk stratification by leveraging, to the best of our knowledge, one of the largest cohorts drawn from a predominantly healthy, general screening population. Among the evaluated models, the XGBS model demonstrated discrimination performance comparable to that of conventional statistical survival models and other survival-based ML approaches. We applied the SHAP analysis to the XGBS model to explain the complex relationships between gastric cancer and risk factors, their interactions, as well as the contribution of each factor to individual prediction.
Our results provide insights that are consistent with the accumulated knowledge from epidemiological and clinical studies and with recent efforts in gastric cancer prediction model development. From a clinical standpoint, our models identified H. pylori infection (9,12,33), CAG (34-36), and IM (6,37-39) as the most significant risk factors, which is well aligned with clinical intuition. From a methodological standpoint, the survival-based ML models, including XGBS, demonstrated discrimination performance comparable to conventional statistical survival models. Consistent with observations in prior studies, tree-based ensemble methods showed robust performance on complex tabular clinical data when compared with deep learning-based approaches (25,40).
Our results also underscore the importance of endoscopic findings in gastric cancer prediction. The comparison between Models D and E (as well as Models A and B) in Figure 2 demonstrates that incorporating endoscopic findings, particularly CAG or IM, significantly improved prediction performance. CAG and IM can also be assessed through serology or histology. Most studies have used serum pepsinogen to diagnose CAG or IM because it is a non-invasive test. However, serology results can be affected by several other factors, such as H. pylori infection status or the intake of proton pump inhibitors (41-43). Histological diagnosis is another option that could improve model performance, as observed by Arai et al. (25). However, it requires multiple biopsies of the gastric mucosa, which makes it difficult to apply in regular clinical practice because of the risk of bleeding and the time required (41). Endoscopy has the strength of enabling clinicians to directly observe the gastric mucosa. In addition, the diagnosis can be confirmed through biopsy if a neoplastic lesion is suspected.
A crucial point concerns the context of our model’s performance. Our models do not achieve the best performance among the models published so far. The c-indices of our models are slightly lower (average 0.74–0.78) than those reported by Arai et al. [0.76–0.79 for the models without Operative Link on Gastritis-Intestinal Metaplasia Assessment (OLGIM)/Operative Link on Gastritis Assessment (OLGA) stage and 0.79–0.84 for the models with OLGIM/OLGA] (25). We emphasize that this difference is primarily a result of the fundamental distinction between the study cohorts. The cohort used by Arai et al. consisted of patients with chronic gastritis, who are already considered to be at higher risk. The average age of their cohort (63 years) is also much higher than that of ours (46 years), and the prevalence of gastric cancer in their cohort (8.55%) was much higher than in ours (0.4%). Predicting a rare event in a low-prevalence, general, asymptomatic population is a significantly more challenging task than predicting it in a high-risk cohort. Thus, their higher c-index could be attributed to the higher risk profile of their cohort. On the other hand, most patients in our cohort visited this center for regular health checkups while asymptomatic, providing several unique insights that were difficult to obtain from previous studies.
The most notable observations are those related to age and its interactions with other features. Most previous studies reported and analyzed the age of onset of gastric cancer, which is around 60 years with peaks between the ages of 85 and 89 years (44,45), so it was unclear how age affects risk at younger ages, far earlier than potential onset. Because our model assessed risk based on the patient’s age at the first visit, we could observe the effect of age over a much longer period, starting from a younger age, than other studies. Our results show that risk increases with age and that the relationship is not linear. As described earlier, age makes little difference until the 40s, after which the risk increases rapidly through the 50s. Patients older than 60 are all at higher risk, and their risk depends more strongly on other conditions.
The interactions of age with other risk factors are also worth noting. For example, the risk of gastric cancer appears to be higher among older patients without H. pylori infection, an observation that could be counterintuitive at first glance.
Importantly, this finding should be interpreted in the context of a key data limitation related to the definition of H. pylori status in this study. In our feature generation strategy, H. pylori status reflects only current infection at the time of health check-up, without information on prior eradication therapy or treatment success. As a result, individuals classified as H. pylori-negative may represent a heterogeneous group that includes both truly H. pylori-naïve individuals and patients with a history of medically eradicated infection. Because patients with previous H. pylori infection remain at elevated gastric cancer risk even after successful eradication, this misclassification may attenuate or distort the apparent association between current H. pylori status and gastric cancer risk, particularly in older age groups where eradication is more prevalent. An additional, but more speculative, explanation relates to negative conversion of H. pylori infection in the setting of advanced CAG and IM. Prior studies have shown that gastric cancer risk is lowest among individuals without both H. pylori infection and CAG, and highest among those with CAG but without detectable H. pylori, a pattern thought to reflect spontaneous loss of H. pylori in severely atrophic gastric mucosa (46). While this biological mechanism may partially explain the observed age-dependent interaction, the lack of longitudinal data on eradication history prevents definitive differentiation between spontaneous regression and post-eradication status. Accordingly, the observed pattern should be interpreted with caution.
The explanations of individual predictions reveal the complex and non-linear patterns identified by our ML-based model. While CAG, IM, H. pylori infection, and age are the most important features at the model level, the relative importance of each feature and its contribution to risk differ between individuals because of interactions with other features. This level of explicit explanation of how models make their predictions may facilitate clinical interpretability and support future efforts toward clinical translation.
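To illustrate why the contribution of a single feature can differ between individuals, the following minimal sketch computes exact Shapley values for a toy risk function. The function `toy_risk`, its coefficients, the feature names, and the all-zero baseline are hypothetical and chosen only for illustration; they are not the study model, and practical SHAP implementations for tree ensembles average over a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features, for illustration only.
FEATURES = ["age_over_60", "cag", "h_pylori"]


def toy_risk(x):
    """Hypothetical risk score with an age x CAG interaction (not the study model)."""
    score = 0.1
    score += 0.2 * x["age_over_60"]
    score += 0.3 * x["cag"]
    score += 0.1 * x["h_pylori"]
    score += 0.4 * x["age_over_60"] * x["cag"]  # interaction term
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Shapley weight for a coalition of size k out of n features.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in FEATURES}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


baseline = {"age_over_60": 0, "cag": 0, "h_pylori": 0}
younger = {"age_over_60": 0, "cag": 1, "h_pylori": 1}
older = {"age_over_60": 1, "cag": 1, "h_pylori": 1}

phi_younger = shapley_values(toy_risk, younger, baseline)
phi_older = shapley_values(toy_risk, older, baseline)

# The same CAG finding receives a larger attribution in the older individual,
# because the interaction term is shared between age and CAG.
print(phi_younger)
print(phi_older)
```

In this toy setting the attributions for each individual sum to the difference between that individual's score and the baseline score, which is the additive property that makes per-patient explanations comparable across features.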
According to national cancer statistics in Korea, the incidence rate of gastric cancer is 57.4 cases per 100,000 population (47), whereas the incidence rate in our research cohort was estimated to be 28.9 cases per 100,000 individuals. Because this study aimed to evaluate the impact of various risk factors on gastric cancer development, individuals diagnosed with gastric cancer at their initial visit (116 cases) or those with a history of gastric cancer (87 cases) were excluded from the analysis. As described in the cohort selection section of the Methods, we focused on individuals who had visited our institution at least twice and had available clinical data prior to gastric cancer development. This selection process may have contributed to a lower incidence rate and could introduce selection bias, which should be considered a limitation of the study. Moreover, the cohort consisted of individuals undergoing routine health check-ups, who are presumably more health-conscious and relatively healthier than the general population. This may further limit the direct generalizability of our findings. Consequently, although our model demonstrates the capability to assess risk in a low-prevalence setting, risk estimates derived from it may require recalibration when applied to higher-risk populations.
Other limitations of this study should also be acknowledged. First, key endoscopic predictors, including CAG and IM, were assessed based on visual endoscopic findings rather than histological confirmation. Although this center has achieved a relatively high level of agreement in endoscopic diagnosis and grading of CAG through continuous education, standardized protocols, and feedback-based quality control (48), visual assessment inherently involves subjectivity. Importantly, the reproducibility of visual endoscopic diagnosis may vary across institutions depending on endoscopist experience, training, and local diagnostic standards. This variability constitutes an important source of measurement uncertainty and highlights the need for external validation in multicenter settings, as well as potential integration with standardized classification systems or histologically validated datasets in future studies.
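As a rough illustration of what such a recalibration could look like, the sketch below rescales a predicted cumulative risk under the strong assumption that the target population differs from the cohort only by a constant multiplicative factor on the hazard. Using the ratio of the two incidence rates quoted above as that factor is itself an assumption, the example risk values are hypothetical, and a real recalibration would need to be estimated from data in the target population.

```python
# Incidence rates per 100,000, as quoted in the text.
NATIONAL_INCIDENCE = 57.4
COHORT_INCIDENCE = 28.9


def recalibrate_risk(predicted_risk, hazard_ratio):
    """Rescale a cumulative risk assuming a constant multiplicative hazard shift.

    With S = 1 - risk, a proportional change in hazard by `hazard_ratio`
    gives S_new = S ** hazard_ratio.
    """
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must lie in [0, 1]")
    if hazard_ratio <= 0:
        raise ValueError("hazard_ratio must be positive")
    survival = 1.0 - predicted_risk
    return 1.0 - survival ** hazard_ratio


# Assumption: the incidence ratio stands in for the hazard ratio.
ratio = NATIONAL_INCIDENCE / COHORT_INCIDENCE

# Hypothetical model outputs (cumulative risk over some fixed horizon).
example_risks = [0.001, 0.01, 0.05]
adjusted = [recalibrate_risk(r, ratio) for r in example_risks]

for original, new in zip(example_risks, adjusted):
    print(f"{original:.3f} -> {new:.4f}")
```

Because the adjustment acts on the survival probability rather than on the risk directly, the rescaled risk stays within [0, 1] and grows slightly less than proportionally to the hazard ratio.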
Second, most of the behavioral features, such as smoking and diet, were self-reported by patients and, thus, might be subjective and unreliable, potentially explaining the lower importance of such features despite various clinical trial studies supporting their importance.
Third, the model did not account for prior H. pylori eradication history, which represents a major source of potential confounding. Although information on antimicrobial treatment was available for a subset of participants, the success of eradication therapy and longitudinal infection status could not be reliably ascertained and therefore were not incorporated into the model. Consequently, individuals classified as H. pylori-negative may include both truly uninfected individuals and patients with previously eradicated infection, who are known to remain at elevated gastric cancer risk. This misclassification may attenuate the predictive contribution of the H. pylori feature, particularly in older age groups. Future studies incorporating detailed eradication history and longitudinal infection data are needed to address this limitation.
Fourth, we did not have access to potentially important patient information, such as genetic factors and predispositions that might explain sporadic cancer cases in young, otherwise healthy individuals with no family history or known risk factors.
Fifth, our results might not fully reflect the true incidence of gastric and other gastrointestinal cancers, as diagnoses could have occurred at other hospitals. We were unable to use external datasets, such as national cancer statistics, due to strict anonymization protocols that prevent linking with external sources. While our survival prediction models account for this limitation, it is important to consider it when interpreting our findings.
Sixth, we acknowledge that the high censoring rate in our study may affect the generalizability of our findings. However, our adoption of survival-based ML models (XGBS, RSF) is a direct approach to handling the uncertainty associated with censored time-to-event data. Although we conducted rigorous internal validation to ensure the robustness of our analysis, we advise caution in interpreting the results.
Lastly, our study had a single-center retrospective design, limiting the generalizability of our findings to different settings, such as Western populations with different lifestyles, eating habits, and gastric cancer incidence.
Regarding the clinical implementation of our model, the handling of missing input data, a common occurrence in real-world settings, is streamlined by the model's design. As detailed in the Methods, the XGBS model is capable of processing incomplete datasets without requiring manual imputation, thereby reducing the preprocessing burden on users. However, given the reliance on endoscopic findings, users should possess sufficient expertise to accurately interpret and input these clinical features, as the model's reliability depends on the quality of these initial assessments.
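For readers less familiar with how censoring enters the evaluation of such models, the following is a minimal sketch of Harrell's concordance index. A pair of subjects counts as comparable only when the subject with the shorter follow-up time had an observed event, so censored subjects contribute information without being treated as event-free. The toy data are hypothetical, tied times are ignored for simplicity, and this is not the evaluation code used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk
    subject is the one with the earlier observed event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot serve as the earlier event of a pair.
            continue
        for j in range(n):
            # Subject j was still under observation after i's event time;
            # j may later have had an event or been censored.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data: time in years, 1 = cancer observed, 0 = censored.
times = [2, 4, 5, 7, 9]
events = [1, 0, 1, 0, 0]
risk_scores = [0.9, 0.3, 0.6, 0.7, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(round(c_index, 3))
```

In this toy example only two of the five subjects have observed events, yet six pairs are comparable, which shows how a heavily censored cohort can still support a discrimination estimate.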
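The idea behind processing incomplete inputs without imputation can be sketched as follows. In the sparsity-aware split finding described for XGBoost (28), each tree split learns a default direction to which records with a missing value are routed. The sketch below is a simplified stand-in that uses squared error on a single split rather than the gradient-based gain of the actual library; the data, threshold, and function names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sum_squared_error(values):
    """Squared error around the mean; 0.0 for an empty group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def choose_default_direction(feature, target, threshold):
    """Try sending missing rows left and then right; keep the lower-loss option."""
    left = [t for f, t in zip(feature, target)
            if not is_missing(f) and f < threshold]
    right = [t for f, t in zip(feature, target)
             if not is_missing(f) and f >= threshold]
    missing = [t for f, t in zip(feature, target) if is_missing(f)]
    loss_if_left = sum_squared_error(left + missing) + sum_squared_error(right)
    loss_if_right = sum_squared_error(left) + sum_squared_error(right + missing)
    return "left" if loss_if_left <= loss_if_right else "right"


def route(value, threshold, default_direction):
    """Route one record through the split, using the default when missing."""
    if is_missing(value):
        return default_direction
    return "left" if value < threshold else "right"


# Hypothetical training data: one numeric feature with gaps and a binary outcome.
feature = [30, 35, None, 62, 70, None]
target = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

direction = choose_default_direction(feature, target, threshold=50)
print(direction)
print(route(None, 50, direction), route(40, 50, direction))
```

Because the default direction is learned from the training data, records with a missing value follow the branch that best matched the outcomes of similar incomplete records, so no separate imputation step is required at prediction time.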
Conclusions
This study demonstrates the feasibility of applying explainable survival-based ML models for gastric cancer risk stratification in a general screening population. However, further calibration assessment, external validation, and prospective evaluation are required before these models can inform clinical decision-making or risk-adapted surveillance strategies.
Acknowledgments
An abstract of this study was previously presented as a poster at the 6th Edition of International Cancer Conference in 2023 and at the 37th Workshop of the European Helicobacter and Microbiota Study Group in 2024.
Footnote
Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0245/rc
Data Sharing Statement: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0245/dss
Peer Review File: Available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0245/prf
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jgo.amegroups.com/article/view/10.21037/jgo-2026-0245/coif). J.H.S., S.W., Y.S.K., S.Y.Y., and H.Y.K. report that Enolink Inc. provided the general funding to the investigator/author’s institution as research & development fees to execute the research. S.W., M.K., and S.L. are employees of Enolink Inc., the funder of the study and were involved in analysis and the preparation of the manuscript. The other author has no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study protocol was approved by the Ethics Committee of Seoul National University Hospital (Institutional Review Board No. H-2011-057-1171) and individual consent for this retrospective analysis was waived. We extracted only anonymized data in which patient identification numbers had been removed to prevent individual identification.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
- Namasivayam V. Endoscopic screening and surveillance for gastric cancer: challenges and opportunities. Fac Rev 2023;12:17. [Crossref] [PubMed]
- Wang Y, Liu Z, Li W, et al. Gastric cancer in China: Epidemiology, risk factors, and screening. Chin J Cancer Res 2025;37:937-48. [Crossref] [PubMed]
- Sung SY, Choi HH, Sin SH, et al. Optimal interval of screening endoscopy for reducing gastric cancer mortality: a nationwide cohort study. Gastrointest Endosc 2026;103:725-32. [Crossref] [PubMed]
- Chang Y, Cho B, Son KY, et al. Determinants of gastric cancer screening attendance in Korea: a multi-level analysis. BMC Cancer 2015;15:336. [Crossref] [PubMed]
- Laszkowska M, Hahn AI, King S, et al. The Cost-Effectiveness of Gastric Cancer Screening and Surveillance Among Average-Risk and Risk-Stratified Populations. Clin Gastroenterol Hepatol 2026;24:1568-79. [Crossref] [PubMed]
- Hamashima C. Benefits and harms of endoscopic screening for gastric cancer. World J Gastroenterol 2016;22:6385-92. [Crossref] [PubMed]
- Correa P, Haenszel W, Cuello C, et al. A model for gastric cancer epidemiology. Lancet 1975;2:58-60. [Crossref]
- Kuipers EJ. Review article: exploring the link between Helicobacter pylori and gastric cancer. Alimentary Pharmacology & Therapeutics 1999;13:3-11.
- Kumar S, Patel GK, Ghoshal UC. Helicobacter pylori-Induced Inflammation: Possible Factors Modulating the Risk of Gastric Cancer. Pathogens 2021;10:1099. [Crossref] [PubMed]
- Patel AK, Sethi NS, Park H. Gastric Cancer: A Review. JAMA 2026;335:439-50. [Crossref] [PubMed]
- Yoon H, Kim N, Lee HS, et al. Helicobacter pylori-negative gastric cancer in South Korea: incidence and clinicopathologic characteristics. Helicobacter 2011;16:382-8. [Crossref] [PubMed]
- Uemura N, Okamoto S, Yamamoto S, et al. Helicobacter pylori infection and the development of gastric cancer. N Engl J Med 2001;345:784-9. [Crossref] [PubMed]
- Holleczek B, Schöttker B, Brenner H. Helicobacter pylori infection, chronic atrophic gastritis and risk of stomach and esophagus cancer: Results from the prospective population-based ESTHER cohort study. Int J Cancer 2020;146:2773-83. [Crossref] [PubMed]
- Yusefi AR, Bagheri Lankarani K, et al. Risk Factors for Gastric Cancer: A Systematic Review. Asian Pac J Cancer Prev 2018;19:591-603. [Crossref] [PubMed]
- Thrift AP, Nguyen TH. Gastric Cancer Epidemiology. Gastrointest Endosc Clin N Am 2021;31:425-39. [Crossref] [PubMed]
- Praud D, Rota M, Pelucchi C, et al. Cigarette smoking and gastric cancer in the Stomach Cancer Pooling (StoP) Project. Eur J Cancer Prev 2018;27:124-33. [Crossref] [PubMed]
- Yaghoobi M, Bijarchi R, Narod SA. Family history and the risk of gastric cancer. Br J Cancer 2010;102:237-42. [Crossref] [PubMed]
- Jung YS, Xuan Tran MT, Park B, et al. Association Between Family History of Gastric Cancer and the Risk of Gastric Cancer and Adenoma: A Nationwide Population-Based Study. Am J Gastroenterol 2022;117:1255-63. [Crossref] [PubMed]
- Murphy N, Jenab M, Gunter MJ. Adiposity and gastrointestinal cancers: epidemiology, mechanisms and future directions. Nat Rev Gastroenterol Hepatol 2018;15:659-70. [Crossref] [PubMed]
- Guo J, Liu C, Pan J, et al. Relationship between diabetes and risk of gastric cancer: A systematic review and meta-analysis of cohort studies. Diabetes Res Clin Pract 2022;187:109866. [Crossref] [PubMed]
- Zheng J, Gao Y, Xie SH, et al. Haemoglobin A1c and serum glucose levels and risk of gastric cancer: a systematic review and meta-analysis. Br J Cancer 2022;126:1100-7. [Crossref] [PubMed]
- Taninaga J, Nishiyama Y, Fujibayashi K, et al. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci Rep 2019;9:12384. [Crossref] [PubMed]
- Lee E, Jung SY, Hwang HJ, et al. Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation. JMIR Med Inform 2021;9:e29807. [Crossref] [PubMed]
- Arai J, Aoki T, Sato M, et al. Machine learning-based personalized prediction of gastric cancer incidence using the endoscopic and histologic findings at the initial endoscopy. Gastrointest Endosc 2022;95:864-72. [Crossref] [PubMed]
- Wang P, Li Y, Reddy CK. Machine Learning for Survival Analysis: A Survey. ACM Computing Surveys 2019;51:110.
- Polsterl S. scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. J Mach Learn Res 2020;21:212.
- Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016;11:785-94.
- Efron B, Tibshirani R. Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association 1997;92:548-60.
- Dietterich TG. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 1998;10:1895-923. [Crossref] [PubMed]
- Molnar C. Interpretable machine learning. Lulu.com; 2020.
- Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems 2017;30:4768-77.
- Kato S, Matsukura N, Tsukada K, et al. Helicobacter pylori infection-negative gastric cancer in Japanese hospital patients: incidence and pathological characteristics. Cancer Sci 2007;98:790-4. [Crossref] [PubMed]
- Correa P. Human gastric carcinogenesis: a multistep and multifactorial process--First American Cancer Society Award Lecture on Cancer Epidemiology and Prevention. Cancer Res 1992;52:6735-40.
- Burke E, Harkins P, Arumugasamy M. Gastric atrophy and gastric cancer: a meta-analytical assessment of risk and the influence of topographical distribution. Eur J Gastroenterol Hepatol 2026; [Crossref]
- Gu J, Chen R, Wang SM, et al. Prediction Models for Gastric Cancer Risk in the General Population: A Systematic Review. Cancer Prev Res (Phila) 2022;15:309-18. [Crossref] [PubMed]
- Jencks DS, Adam JD, Borum ML, et al. Overview of Current Concepts in Gastric Intestinal Metaplasia and Gastric Cancer. Gastroenterol Hepatol (N Y) 2018;14:92-101.
- Attieh P, Al Hazzouri A, Al Qassab M, et al. Gastric intestinal metaplasia: Management and surveillance strategies. World J Gastrointest Pathophysiol 2026;17:118156.
- Rugge M, Correa P, Dixon MF, et al. Gastric dysplasia: the Padova international classification. Am J Surg Pathol 2000;24:167-76. [Crossref] [PubMed]
- Borisov V, Leemann T, Sebler K, et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans Neural Netw Learn Syst 2024;35:7499-519. [Crossref] [PubMed]
- Lee JY, Kim N, Lee HS, et al. Correlations among endoscopic, histologic and serologic diagnoses for the assessment of atrophic gastritis. J Cancer Prev 2014;19:47-55. [Crossref] [PubMed]
- Di Mario F, Ingegnoli A, Altavilla N, et al. Influence of antisecretory treatment with proton pump inhibitors on serum pepsinogen I levels. Fundam Clin Pharmacol 2005;19:497-501. [Crossref] [PubMed]
- Yoon H, Kim N. Diagnosis and management of high risk group for gastric cancer. Gut Liver 2015;9:5-17. [Crossref] [PubMed]
- Ning FL, Lyu J, Pei JP, et al. The burden and trend of gastric cancer and possible risk factors in five Asian countries from 1990 to 2019. Sci Rep 2022;12:5980. [Crossref] [PubMed]
- Korean Gastric Cancer Association Nationwide Survey on Gastric Cancer in 2014. J Gastric Cancer 2016;16:131-40.
- Ohata H, Kitauchi S, Yoshimura N, et al. Progression of chronic atrophic gastritis associated with Helicobacter pylori infection increases risk of gastric cancer. Int J Cancer 2004;109:138-43. [Crossref] [PubMed]
- Kang MJ, Won YJ, Lee JJ, et al. Cancer Statistics in Korea: Incidence, Mortality, Survival, and Prevalence in 2019. Cancer Res Treat 2022;54:330-44. [Crossref] [PubMed]
- Jin EH, Chung SJ, Lim JH, et al. Training Effect on the Inter-observer Agreement in Endoscopic Diagnosis and Grading of Atrophic Gastritis according to Level of Endoscopic Experience. J Korean Med Sci 2018;33:e117. [Crossref] [PubMed]

