Abstract
Oesophagectomy for cancer of the oesophagus carries significant morbidity and mortality. Ninety-day mortality and anastomosis leakage are critical early postoperative problems traditionally analysed through logistic regression. In this study, we compare traditional logistic regression models with newer explainable AI (XAI) models for predicting these outcomes. We used the Swedish National Quality Register for Oesophageal and Gastric Cancer (NREV) to perform traditional multivariable logistic regression and XAI. The 90-day mortality was 6.0%, while anastomosis leakage was present in 12.4%. The XAI models yielded an area under the curve (AUC) of 0.91 for 90-day mortality (as compared with 0.84 for logistic regression). For anastomosis leakage, the AUC was 0.84 using XAI (0.74 using logistic regression). We show that age (mortality increases sharply after 55 years) and body mass index (BMI) (lowest mortality for BMI 30 kg/m²) are important survival factors. Additionally, we show that surgery time (anastomosis leakage is lowest at a surgery time of 200 min and rises sharply to a maximum at 375 min) and BMI (the lower the BMI, the less anastomosis leakage) are important factors for anastomosis leakage. The surgical understanding of anastomosis leakage and mortality after oesophagectomy is advanced by judiciously applying XAI to structured data. Our nationwide oesophagectomy data contain significant nonlinear relationships. With the help of XAI, we extract personalised knowledge, bringing oesophagus surgery one step closer to personalised medicine.
Introduction
Oesophagectomy is often performed as a curative measure for oesophageal cancer. It is an extensive surgical procedure. The surgical techniques have evolved, and most of the procedures performed today are more or less minimally invasive. Anastomotic techniques have also developed, and most anastomoses nowadays are performed with circular or linear stapling devices. Perioperative care has also dramatically changed with enhanced recovery programs [1, 2, 3, 4].
Despite these improvements, oesophagectomy still carries a high rate of anastomosis leakage (AL) of 10-15% and a 90-day mortality (D90) rate of 2-5%. AL can be a severe complication that may need complex interventions and increase the risk for D90. To improve the outcome of oesophagectomy, it is necessary to understand which factors affect AL and D90. Modifiable factors are especially important in this context. However, age and other patient-specific features are also important in understanding which patients should or should not be offered oesophagectomy.
Traditional logistic regression models are well-suited for simple relationships and are widely used for predictions in medicine. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. As artificial intelligence (AI) is increasingly integrated into many sectors of society, including medical research, more refined prediction models for mortality and anastomosis leakage have become increasingly promising. XGBoost is a relatively recent tool in the AI toolbox that has won several international awards. Shapley scores provide a new way of presenting results. Combined, the two methods offer flexibility, predictive performance, the ability to handle nonlinear relationships and complex interactions, interpretability, and clear visualisations suited to scientific settings. Together, XGBoost and Shapley scores offer what can be called explainable artificial intelligence (XAI). This study aims to improve the prediction of 90-day mortality and anastomosis leakage after oesophagectomy in a Swedish cohort by using XAI with XGBoost and Shapley scores.
Methods
Patients
Data were collected from the National Quality Register for Oesophageal and Gastric Cancer in Sweden (NREV). Survival data are automatically transferred to NREV from Statistics Sweden. NREV is well-described, researched and validated [5, 6].
Patients with oesophageal cancer were selected between November 2005 and February 2018, and of these, 1846 patients underwent oesophagectomy for oesophageal cancer. One hundred forty perioperative variables not directly linked with the outcome were selected [7]. All data was extracted on the 11th of March 2020. The study was approved by the Regional Ethical Board of Stockholm (Dnr 2013/596–31/3, amendment: 2020-06495).
Statistics
We used R for all calculations and graphs. Anastomotic leakage rate and 90-day mortality were modelled using traditional logistic regression and XAI. In the analyses, the predicted event was indicated as 1 (death within 90 days or anastomosis leakage). Anastomosis leakage was defined as a full-thickness gastrointestinal defect involving the oesophagus, anastomosis, staple line, or conduit, irrespective of presentation or method of identification, requiring intervention, surgical or drainage (types II and III) [8]. We excluded variables with more than 20% missing values to ensure the robustness of our analyses. In the remaining dataset, we employed multiple imputation by chained equations with the random forest imputation method using the mice package (version 3.16.0) of R.
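The 20% missingness rule can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the function name `drop_sparse_columns`, the column names, and the toy records are all hypothetical; it only shows the filtering step, not the subsequent imputation.

```python
# Minimal sketch of the missingness filter, assuming records are dicts
# and a missing value is represented by None. All data are hypothetical.

def drop_sparse_columns(rows, max_missing=0.20):
    """Return the column names whose share of missing values is at most max_missing."""
    n = len(rows)
    kept = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if r[col] is None)
        # A variable is excluded only when MORE than max_missing is missing.
        if missing / n <= max_missing:
            kept.append(col)
    return kept

rows = [
    {"age": 61, "bmi": 27.1, "tobacco": None},
    {"age": 70, "bmi": None, "tobacco": None},
    {"age": 55, "bmi": 24.3, "tobacco": "no"},
    {"age": 66, "bmi": 30.2, "tobacco": None},
    {"age": 73, "bmi": 22.8, "tobacco": None},
]

kept = drop_sparse_columns(rows)
print(kept)
```

In this toy example the hypothetical `tobacco` column is missing in four of five records and is dropped, while `bmi`, missing in exactly 20% of records, is retained for imputation.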
All pre- and perioperative variables were used for the logistic regressions, and backward elimination was performed separately for 90-day mortality and anastomosis leakage, retaining the variables that best explained these models. These remaining variables were then used in the logistic regression model.
For the XAI, we used the eXtreme Gradient Boosting (XGBoost) package (version 1.7.5.1). XGBoost uses decision trees through gradient boosting [9]. We divided our dataset into a 90% training set for model training and a 10% test set for evaluation. Several hyperparameters control XGBoost: the number of decision trees, training rounds, and the learning rate. These hyperparameters must be set before training the algorithm since they significantly impact performance [10]. We used cross-validation to select the number of training rounds and grid search to maximise the algorithm’s accuracy. The final hyperparameters were learning rate 0.1, subsample 0.3, colsample_bynode 0.3, reg_lambda 6, maximum depth 50, evaluation metric AUC, and objective binary logistic.
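As an illustration of this setup, the sketch below writes the reported hyperparameters as a parameter dictionary and performs a 90/10 split of row indices. It is in Python, not the R used in the study. The mapping of the reported labels to XGBoost parameter names, the simple random (unstratified) split, and the seed are assumptions, since the text does not specify how the split was drawn.

```python
import random

# Reported final hyperparameters, expressed with XGBoost parameter names.
# The name mapping is an assumption based on the labels in the text.
params = {
    "learning_rate": 0.1,
    "subsample": 0.3,
    "colsample_bynode": 0.3,
    "reg_lambda": 6,
    "max_depth": 50,
    "eval_metric": "auc",
    "objective": "binary:logistic",
}

def train_test_split(indices, test_fraction=0.10, seed=0):
    """Shuffle the indices and hold out test_fraction of them for evaluation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # seed is hypothetical, for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 1846 patients underwent oesophagectomy in the cohort.
train_idx, test_idx = train_test_split(range(1846))
print(len(train_idx), len(test_idx))
```

With 1846 patients, a 10% hold-out corresponds to roughly 185 patients, which is worth keeping in mind when reading the test-set AUC for outcomes as rare as 90-day mortality.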
To assess the model accuracy, receiver operating characteristic (ROC) curves were generated, and the area under the receiver operating characteristic curve (AUC) was calculated. Differences in AUCs were assessed with the test of DeLong et al. [11]. Additionally, for better interpretability of the XGBoost results, we reported them using Shapley Additive exPlanations (SHAP) [12]. Developed by Lundberg and Lee (2017), SHAP provides a unified and theoretically grounded framework for feature importance analysis, whether the model is a traditional logistic regression or an advanced machine learning model. The contribution of each feature is presented as the mean of the absolute SHAP values. In binary prediction, SHAP values are expressed on the log-odds scale, like the coefficients of the logistic regression model [13].
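The two summary quantities used here can be sketched in a few lines of Python (the study itself used R). The AUC is computed through its rank interpretation, the probability that a randomly chosen event case receives a higher predicted risk than a randomly chosen non-event case, and feature importance is computed as the mean absolute SHAP value. The function names, labels, scores, and SHAP values below are hypothetical illustration data, not study results.

```python
def auc(labels, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_abs_shap(shap_rows):
    """Global importance per feature: mean of absolute SHAP values over patients."""
    n = len(shap_rows)
    return {f: sum(abs(r[f]) for r in shap_rows) / n for f in shap_rows[0]}

# Hypothetical outcomes (1 = event) and predicted risks.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.15, 0.10, 0.60, 0.40]
model_auc = auc(labels, scores)

# Hypothetical per-patient SHAP values on the log-odds scale.
shap_rows = [
    {"age": 0.5, "bmi": -0.3},
    {"age": -0.1, "bmi": 0.2},
    {"age": 0.6, "bmi": -0.1},
]
importance = mean_abs_shap(shap_rows)
print(model_auc, importance)
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for illustration; it does not implement the DeLong test used to compare AUCs.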
Results
90-day mortality
The overall 90-day mortality (D90) was 6%. The ROC curve for D90 showed improved predictions for the XGBoost model with an AUC of 0.91 compared to 0.84 for the logistic regression model (p-value <0.05) (see Figure 1). In logistic regression, increasing age (odds ratio (OR) 1.03 (1.0-1.16)), the number of positive lymph nodes (OR 1.07 (1.03-1.11)), and American Society of Anesthesiologists (ASA) grade 3 (OR 4.20 (2.08-8.69); 3.4% missing data for ASA scores) were associated with an increased risk of D90, while increasing body mass index (BMI) was associated with a decreased risk (OR 0.92 (0.87-0.98)). Neither bleeding, surgery time, nor investigated lymph nodes showed any significant association with D90 in the logistic regression model (see Table 1).
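To make the per-unit odds ratios easier to read, the sketch below scales them to larger differences, using the fact that a logistic regression coefficient is linear on the log-odds scale. It assumes that the reported ORs for age and BMI are per year and per kg/m² respectively, which the text does not state explicitly; the function name `scale_odds_ratio` is hypothetical.

```python
import math

def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a difference of `units`, given the OR for a one-unit difference."""
    return math.exp(math.log(or_per_unit) * units)

# Reported point estimates; the units (per year, per kg/m^2) are assumptions.
or_age_10_years = scale_odds_ratio(1.03, 10)
or_bmi_5_units = scale_odds_ratio(0.92, 5)
print(round(or_age_10_years, 2), round(or_bmi_5_units, 2))
```

Under these assumptions, a ten-year age difference corresponds to an odds ratio of about 1.34 and a five-unit BMI difference to about 0.66. This constant multiplicative effect is exactly the linearity-in-the-logit assumption that the XAI models relax.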
The XAI analysis found that the most important feature predicting D90 was age (mean absolute SHAP value 0.36), followed by BMI and bleeding (mean absolute SHAP values 0.28 and 0.23, respectively) (see Figures 2 and 5). Higher age predicted D90, while higher BMI was protective. Females generally had a lower BMI than males. Increased surgery time, the number of positive lymph nodes, and higher ASA grade were also associated with a greater risk of D90. BMI was protective mainly among ASA 1 and 2 patients, while increasing BMI among ASA 3 patients was associated with a greater risk of D90.
The contributions of different factors to D90 (and AL) are shown for some example patients in Figure 4.
Anastomosis leakage
Altogether, 229 patients (12.4 %) developed anastomosis leakage (AL). The AUC for the XGBoost model was 0.84, compared to the AUC of the logistic regression model, which was 0.74 (p-value <0.05) (See Figure 1). In the logistic regression model, ASA 3 patients had a significantly increased risk of developing AL (OR 2.88 (1.69 - 4.25)). No other significant feature was associated with AL (See Table 1).
The XAI analysis results are presented in Figures 3 and 6. The most important feature predicting AL was surgery time (mean absolute SHAP value 0.19), where the risk for AL peaked around 400 minutes and declined after that (Figure 3). Higher BMI and an increasing number of investigated lymph nodes were associated with an increased risk of AL (mean absolute SHAP values 0.16 and 0.12, respectively). Increased age, the number of investigated lymph nodes, and higher ASA grade were also associated with a greater risk of AL. However, the relationship with age was not linear, and the risk of AL peaked at around seventy years of age and declined after that. The association with investigated lymph nodes was also not linear (see Figure 6).
The contributions of different factors to AL (and D90) are shown for some example patients in Figure 4.
Discussion
In this retrospective register-based study, we analysed whether XAI can add knowledge in predicting D90 and AL. We have shown that our XAI models are considerably better than our traditional logistic regression models. Further, our model is also better than previous D90 prediction models [14]. Our prediction was better for D90 than for AL, although AL had a reasonable AUC. This suggests that AL may have other, more influential risk factors not accounted for by our variables. Previous medical studies have also shown improved AI prediction models compared to logistic regression models [15, 16].
To elucidate the importance of individual factors on D90 and AL, we decided to employ Shapley scores, a game theoretic approach that can be used with any model, be it AI or logistic, to introduce interpretability. To illustrate the additive contribution of Shapley scores, we have included example Waterfall plots (See Figure 4). This figure exemplifies the relative importance of different factors to D90 and AL for two patients. The choice of XGBoost among many available AI models rests on its proven track record in AI competitions for structured data such as ours.
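The additive structure behind a waterfall plot can be sketched as follows: a baseline log-odds value plus the per-feature Shapley contributions gives the patient's predicted log-odds, which the logistic function converts to a probability. The sketch is in Python (the study used R), and the function names, baseline, and contributions are hypothetical; the baseline is set to the log-odds of a 6% event rate purely for illustration.

```python
import math

def sigmoid(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def waterfall(base_value, contributions):
    """Accumulate contributions, largest absolute value first, as a waterfall plot does."""
    steps = []
    running = base_value
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for feature, value in ordered:
        running += value
        steps.append((feature, value, running))
    return steps, sigmoid(running)

# Hypothetical baseline: log-odds corresponding to a 6% event rate.
base_value = math.log(0.06 / 0.94)

# Hypothetical Shapley contributions (log-odds scale) for one patient.
contributions = {"age": 0.8, "bmi": -0.3, "bleeding": 0.2}

steps, probability = waterfall(base_value, contributions)
for feature, value, running in steps:
    print(f"{feature:>10} {value:+.2f} -> {running:.3f}")
print(f"predicted probability: {probability:.3f}")
```

Because the contributions sum on the log-odds scale, equal Shapley values do not translate into equal changes in probability; the effect of a given contribution depends on where on the logistic curve the patient already sits.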
The data source of this study comprises 100 % of the Swedish operating hospitals for oesophageal cancer and has a more than 95 % coverage rate for the surveys. The completeness and correctness of the surveys are high [5]. A validation program to optimise the completeness and correctness of NREV data, including site visits, has been ongoing for a couple of years.
90-day mortality
In previous studies, age and comorbidities have been associated with increased risk of D90 [14, 17], but the XAI demonstrates that age, BMI, bleeding, surgery time, number of positive lymph nodes and ASA are the most important factors (in descending order) for D90 (see Figure 2).
The XAI highlights how mortality is low for patients younger than 55 years and then increases rapidly to a maximum for patients over 75. In the logistic regression analyses, higher BMI was associated with a reduced risk of D90, but BMI was not significantly associated with AL.
Our XAI models show a strong contribution of BMI for both D90 and AL (mean absolute Shapley scores 0.28 and 0.16, respectively). The Shapley scores show a decreased risk for D90 with a higher BMI (reaching a minimum at BMI 30 kg/m²). This aligns with a previous meta-analysis by Mengardo et al. [18]. However, further analysis of D90 and BMI shows an interaction with ASA insofar as higher BMI is associated with lower D90 for ASA 2 and ASA 3 patients, whereas increased BMI for ASA 1 is associated with higher D90.
Anastomosis leakage
In previous studies, comorbidities and BMI have been associated with increased risk of AL [18], but the XAI demonstrates that surgery time, BMI, age, year of surgery, investigated lymph nodes, and ASA were the most important factors (in descending order) for AL (see Figure 3).
Surgery time is the most important factor for AL. The risk is lowest at around 200 min, rises rapidly to a maximum at around 350 min, and then decreases more gradually up to a surgery duration of 600 min.
In the logistic regression analyses, BMI showed no significant association with AL, but the Shapley scores for the XAI method showed a high contribution to the model and an increased risk of AL with increasing BMI (reaching a maximum at BMI 35 kg/m²), in line with Mengardo et al. [18].
Since participation in the NREV quality register is not mandatory but highly recommended, results from the first few years may carry some bias. An interesting example of this phenomenon was that the risk for anastomosis leakage increased dramatically from low reported values between 2005 and 2008 and thereafter continued to increase, but at a much slower rate. A transiently high risk was seen in 2015. The slow increase in risk during the last 10 reported years could be attributed to the fact that the surgical centres have become more active in identifying anastomosis leakages. Enhanced Recovery Programmes (ERP) were introduced with new postoperative routines. Endoscopic and improved radiological evaluations were introduced more liberally, and some previously missed subclinical leaks were probably reported. The introduction of laparoscopic and robotic techniques during the study period might also have contributed to the increased frequency of anastomotic leakage.
The significantly better predictions with the XAI method result from complicated nonlinear dependencies among the covariates, which justifies the use of a much more complex and less intuitive model than classical multivariable logistic regression. An illustrative instance of this phenomenon is BMI, which holds significant predictive value for D90 and AL. Notably, a BMI of 30 kg/m² emerges as optimal for reducing the risk of D90, with lower and higher BMI values correlating with increased risk (see Figure 2).
A strength of our study is that we employ a high-quality national registry with a high coverage rate and few missing data points [5]. Previous NREV studies on long-term outcomes after oesophagectomy showed the impact of sex, education level, and geographical differences within the country [6, 19, 20]. In this study, with early postoperative follow-up, we evaluated the impact of sex and, to some extent, geographical differences by adjusting for operating hospital; both factors had limited impact in this study.
A weakness of this registry is that, for some reason, it has low coverage of tobacco use, which, therefore, was excluded from the analysis. The impact of detailed oncological therapy was also limited in this study. This was compensated by the fact that patients who underwent oesophagectomy during the study period all had perioperative oncological treatment according to international recommendations for their tumour stages.
A possible extension to the machine learning methodology would be to develop formal testing methods for Shapley scores, which is beyond the scope of the present paper.
Conclusion
To summarise, we have advanced the knowledge of risk factors for 90-day mortality and anastomosis leakage after oesophagectomy by using explainable AI (XAI). As the main findings, we show that age (mortality increases sharply after 55 years) and BMI (lowest mortality for BMI 30 kg/m²) are important survival factors. Additionally, we show that surgery time (anastomosis leakage is lowest at a surgery time of 200 min and rises sharply to a maximum at 375 min) and BMI (the lower the BMI, the less anastomosis leakage) are important factors for anastomosis leakage. In a more general sense, we advance the surgical understanding of anastomosis leakage and mortality after oesophagectomy by judiciously applying XAI to structured data. Our nationwide oesophagectomy data contain significant nonlinear relationships. With the help of XAI, we extract personalised knowledge, bringing oesophagus surgery one step closer to personalised medicine.
Data Availability
All data produced in the present work are contained in the manuscript.
Competing interests
No competing interest is declared.
Author contributions statement
JJ conceived and funded the study. SD conducted all the calculations advised by AF. SD and AF wrote the first draft of the manuscript. All authors read and revised the final manuscript.
Funding
AF: Regional research support, Region Skåne #2022-1284; Governmental funding of clinical research within the Swedish National Health Service (ALF) #2022:YF0009 and #2022-0075; Crafoord Foundation grant number #2021-0833; Lions Skåne research grants; Skåne University Hospital grants; Swedish Heart and Lung Foundation (HLF) #2022-0352 and #2022-0458.