Clinicians who order common diagnostic chest x-rays for patients have been sitting on a goldmine of unused prognostic information. The radiographs, used since the 19th century to detect specific abnormalities, could soon be repurposed to identify long-term mortality risk — with a little help from machine learning.
Using data from two large randomized trials, researchers have developed a convolutional neural network, called CXR-risk, that stratifies participants by all-cause mortality risk. They trained the artificial intelligence (AI) system with 85,000 x-rays and follow-up data from more than 40,000 individuals. Extracting information from single chest radiographs, the system found a graded association between risk score and mortality.
“Based on the chest x-ray image alone, AI identified people at up to a 53% risk of death over 12 years,” lead author Michael T. Lu, MD, MPH, of the Cardiovascular Imaging Research Center, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, told Medscape Medical News. Deaths were most often due to heart disease and lung cancer.
“We get chest x-rays to make a diagnosis like pneumonia,” Lu continued, “but our study shows that there is also free prognostic information about health and longevity on the images and that AI can extract this information to predict who will be alive 12 years later.”
He hopes that scores calculated using AI may incentivize high-risk individuals to lower their chance of dying with prevention, regular screening, and lifestyle modification.
The researchers stratified individuals into quintiles by CXR-risk score. For persons with a very high CXR-risk score, the mortality rate was 53.0% (250 deaths among 472 individuals) at 12.2 years’ follow-up in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening (PLCO) Trial; it was 33.9% (61/180) at 6.3 years in the National Lung Screening Trial (NLST).
In unadjusted analyses, those rates translated, respectively, to an 18.3-fold (95% confidence interval [CI], 14.5 – 23.2, P < .001) and a 15.2-fold (95% CI, 9.2 – 25.3, P < .001) greater probability of death compared with individuals who had a very low CXR-risk score.
By score, PLCO mortality rates were 3.8% (97 deaths among 2543 participants) in the very-low-risk group, 7.8% (216 of 2769) in the low-risk group, 12.7% (339 of 2674) in the moderate-risk group, and 24.9% (500 of 2006) in the high-risk group.
NLST mortality rates followed a similar graded pattern: 2.7% in the very-low-risk group (20 of 752), 3.8% in the low-risk group (64 of 1679), 6.7% in the moderate-risk group (115 of 1723), and 9.8% in the high-risk group (114 of 1159).
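The reported counts are enough to recompute each group's crude mortality rate and compare it with the very-low-risk group. The Python sketch below does this. The dictionary layout and the function name `crude_rates` are illustrative assumptions. The crude risk ratios it produces are not the hazard ratios reported by the study, which account for time to death.

```python
# Deaths and participants per CXR-risk group, as reported in the article.
PLCO = {
    "very low": (97, 2543),
    "low": (216, 2769),
    "moderate": (339, 2674),
    "high": (500, 2006),
    "very high": (250, 472),
}
NLST = {
    "very low": (20, 752),
    "low": (64, 1679),
    "moderate": (115, 1723),
    "high": (114, 1159),
    "very high": (61, 180),
}


def crude_rates(groups):
    """Return {group: (mortality %, crude risk ratio vs. the very-low group)}."""
    ref_deaths, ref_n = groups["very low"]
    ref_rate = ref_deaths / ref_n
    out = {}
    for name, (deaths, n) in groups.items():
        rate = deaths / n
        out[name] = (round(100 * rate, 1), round(rate / ref_rate, 1))
    return out


if __name__ == "__main__":
    for trial, groups in (("PLCO", PLCO), ("NLST", NLST)):
        print(trial)
        for name, (pct, ratio) in crude_rates(groups).items():
            print(f"  {name:>9}: {pct:5.1f}%  ({ratio}x the very-low-risk rate)")
```

On these counts, the very-high-risk group's crude death rate is roughly 14 times that of the very-low-risk group in the PLCO and roughly 13 times in the NLST. Both are smaller than the unadjusted hazard ratios of 18.3 and 15.2, which is expected because a hazard ratio also reflects how early deaths occur during follow-up.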
As for application to day-to-day practice, it is feasible that routine diagnostic x-rays could be fairly easily uploaded to dedicated websites for AI risk analysis. But, Lu cautioned, “While the technology is here, we need clinical trials to prove that this information helps decision making and improves health.”
In addition, he noted that it is not clear how many patients would want to know their 12-year risk for death.
To develop and test the algorithm, Lu and colleagues used data from the screening radiography arm of the PLCO (n = 52,320), a community-based cohort of asymptomatic nonsmokers and smokers aged 55 to 74 years. Participants were enrolled at 10 US sites from November 1993 through July 2001. For external testing, the researchers analyzed data from the screening radiography arm of the NLST (n = 5493), a community cohort of heavy smokers aged 55 to 74 years who were enrolled at 21 US sites from August 2002 through April 2004.
The researchers used data from 41,856 PLCO participants to train the system. They then tested the algorithm on the remaining 10,464 PLCO participants and the 5493 NLST participants.
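The split sizes can be checked directly: the training and test groups add up to the full PLCO radiography arm, which works out to an 80/20 division. The sketch below verifies this and shows one way a participant-level split could be drawn. The random shuffle, the seed, and the function name `split_participants` are assumptions for illustration, since the article does not describe how participants were assigned.

```python
import random

PLCO_TOTAL = 52_320  # screening radiography arm, per the article
PLCO_TRAIN = 41_856  # participants used to train the system
PLCO_TEST = 10_464   # participants held out for testing


def split_participants(ids, train_fraction=0.8, seed=0):
    """Shuffle participant IDs and split them into train and test lists.

    Hypothetical helper: splitting by participant rather than by image keeps
    every x-ray from one person on the same side of the split.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


train, test = split_participants(range(PLCO_TOTAL))
print(len(train), len(test))  # 41856 10464
```

Splitting on participant IDs matters here because the roughly 85,000 x-rays came from about 40,000 people, so a single person can contribute several images.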
Among participants from the PLCO test group, the mean age was 62.4 years, 5405 (51.6%) were men, and 86.5% were white. Among the 5493 participants from the NLST, the mean age was 61.7 years, 3037 (55.3%) were men, and 92.9% were white.
Data were analyzed from January 2018 to May 2019 for all-cause mortality, the primary endpoint, and for cause-specific mortality.
After adjusting for radiologists’ findings and other risk factors, including age, sex, and comorbidities, PLCO participants in the highest quintile for CXR-risk score had nearly a fivefold higher risk for death compared with those in the lowest quintile (adjusted hazard ratio [aHR], 4.8; 95% CI, 3.6 – 6.4, P ≤ .001). The risk was sevenfold higher for those in the NLST (aHR, 7.0; 95% CI, 4.0 – 12.1, P ≤ .001).
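Confidence intervals for hazard ratios are usually computed on the log scale, so the point estimate should sit near the geometric mean of the interval bounds. The sketch below applies that check to the adjusted hazard ratios quoted above and backs out an approximate standard error of the log hazard ratio. It assumes Wald-type 95% intervals (z = 1.96), which the article does not state, and the function name `log_scale_summary` is illustrative.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (geometric mean of the CI bounds, approximate SE of log HR).

    Assumes a symmetric interval on the log scale: log(HR) +/- z * SE.
    """
    geo_mean = math.sqrt(lower * upper)
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z)
    return geo_mean, se_log_hr


# Adjusted hazard ratios and 95% CIs as reported in the article.
reported = (("PLCO", 4.8, 3.6, 6.4), ("NLST", 7.0, 4.0, 12.1))

for trial, ahr, lower, upper in reported:
    geo_mean, se = log_scale_summary(lower, upper)
    print(f"{trial}: aHR {ahr}, geometric mean of CI {geo_mean:.2f}, "
          f"SE(log HR) about {se:.2f}")
```

Both reported estimates land close to the geometric mean of their bounds. The larger standard error for the NLST is consistent with its smaller test group and lower number of deaths.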
The authors note that the CXR-risk score is also significantly associated with cause-specific deaths in both test populations. Specifically, for lung cancer death, the aHR for the highest-score quintile vs the lowest quintile was 11.1 in the PLCO and 8.4 in the NLST. For cardiovascular death, the aHR was 3.6 in the PLCO and 47.8 in the NLST. For respiratory death, the aHRs were 27.5 and 31.9, respectively.
The authors caution that the system was tested among mainly white, asymptomatic persons aged 55 to 74 years for whom screening posterior-anterior chest radiographs were available and that its prognostic value needs to be assessed in other demographic groups.
“Study findings by Lu et al highlight one of the allures of deep learning: the prospect of identifying patients at risk for adverse outcomes and then trying to avoid that adverse outcome,” write Surafel Tsega, MD, of Icahn School of Medicine at Mount Sinai in New York City, and Hyung J. Cho, MD, of New York City Health and Hospitals, in an invited commentary.
They warn, however, of a gap between AI-aided prediction and real-life prevention. “[W]hat use is this prediction if we do not yet know what to do with this information? What is a worthy preventive strategy?” they ask. In their view, the study by Lu and colleagues highlights “the gulf between developing a scientifically sound algorithm and its use in any meaningful real-world applications.”
While acknowledging that deep learning could improve clinical judgment and care, the commentators write that hyperbole surrounding the potential of AI “should be tempered by the reality that the technology we have thus far is not nearly as ambitious.” Preventing adverse outcomes through AI therefore remains a distant possibility.
“We should not simply be intoxicated by the idea of what we can do but must be clear-headed about what is worth knowing and is worth doing,” they write.
The NVIDIA Corporation Academic Program donated a graphics processing unit as an unrestricted gift. Lu reported research funding to his institution from Kowa Company Limited and MedImmune and personal financial support from PQ Bypass, the American Heart Association, Precision Medicine Institute, and the Harvard University Center for AIDS Research outside the submitted work. Coauthors also reported various financial ties to private and nonprivate entities. Tsega and Cho have disclosed no relevant financial relationships.