Although precision medicine has recently advanced cancer therapy, lung cancer remains the leading cause of cancer-related mortality worldwide, including in East Asia (1). Treatment strategies vary according to stage, tumor site, histology and genetic alterations. Many hospitals convene multidisciplinary teams (MDT) of oncologists, surgeons, radiologists, radiation oncologists, pathologists and palliative care specialists to make optimal decisions for patients with lung cancer (2). In Korea, the number of newly diagnosed lung cancer cases is steadily increasing (3-5), reducing the time that doctors can dedicate to learning. At the same time, the number of new drugs, medical data, papers and guidelines for lung cancer is growing rapidly. Thus, even oncologists who are experts in a specialized field cannot master all available knowledge. Recent advances help oncologists quickly identify key information in a patient’s medical record, surface relevant evidence and explore treatment options (6-8). Artificial intelligence (AI) is a general term for the use of computer systems to model intelligent processes with reduced human intervention (9). AI systems in oncology acquire knowledge from large medical datasets and guidelines, then apply computational reasoning to a specific case to generate insights for clinicians (10).
Watson for Oncology (WFO, IBM Watson Health, Cambridge, MA) is a clinical decision-support system (CDSS) for the treatment of lung, breast and prostate cancer, developed in collaboration with Memorial Sloan Kettering Cancer Center (MSKCC) beginning in March 2012 (11-15). WFO was introduced in South Korea in 2017 and has since been assisting clinicians at several cancer centers. WFO stores and indexes literature, protocols and patient charts. It learns from test cases, and all input information is verified by MSKCC experts. Moreover, WFO data are updated with the latest information every 1 to 2 months. When a case that WFO does not support is entered, the system does not recommend a treatment plan. For lung cancer, WFO does not yet support patients with isolated metastatic tumors or patients with driver mutations whose cancer progresses during therapy for metastatic disease. As WFO grows in popularity, many clinicians are questioning whether it is suitable for cancer management, and cancer patients also worry about receiving treatment recommendations from it. The complexity of lung cancer treatment tends to reduce the agreement between WFO recommendations and those of the MDT. Only a few published studies have investigated the reliability of WFO, at Manipal Comprehensive Cancer Center in India (11) and in China (16,17).
We conducted this study to assess agreement between WFO and the MDT at a single cancer center in South Korea, with the objective of determining the level of concordance between their treatment recommendations for lung cancer cases.
Study design and population
We conducted a retrospective study of 405 lung cancer cases and compared the agreement between the initial treatment recommendations of WFO and those of an MDT at Chonnam National University Hwasun Hospital (CNUHH) in South Korea.
The inclusion criteria for this study were as follows: (I) primary lung cancer; (II) admission between January 2018 and December 2018; and (III) no previous antitumor treatment. The exclusion criteria were as follows: (I) secondary lung cancer originating from a distant primary site; and (II) previous antitumor treatment. Patients who received only a confirmed diagnosis without any subsequent antitumor treatment were also included.
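The eligibility rules above can be expressed as a simple filter. The sketch below is illustrative only: the `Case` record and its field names are hypothetical, since the EMR variables the authors actually used are not described, and it assumes each criterion can be reduced to a yes/no flag.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record; field names are illustrative, not the study's EMR schema.
@dataclass
class Case:
    case_id: str
    admitted: date
    primary_lung_cancer: bool        # False for secondary lung tumors from a distant primary
    prior_antitumor_treatment: bool  # True if any antitumor therapy preceded admission

ADMISSION_START = date(2018, 1, 1)
ADMISSION_END = date(2018, 12, 31)

def is_eligible(case: Case) -> bool:
    """Apply the inclusion and exclusion criteria to a single case."""
    admitted_in_window = ADMISSION_START <= case.admitted <= ADMISSION_END
    return (
        case.primary_lung_cancer
        and admitted_in_window
        and not case.prior_antitumor_treatment
    )

def select_cohort(cases: list[Case]) -> list[Case]:
    """Keep only eligible cases, preserving input order."""
    return [c for c in cases if is_eligible(c)]

if __name__ == "__main__":
    demo = [  # made-up cases, not study data
        Case("A", date(2018, 3, 14), True, False),  # eligible
        Case("B", date(2019, 1, 2), True, False),   # admitted outside 2018
        Case("C", date(2018, 6, 1), False, False),  # secondary lung cancer
        Case("D", date(2018, 9, 9), True, True),    # previously treated
    ]
    print([c.case_id for c in select_cohort(demo)])  # ['A']
```

Because no flag excludes patients who were diagnosed but never treated, such patients pass the filter, matching the last sentence of the criteria.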
Multidisciplinary tumor board for lung cancer
The MDT for lung cancer was composed of pulmonary oncologists, radiation oncologists, thoracic surgeons, neurosurgeons, radiologists, pathologists and nuclear medicine physicians. After new cases of lung cancer were confirmed, their medical records were reviewed at MDT conferences held twice weekly. The MDT classified treatment decisions into four categories: surgery, radiotherapy, concurrent chemoradiotherapy (CCRT), and medical treatment, including chemotherapy, targeted therapy and immunotherapy.
WFO supports doctors in making lung cancer treatment decisions using a curated body of knowledge that includes text from more than 300 medical journals and textbooks, MSKCC treatment guidelines, and literature hand-selected by MSKCC experts (11). For supported cases, WFO analyzed the medical records and provided treatment plans in three categories with corresponding color labels: ‘recommended’ with a green label (treatments with a strong base of evidence), ‘for consideration’ with an amber label (appropriate alternatives based on expert clinical judgment), and ‘not recommended’ with a red label (treatments with specific contraindications or strong evidence against their use). WFO version 18.4 was used in this study.
Data collection & statistical analysis
Patients’ data were abstracted from electronic medical records and entered manually into WFO by one trained fellow. The MDT tumor board had previously reviewed all new cases in 2018 and recommended treatment regimens for them. WFO analyzed the same cases, with their clinical information, in 2019. Both WFO and the physicians who entered the cases were blinded to the treatment recommendations made by the MDT.
Treatment recommendations were considered concordant if the tumor board’s recommendation corresponded to a treatment in WFO’s ‘recommended’ category. If WFO suggested two or more plans as ‘recommended’ and the MDT’s plan corresponded to one of them, the case was regarded as concordant. We analyzed concordance rates by cancer stage and histology.
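The concordance rule and the stratified analysis can be sketched as follows. This is a minimal illustration, assuming that each WFO output has already been mapped onto the MDT's four categories; the category strings and function names are hypothetical, not WFO's actual output format.

```python
from collections import defaultdict

# The four treatment categories used by the MDT in this study (labels are illustrative).
MDT_CATEGORIES = {"surgery", "radiotherapy", "CCRT", "medical"}

def is_concordant(mdt_plan: str, wfo_recommended: set[str]) -> bool:
    """True if the MDT plan matches any option in WFO's green 'recommended' tier.

    Options listed as 'for consideration' (amber) or 'not recommended' (red)
    do not count toward concordance under this rule.
    """
    if mdt_plan not in MDT_CATEGORIES:
        raise ValueError(f"unknown MDT category: {mdt_plan!r}")
    return mdt_plan in wfo_recommended

def concordance_by_stratum(cases):
    """cases: iterable of (stratum, mdt_plan, wfo_recommended) tuples.

    Returns {stratum: (n_concordant, n_total, percent_concordant)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # stratum -> [concordant, total]
    for stratum, mdt_plan, wfo_recommended in cases:
        tally = tallies[stratum]
        tally[1] += 1
        if is_concordant(mdt_plan, wfo_recommended):
            tally[0] += 1
    return {s: (k, n, 100.0 * k / n) for s, (k, n) in tallies.items()}

if __name__ == "__main__":
    # Made-up example cases, not study data.
    demo = [
        ("stage I NSCLC", "surgery", {"surgery"}),
        ("stage I NSCLC", "radiotherapy", {"surgery"}),
        ("stage III NSCLC", "CCRT", {"CCRT", "surgery"}),  # two green options, one matches
    ]
    for stratum, result in concordance_by_stratum(demo).items():
        print(stratum, result)
```

Treating the recommended tier as a set makes the "two or more recommended plans" case fall out naturally: membership in any one of them counts as agreement.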
Statistical analysis was performed with IBM SPSS Statistics version 23.0 (SPSS, Inc., an IBM Company, Chicago, IL, USA), and P values of less than 0.05 were considered statistically significant. Patient and tumor characteristics included age, sex, Eastern Cooperative Oncology Group (ECOG) performance status (18), histology and stage. Concordance between the MDT and WFO was expressed as percent agreement and Cohen's kappa. Logistic regression was used to estimate odds ratios for concordance with 95% confidence intervals (CIs).
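Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's marginal frequencies, as kappa = (p_o - p_e) / (1 - p_e). The sketch below implements this textbook unweighted formula; it is not SPSS's code, and it assumes each case has already been reduced to a single label per rater (how WFO's multiple 'recommended' options were collapsed to one label for kappa is not stated).

```python
import math
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginal totals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Counter returns 0 for labels one rater never used.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case: kappa is 0/0.
        return math.nan
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Illustrative labels only (not study data).
    mdt = ["surgery", "surgery", "CCRT", "medical"]
    wfo = ["surgery", "CCRT", "CCRT", "medical"]
    print(round(cohens_kappa(mdt, wfo), 3))  # 0.636
```

In this toy example p_o = 3/4 and p_e = (2·1 + 1·2 + 1·1)/16 = 5/16, so kappa = 0.4375/0.6875 ≈ 0.636, noticeably lower than the 75% raw agreement. In a stratum where both raters give the same single label to every case, the formula is undefined, and the sketch returns NaN.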
We identified 463 patients who were presumed to have lung cancer and were admitted to CNUHH for definitive diagnosis from January to December 2018. After 58 patients were excluded according to the eligibility criteria, a total of 405 patients met the inclusion criteria and received treatment decisions from the MDT (Figure 1).
The median age was 71 years, and 83.9% of patients were men (Table 1). Most patients had an ECOG performance status of 0 to 1, but 11.4% and 1.9% of patients had statuses of 3 and 4, respectively. The histology of 289 cases (71.4%) was non-small cell lung cancer (NSCLC), comprising adenocarcinoma (157 cases; 38.8%) and squamous cell carcinoma (132 cases; 32.6%). Of the remaining cases, 94 (23.2%) were small cell lung carcinoma (SCLC).
Concordance between WFO and MDT
Overall treatment concordance between MDT and WFO was 92.4% (kappa value =0.881; P<0.001). The concordance rates according to histology were 94.9% (k=0.900), 90.2% (k=0.857) and 97.9% (k=0.934) for adenocarcinoma, squamous cell carcinoma and small cell lung cancer, respectively.
When concordance was analyzed by both stage and histology, metastatic cases showed 100% concordance between the MDT’s and WFO’s decisions (Table 2). High concordance was observed in stage I NSCLC (92.4%, k=0.855), stage IV NSCLC (100%, k=1.000) and extensive disease SCLC (100%, k=1.000). In contrast, concordance rates were lower in stage II NSCLC (83.3%, k=0.556), stage III NSCLC (80.8%, k=0.622) and limited disease SCLC (84.6%, k=0.435).
Table 3 presents the results of logistic regression of concordance as a function of ECOG performance status and of histology combined with stage. An odds ratio greater than 1 indicates greater odds of concordance, an odds ratio equal to 1 indicates equal odds, and an odds ratio less than 1 indicates lesser odds. Apart from metastatic cases, in which the treatment decisions of WFO and the MDT coincided perfectly, no factor significantly affected concordance between WFO and the MDT.
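Odds ratios from a logistic regression are the exponentiated coefficients, and a Wald 95% CI is exp(beta ± 1.96·SE). The helper below is a generic sketch of that conversion and of the reading rule stated above; the function names and the example coefficient are hypothetical, not values from Table 3.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def odds_ratio_with_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Turn a logistic-regression coefficient and its standard error into
    an odds ratio with a Wald confidence interval: exp(beta +/- z * se)."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def describe(odds_ratio: float) -> str:
    """Read an odds ratio as in the text: >1 greater, 1 equal, <1 lesser odds."""
    if odds_ratio > 1:
        return "greater odds of concordance"
    if odds_ratio < 1:
        return "lesser odds of concordance"
    return "equal odds of concordance"

def ci_excludes_one(lower: float, upper: float) -> bool:
    """A Wald 95% CI that excludes 1 corresponds to a Wald test P < 0.05."""
    return lower > 1 or upper < 1

if __name__ == "__main__":
    # Hypothetical coefficient and SE, not values from Table 3.
    or_, lo, hi = odds_ratio_with_ci(beta=-0.4, se=0.5)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f}): {describe(or_)}, "
          f"significant={ci_excludes_one(lo, hi)}")
```

One consequence worth noting: a subgroup in which every case is concordant, as with the metastatic cases here, produces (quasi-)complete separation. The maximum-likelihood coefficient for that subgroup diverges, so no finite odds ratio or Wald interval exists for it.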
Discordances were found in cases involving surgery (7/57, 12.3%) and CCRT (15/129, 11.6%), whereas there was no discordance among patients recommended for medical treatment because of metastatic disease. The main reasons for not performing surgery in early-stage NSCLC were underlying disease, old age and poor ECOG performance status (Table 4). Radiotherapy was not performed because of the patient’s preference for surgery. There were 14 cases of discordance in stage III NSCLC, for various reasons arising from both physicians and patients.
This retrospective study demonstrated that an AI-based CDSS trained by experts in the United States can feasibly be used in Korea. The overall concordance rate between the MDT and WFO in patients with lung cancer was 92.4%. Agreement was very high in metastatic stages regardless of histology, and treatment recommendations were also highly concordant in stage I NSCLC. However, concordance was relatively low in stage II–III NSCLC and limited disease SCLC, reaching only 80.8% in stage III NSCLC. Discordance was most frequent when WFO recommended CCRT but physicians or patients chose another treatment option.
Replacing the doctor with an intelligent medical robot is an interesting concept in science fiction, yet AI in health care is now close at hand (7). Machine learning means that a computer learns to perform tasks by analyzing data rather than by following specific programming instructions from humans, so that it generates its own decision-making algorithms (6). Machine learning has the potential to be extremely useful in medicine, particularly in interpreting medical images such as computed tomography scans and histopathological slides (6,19). It may increase the speed and consistency of diagnosis, but it may also exacerbate overdiagnosis (6). Recently, Xu et al. demonstrated that deep learning can integrate imaging scans from multiple time points to improve predictions of clinical outcome (8). Although AI-based noninvasive radiomics holds great promise, it also has inherent limitations, particularly in diagnosing early-stage cancer, for which there is often no single right answer (6,20).
In metastatic disease, the treatment options for lung cancer are relatively simple (chemotherapy, targeted therapy or immunotherapy), and most treatments are chosen by the physician according to the patient’s performance status. Decision making in earlier stages of lung cancer is more complex, however, because of many patient-related factors such as comorbidity, insurance, socioeconomic status and preference. An MDT system and shared decision making are very important in this situation. The overall discordance rate between WFO and the MDT was 7.6%, and all discordant cases occurred in non-metastatic stages. For example, surgery could not be performed because of factors such as idiopathic pulmonary fibrosis or a tumor location that would have required pneumonectomy. Conversely, the MDT decided on surgery in some cases for which WFO recommended non-surgical treatment, because of individual patient circumstances such as young age. Because WFO could not capture patient status in detail, some discordance between WFO and the MDT occurred in non-metastatic stages in our study.
Although AI technology continues to evolve, factors other than the technology itself can also cause discordance between WFO and MDTs across countries. For example, national medical guidelines, ethnic differences among cancer patients, national licensing of recommended drugs or treatments, and insurance coverage and screening standards are thought to affect discordance rates in different countries (12).
This study has several limitations. First, it was a small, retrospective, observational study conducted at a single cancer center. Second, the MDT and WFO decisions were not made simultaneously: the MDT made its decision when each patient was diagnosed with lung cancer, whereas the patients’ medical records were entered into WFO the following year. Hence, treatment guidelines may have changed in the interval in ways that this study did not capture. Third, some elderly patients who visited the emergency room but did not wish to undergo biopsy were not included in this study, so our results cannot be applied to patients without tissue confirmation. Finally, we could not analyze detailed treatment options such as the type of surgery, radiation dose or fractionation, and chemotherapy regimen. Because WFO suggests many chemotherapy regimens at once and the recommended regimens change very quickly, it was difficult to determine whether WFO and the MDT were concordant at the regimen level. In addition, several new drugs that WFO suggested were difficult to use because they were not covered by government insurance.
In conclusion, treatment decisions made by WFO showed a high degree of agreement with those of the MDT tumor board, and concordance varied by stage. AI-based CDSSs are expected to play an assistive role, particularly in metastatic lung cancer, for which treatment options are less complex. However, patient-doctor relationships and shared decision making may be more important in non-metastatic lung cancer because reaching an appropriate decision is more complex. Further study is warranted to address this gray area for current machine learning algorithms.
We are grateful to Ja-yeong Paek for supporting statistical analysis and proofreading the manuscript.
Funding: This study was supported by grants (HCRI19025) from the Chonnam National University Hwasun Hospital Institute for Biomedical Science.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/tlcr.2020.04.11). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the independent Institutional Review Board (IRB) of Chonnam National University Hwasun Hospital (IRB approval number: CNUHH-2019-195), and individual consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
- Siegel RL, Miller KD, Jemal A. Cancer Statistics, 2017. CA Cancer J Clin 2017;67:7-30.
- Oh IJ, Ahn SJ. Multidisciplinary team approach for the management of patients with locally advanced non-small cell lung cancer: searching the evidence to guide the decision. Radiat Oncol J 2017;35:16-24.
- Shin A, Oh CM, Kim BW, et al. Lung Cancer Epidemiology in Korea. Cancer Res Treat 2017;49:616-26.
- Park JY, Jang SH. Epidemiology of Lung Cancer in Korea: Recent Trends. Tuberc Respir Dis (Seoul) 2016;79:58-69.
- Kweon SS. Updates on Cancer Epidemiology in Korea, 2018. Chonnam Med J 2018;54:90-100.
- Adamson AS, Welch HG. Machine Learning and the Cancer-Diagnosis Problem — No Gold Standard. N Engl J Med 2019;381:2285-7.
- The Lancet. Artificial intelligence in health care: within touching distance. Lancet 2017;390:2739.
- Xu Y, Hosny A, Zeleznik R, et al. Deep Learning Predicts Lung Cancer Treatment Response from Serial Medical Imaging. Clin Cancer Res 2019;25:3266-75.
- Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism 2017;69S:S36-S40.
- Makedon F, Karkaletsis V, Maglogiannis I. Overview: Computational analysis and decision support systems in oncology. Oncol Rep 2006;15:971-4.
- Somashekhar SP, Sepulveda MJ, Puglielli S, et al. Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board. Ann Oncol 2018;29:418-23.
- Choi YS. Concepts, Characteristics, and Clinical Validation of IBM Watson for Oncology. Hanyang Med Rev 2017;37:49-60.
- Ahmed MN, Toor AS, O'Neil K, et al. Cognitive Computing and the Future of Healthcare: The Cognitive Power of IBM Watson Has the Potential to Transform Global Personalized Medicine. IEEE Pulse 2017;8:4-9.
- Chen Y, Elenee Argentinis JD, Weber G. IBM Watson: How Cognitive Computing Can Be Applied to Big Data Challenges in Life Sciences Research. Clin Ther 2016;38:688-701.
- Malin JL. Envisioning Watson as a rapid-learning system for oncology. J Oncol Pract 2013;9:155-7.
- Liu C, Liu X, Wu F, et al. Using Artificial Intelligence (Watson for Oncology) for Treatment Recommendations Amongst Chinese Patients with Lung Cancer: Feasibility Study. J Med Internet Res 2018;20:e11087.
- Zhou N, Zhang CT, Lv HY, et al. Concordance Study Between IBM Watson for Oncology and Clinical Practice for Patients with Cancer in China. Oncologist 2019;24:812-9.
- Oken MM, Creech RH, Tormey DC, et al. Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol 1982;5:649-55.
- Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25:954-61.
- Schmidt C. M. D. Anderson Breaks With IBM Watson, Raising Questions About Artificial Intelligence in Oncology. J Natl Cancer Inst 2017.