Dielectric property measurements for the rapid differentiation of thoracic lymph nodes using XGBoost in patients with non-small cell lung cancer: a self-control clinical trial
Original Article

Di Lu1#, Jinxing Peng1#, Zhongju Wang2#, Ying Sun3,4#, Jianxue Zhai1, Zhizhi Wang1, Zhiming Chen1, Yuji Matsumoto5, Long Wang2, Sherman Xuegang Xin6, Kaican Cai1

1Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China; 2School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; 3Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; 4School of Biomedical Engineering, Southern Medical University, Guangzhou, China; 5Respiratory Endoscopy Division, Department of Endoscopy, National Cancer Center Hospital, Tokyo, Japan; 6Laboratory of Biophysics, School of Medicine, South China University of Technology, Guangzhou, China

Contributions: (I) Conception and design: K Cai, D Lu, Y Sun, L Wang; (II) Administrative support: K Cai, SX Xin, L Wang; (III) Provision of study materials or patients: D Lu, J Peng, Zhongju Wang; (IV) Collection and assembly of data: Y Sun, J Zhai, J Peng; (V) Data analysis and interpretation: J Peng, Y Sun, Zhizhi Wang, Z Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Kaican Cai. Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China. Email: caican@smu.edu.cn; Sherman Xuegang Xin. Laboratory of Biophysics, School of Medicine, South China University of Technology, Guangzhou 510006, China. Email: xinxg@scut.edu.cn; Long Wang. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China. Email: lwang@ustb.edu.cn.

Background: Whether the thoracic lymph nodes (LNs) are metastatic is one of the important criteria that thoracic surgeons use when deciding on a surgical strategy. Frozen section (FS) is widely used as an intraoperative diagnostic method, but it is time-consuming and expensive. Dielectric properties, including permittivity and conductivity, vary between tissues. Extreme gradient boosting (XGBoost) is a powerful and widely used classifier. This study therefore aimed to develop a rapid differentiation method combining dielectric properties and XGBoost, and to assess its efficacy on the thoracic LNs of patients with non-small cell lung cancer (NSCLC).

Methods: This was a single-center self-control clinical trial with paraffin pathology section (PPS) results as the gold standard for diagnosis. LNs were collected from patients with pathologically diagnosed NSCLC. After removal from the patient, the dielectric properties of each LN were measured with an open-ended coaxial probe over 1–4,000 MHz, and the LNs were then sent for FS and PPS diagnosis. An XGBoost model using the dielectric properties was developed to differentiate malignant LNs from benign LNs. Classification efficacy was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC).

Results: A total of 204 LNs from 67 NSCLC patients were analyzed. The mean values of the two parameters differed significantly (P<0.001) between benign and malignant LNs. The AUC for permittivity and conductivity were 0.850 [95% confidence interval (CI): 0.786 to 0.915; P<0.001] and 0.887 (95% CI: 0.828 to 0.946; P<0.001), respectively. The AUC was 0.893 (95% CI: 0.834 to 0.951; P<0.001) when the two parameters were combined. After the application of the XGBoost, the AUC was 0.968 (95% CI: 0.918 to 1.000; P<0.001), and the accuracy was 87.80%. Its sensitivity was 58.33% and the specificity was 100%. When the Synthetic Minority Oversampling Technique (SMOTE) algorithm was used, the AUC was 0.954 (95% CI: 0.883 to 1.000; P<0.001) and the accuracy was 92.68%. Its sensitivity was 83.33% and the specificity was 96.55%.

Conclusions: Because of its relatively high efficacy in the rapid differentiation of LNs in patients with NSCLC, this method might be useful for thoracic surgeons during surgery.

Keywords: Non-small cell lung cancer (NSCLC); dielectric properties; lymph node (LN); extreme gradient boosting (XGBoost); diagnosis


Submitted Nov 26, 2021. Accepted for publication Mar 14, 2022.

doi: 10.21037/tlcr-22-92


Introduction

According to the GLOBOCAN 2020 estimate of cancer incidence and mortality, lung cancer is the second most common malignancy and the leading cause of cancer-related deaths (1). In China, lung cancer ranks first in incidence and mortality among all cancers (2). Non-small cell lung cancer (NSCLC) accounts for approximately 80% of all lung cancer cases (3). At present, under the guidance of the National Comprehensive Cancer Network (NCCN), surgical treatment offers the best curative outcome in patients with early-stage NSCLC.

Liu et al. (4) first reported that intraoperative frozen section (FS) diagnosis is a reliable way to guide the resection of peripheral lung adenocarcinoma. Consequently, intraoperative rapid FS diagnosis of pulmonary hilar and segmental lymph nodes (LNs) before pulmonary resection has become an important and routine procedure for patients with NSCLC. However, intraoperative rapid FS examination is time-consuming and requires substantial manpower. Furthermore, the success of this technique depends on the experience and capabilities of the pathologist involved. Therefore, finding an alternative method for the accurate and rapid intraoperative diagnosis of LNs is crucial. One potential approach for the early detection of metastasis is assessing the dielectric properties of the LNs (5). One advantage of dielectric property measurement is that it is easy to perform and costs less time and money.

Dielectric properties involve two parameters, namely, permittivity (ε) and conductivity (σ). Previous research in human tissues has shown that dielectric properties may be used for the noninvasive early detection of tumors (6-8). Choi et al. (7) measured the dielectric properties of breast cancer tissue from 0.5 to 30 GHz and demonstrated that both malignant LNs and breast cancer tissue clearly differ from benign tissues. Our previous retrospective study showed that the dielectric property measurements of malignant LNs were higher than those of benign LNs in the frequency range of 1–4,000 MHz (9,10).

Although the dielectric properties of a variety of physical structures and tissues, including malignant and benign tissues, have been reported by several studies (5-8,10-13), there is still a paucity of data regarding the specific efficacy of these parameters in the classification of pulmonary LN metastasis. Therefore, to investigate the efficacy of using permittivity and conductivity to predict pulmonary LN metastasis, the dielectric properties of both benign and malignant pulmonary LNs were examined over a frequency range of 1–4,000 MHz (9,10).

Dielectric property measurements generate large amounts of data that can be processed efficiently. Extreme gradient boosting (XGBoost) has been increasingly applied in clinical practice and has achieved favorable results (14-17). In contrast to traditional learning classifiers, the XGBoost tree boosting model can combine hundreds of less accurate tree models into one strong classifier. However, to date, there have been no studies assessing the use of XGBoost to classify LNs in NSCLC patients. Accordingly, we introduced XGBoost into our study.

The present study proposed the use of dielectric property measurements and XGBoost to rapidly discriminate between malignant and benign LNs in NSCLC patients during surgery. We present the following article in accordance with the STARD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-92/rc).


Methods

Patient eligibility

All human studies were approved by the ethics committee of the Nanfang Hospital, Southern Medical University, Guangzhou, China (No. NFEC-2017-070). This trial was also registered on ClinicalTrials.gov (No. NCT03339479). All patients provided written informed consent. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Recruitment criteria were as follows. Patients who were diagnosed with NSCLC and scheduled for LN resection at the Nanfang Hospital of Southern Medical University were included in this study. Patients who had had neoadjuvant therapy before surgery were excluded. To ensure the consistency of measurements, only LNs with a depth greater than 1.5 mm (perpendicular) and a diameter of more than 5 mm were selected for dielectric property measurements. The measuring area of the LNs was guided by an experienced pathologist. The following patient data were collated: name, age, FS diagnosis, and paraffin pathology section (PPS) diagnosis, including Ki-67 immunohistochemical markers if examined.

Measurements

The following standard measurement method was applied. All measurements were performed on LNs using open-ended coaxial probes (10). LNs were measured promptly, within 10 minutes after excision. Each sample was measured for permittivity and conductivity from 1 to 4,000 MHz. The measurements lasted for 2–5 minutes, and the contact measurement did not affect or destroy the samples. The LNs were then diagnosed with FS to determine whether further excisions were necessary (18,19).

The instruments consisted of a vector network analyzer (VNA) (model AV3680A, China Electronics Technology Instruments Co., Ltd., China), an open-ended coaxial probe (UT-086-50, Mintrue Co., Ltd., China), and a personal computer (Figure 1A). The probe was connected to the VNA through a Bayonet Neill-Concelman connector. A computer (model P79G, Dell Inc., TX, USA) was used to connect the various instruments and to collect and format incoming data (10).

Figure 1 A schematic picture of the measurement system. (A) The instruments and reagents used to measure the dielectric properties of LNs. (B) Measuring LNs with an open-ended coaxial probe. (C) The graphical abstract of the measurement method. VNA, vector network analyzer; XGBoost, extreme gradient boosting; LNs, lymph nodes.

To ensure the reliability of our measurements, the probe was calibrated with calibrating materials before each measurement (Figure 1B). Figure 1C shows the process flow chart. The results measured by the probe were then compared with those reported in the literature (20,21) to evaluate the accuracy of the coaxial probe. The temperature of the liquids was also measured and recorded. Thus, the probe was determined to be accurate for the duration of this study. The surface of the sample and the tip of the probe were cleaned with tissue paper and disinfectant to avoid errors caused by blood, bubbles, or other contaminants (22-24). The probe was then placed perpendicular to the surface of the sample to measure the dielectric properties, ensuring that there was no air gap between the surface and the probe. The complex reflection coefficient was recorded over the 1–4,000 MHz frequency range and was subsequently converted into complex permittivity through the detailed procedure described by Bobowski and Johnson (25). Values were measured five times per position to reduce error, and the mean of the five measurements was used for analysis. In addition, the surface temperature of the specimen was recorded with a digital thermometer (model TM-902C, Apuhua Co., Ltd., Shenzhen, China) for data adjustment.
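The conversion from reflection coefficient to complex permittivity follows the cited procedure and is not reproduced here. As a minimal sketch of the step that comes after it, the Python snippet below assumes the standard relation between conductivity and the loss part of the complex relative permittivity, σ = 2πf·ε0·ε'', and averages five repeated readings at a single frequency. The function names, the sign convention, and the sample readings are illustrative assumptions and are not taken from the study.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def split_complex_permittivity(eps_complex, freq_hz):
    """Return (relative permittivity, conductivity in S/m).

    Assumes the convention eps* = eps' - j*eps'', so the loss term is
    the negated imaginary part of the complex relative permittivity.
    """
    eps_real = eps_complex.real
    eps_loss = -eps_complex.imag
    sigma = 2.0 * math.pi * freq_hz * EPS0 * eps_loss
    return eps_real, sigma

def mean_of_repeats(readings, freq_hz):
    """Average permittivity and conductivity over repeated readings."""
    pairs = [split_complex_permittivity(r, freq_hz) for r in readings]
    n = len(pairs)
    mean_eps = sum(p[0] for p in pairs) / n
    mean_sigma = sum(p[1] for p in pairs) / n
    return mean_eps, mean_sigma

# Five made-up readings at 1,000 MHz (illustrative values only).
readings = [complex(52.0, -25.0), complex(53.0, -26.0), complex(51.5, -24.5),
            complex(52.5, -25.5), complex(53.0, -25.0)]
mean_eps, mean_sigma = mean_of_repeats(readings, 1.0e9)
```

Averaging after the split, rather than averaging the raw complex values, gives the same result here because both operations are linear; the order only matters if a nonlinear correction (for example, a temperature adjustment) is applied in between.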

Statistical methodology

The original data files were imported into MATLAB version R2014a. Permittivity and conductivity values were obtained directly from 1 to 4,000 MHz at intervals of 1 MHz (a total of 4,000 frequencies). Subsequently, the entire dataset was imported into SPSS version 22.0 for analysis. Missing data were handled by exclusion. Since the permittivity and conductivity data did not follow a normal distribution, differences in the dielectric properties between benign and malignant LNs were analyzed with Mann-Whitney tests. P values <0.05 were considered statistically significant. Before using XGBoost, permittivity and conductivity were combined as predictive factors for thoracic LN status by using a binary logistic regression equation. Receiver operating characteristic (ROC) curves were then established to evaluate the potential of these properties to serve as diagnostic criteria. The ROC curve takes the false positive rate (1 − specificity) as the abscissa and the true positive rate (sensitivity) as the ordinate. The accuracy of the diagnostic measurement was evaluated by calculating the area under the curve (AUC).
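The Mann-Whitney test and the AUC are closely linked: the AUC equals the probability that a randomly chosen malignant LN scores higher than a randomly chosen benign LN, which is the Mann-Whitney U statistic divided by the product of the group sizes. The sketch below illustrates this in plain Python and checks it against the trapezoidal area under the ROC curve. It is not the SPSS procedure used in the study, and the scores are made-up values.

```python
def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative.

    Ties count as one half, which equals U / (n_pos * n_neg) for the
    Mann-Whitney U statistic of the positive group.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def roc_points(pos_scores, neg_scores):
    """(false positive rate, true positive rate) at each threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up conductivity-like scores (illustrative values only).
malignant = [1.9, 1.7, 1.6, 1.4]
benign = [1.5, 1.3, 1.2, 1.1, 1.0]
auc_rank = rank_auc(malignant, benign)
auc_trap = trapezoid_area(roc_points(malignant, benign))
```

In this toy example 19 of the 20 malignant-benign pairs are ordered correctly, so both routes give an AUC of 0.95.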

XGBoost model and the Synthetic Minority Oversampling Technique (SMOTE) algorithm

The original data were randomly divided into two groups (a training set and a test set) in which the ratio between malignant and benign LNs differed. The model was optimized by analyzing the training set with grid searches. To achieve the best evaluation, a 5-fold cross-validation strategy was used: the training set was divided into five subsets, and in each round one subset was held out for validation while the remaining four were used for training. The learning rate was set to 0.1, and the ratio of subsamples that created the tree characteristics was set to 0.8. The maximum depth was set to 7. Finally, by adjusting the parameters, the best model was selected. The test set was imported into the final model to evaluate the classification ability of the dielectric properties on LNs.
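The grid search with 5-fold cross-validation described above can be sketched in plain Python as follows. This is an illustration of the procedure only, not the authors' code: the candidate grid and the dictionary keys are hypothetical, and `toy_score` is a stand-in that is deliberately constructed to peak at the reported values so that the example runs without fitting a real model. In practice it would be replaced by a function that trains XGBoost on the training folds and scores it on the held-out fold.

```python
import itertools
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(score_fn, n_samples, params, k=5):
    """Mean validation score: each fold is held out once."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out
                     for i in fold]
        scores.append(score_fn(train_idx, val_idx, params))
    return sum(scores) / k

def grid_search(score_fn, n_samples, grid):
    """Try every parameter combination and keep the best mean score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validate(score_fn, n_samples, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid; the study reports learning rate 0.1, subsample
# ratio 0.8 and maximum depth 7 as the selected values.
grid = {"learning_rate": [0.05, 0.1, 0.3],
        "subsample": [0.6, 0.8, 1.0],
        "max_depth": [3, 5, 7]}

def toy_score(train_idx, val_idx, params):
    """Stand-in for 'fit a model on train_idx, score on val_idx'."""
    return -(abs(params["learning_rate"] - 0.1)
             + abs(params["subsample"] - 0.8)
             + abs(params["max_depth"] - 7))

best_params, best_score = grid_search(toy_score, 163, grid)
```

Keeping the test set entirely outside this loop, as the study does, means the reported test accuracy is not influenced by the parameter selection.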

As the number of benign LN samples was approximately 4 times that of the malignant LN samples, the performance of XGBoost could be degraded by this class imbalance. To reduce this effect, the SMOTE algorithm was used to generate additional malignant LN samples and achieve a balanced ratio. The SMOTE algorithm features several key steps. First, the algorithm considers the Euclidean distance from each sample x in the minority class to the rest of the samples in the same class. Then, the k nearest neighbors of this sample are identified. By randomly selecting a neighbor y from these neighbors, a new sample z is created with the following equation: z = x + a × (y − x), in which a represents a random number between 0 and 1.
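The interpolation step of SMOTE described above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the implementation used in the study: the function name, the value of k, the random seed, and the two-feature toy samples are all assumptions made for the example.

```python
import math
import random

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def smote_samples(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class
        others = [s for s in minority if s is not x]
        neighbours = sorted(others, key=lambda s: euclidean(x, s))[:k]
        y = rng.choice(neighbours)
        a = rng.random()  # random number in [0, 1)
        z = [xi + a * (yi - xi) for xi, yi in zip(x, y)]
        synthetic.append(z)
    return synthetic

# Toy two-feature minority class (illustrative values only).
minority = [[50.0, 1.6], [54.0, 1.8], [52.0, 1.7], [56.0, 1.9]]
new_points = smote_samples(minority, n_new=6, k=2)
```

Because each synthetic sample lies on the line segment between two real minority samples, the new points stay within the range already covered by the malignant LNs. Oversampling of this kind is normally applied to the training set only, so that the test set still reflects measured data.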


Results

Characteristics of the LNs

Patients were recruited over a 2-year period between August 2017 and April 2019. A total of 207 samples were collected from 68 patients with NSCLC who satisfied the recruitment criteria. The LN measurements did not cause any adverse events. One male patient with squamous cell carcinoma was excluded as he received neoadjuvant chemotherapy before surgery, and substantial differences may have existed. The remaining patients had either benign or malignant samples. One sample was excluded due to missing data. In total, 204 samples were included from 67 patients. Figure 2 shows the flowchart of the sample screening process.

Figure 2 A flow diagram showing the screening of eligible LNs. XGBoost, extreme gradient boosting; LNs, lymph nodes.

The PPS diagnosis comprised routine examinations and detection of the Ki-67 biological marker. Each patient and each LN was considered an independent individual, and several key characteristics were identified (Table 1). First, the youngest of our patients was 30 years of age, and the oldest was 82 years of age. Adenocarcinomas and squamous carcinomas each accounted for between 44% and 52% of cases. More than 70% of the samples were from men. Second, stage II and III NSCLC patients together accounted for approximately 58% of all cases. Third, the Ki-67 biological marker was tested in 47.8% of patients, of whom 62.5% had an expression greater than 30%. In addition, the time spent analyzing dielectric properties was approximately 2–5 minutes, which was much shorter than that of intraoperative rapid FS examinations (approximately 45 minutes).

Table 1

Characteristics of the patients and the LNs

Characteristics Patients LNs
Total number 67 204
Age (years), median [range] 60 [30–82] 59 [30–82]
Gender, n (%)
   Male 51 (76.1) 158 (77.5)
   Female 16 (23.9) 46 (22.5)
Time spent on measuring dielectric properties (minutes), mean ± SD 3.81±0.31 3.81±0.31
Time spent on rapid FS examination (minutes), mean ± SD 45.82±5.25 44.48±5.43
PPS diagnosis, n (%)
   Squamous 30 (44.8) 92 (45.1)
   Adenocarcinoma 34 (50.7) 105 (51.5)
   ASC 2 (3.0) 4 (2.0)
   MC 1 (1.5) 3 (1.5)
Pathological stage, n (%)
   Stage 0 2 (3.0) 4 (2.0)
   Stage I 25 (37.3) 82 (40.2)
   Stage II 13 (19.4) 41 (20.1)
   Stage III 26 (38.8) 75 (36.8)
   Stage IV 1 (1.5) 2 (1.0)
Ki-67, n (%) 32 98
   ≤30% 12 (37.5) 32 (32.7)
   >30% 20 (62.5) 66 (67.3)

LNs, lymph nodes; SD, standard deviation; FS, frozen section; PPS, paraffin pathology section; ASC, adenosquamous carcinoma; MC, mucoepidermoid carcinoma.

Differences in the dielectric properties between benign and malignant LNs

According to the results of the PPS diagnosis, the LNs were classified into two groups, namely, benign LNs (n=164, 80.4%) and LNs with metastatic carcinoma (n=40, 19.6%).

The average permittivity and conductivity of malignant and benign LNs were compared at all 4,000 frequencies using Mann-Whitney tests. The results indicated significant differences between benign and malignant LNs for both permittivity and conductivity (P<0.001; Table 2). Table 3 shows the characteristics of the malignant LNs and benign LNs.

Table 2

Results from Mann-Whitney tests

Dielectric property N Mean ± SD P value
Permittivity 204 <0.001
   Benign 164 43.43±7.74
   Malignant 40 53.14±5.80
Conductivity 204 <0.001
   Benign 164 1.39±0.24
   Malignant 40 1.74±0.18

SD, standard deviation.

Table 3

Characteristics of the malignant and benign LNs

Characteristics Malignant LNs Benign LNs
Total number 40 164
Age (years), median [range] 59 [40–78] 60 [30–82]
Gender, n (%)
   Male 34 (85.0) 124 (75.6)
   Female 6 (15.0) 40 (24.4)
Time spent on measuring dielectric properties (minutes), mean ± SD 3.86±0.28 3.80±0.32
Time spent on rapid FS examination (minutes), mean ± SD 45.36±5.29 44.68±5.42
PPS diagnosis, n (%)
   Squamous 21 (52.5) 71 (43.3)
   Adenocarcinoma 17 (42.5) 88 (53.7)
   ASC 2 (5.0) 4 (2.4)
   MC 0 (0.0) 1 (0.6)
Pathological stage, n (%)
   Stage 0 0 (0.0) 4 (2.4)
   Stage I 0 (0.0) 82 (50.0)
   Stage II 5 (12.5) 36 (22.0)
   Stage III 35 (87.5) 40 (24.4)
   Stage IV 0 (0.0) 2 (1.2)
Ki-67, n (%) 29 69
   ≤30% 8 (27.6) 24 (34.8)
   >30% 21 (72.4) 45 (65.2)

LNs, lymph nodes; SD, standard deviation; FS, frozen section; PPS, paraffin pathology section; ASC, adenosquamous carcinoma; MC, mucoepidermoid carcinoma.

Figure 3 shows the outcomes from the mean values of the dielectric properties. The average permittivity and conductivity were calculated for the two groups at each frequency (Figure 3A,3B). Because the different tendencies in the permittivity data were difficult to see in a line chart, a scatter plot was constructed instead (Figure 3A). The permittivity values spanned such a wide range that the vertical axis of this plot was compressed to save space, so equal distances on the axis do not correspond to equal numerical intervals. For example, the distance between 50 and 100 on the axis is the same as that between 100 and 250, although the latter spans three times the numerical range of the former. Figure 3A emphasizes two findings. First, the average permittivity for both benign and malignant LNs decreased gradually with increasing frequency. Second, the average permittivity for the malignant group was larger than that for the benign group at most frequencies.

Figure 3 Outcomes from the mean dielectric property values. The average permittivity (A) and conductivity (B) of all specimens from 1–4,000 MHz frequencies. The average permittivity (C) and conductivity (D) of all specimens from 50–900 MHz. The AUC values for permittivity (E), conductivity (F), and both parameters combined (G). AUC, area under the curve.

Figure 3B also suggests two findings. First, the general trend showed a considerable increase in conductivity with increasing frequency. Second, the average conductivity for the malignant group was also larger than that for the benign group at most frequencies.

To make these differences easier to visualize, we selected the average of both groups from 50 to 900 MHz and constructed a new line chart (Figure 3C,3D). From these figures, we were able to support the above conclusions in a more robust manner.

The diagnostic efficiency of dielectric properties for the differential diagnosis of patients with NSCLC

The results above demonstrated significant differences in the mean values of permittivity and conductivity between malignant and benign LNs.

The ROC curves showed that the AUC for conductivity [0.887, 95% confidence interval (CI): 0.828 to 0.946; P<0.001; Figure 3F] was greater than that for permittivity (0.850; 95% CI: 0.786 to 0.915; P<0.001; Figure 3E). Moreover, although the results for both permittivity and conductivity were good, the combined application of permittivity and conductivity showed superior performance, with an AUC of 0.893 (95% CI: 0.834 to 0.951; P<0.001; Figure 3G), suggesting that permittivity and conductivity should be used together as a diagnostic factor.

The diagnostic efficacy of XGBoost and the SMOTE algorithm for distinguishing malignant from benign LNs

Although relatively high efficacy and sensitivity were achieved using the mean values of both permittivity and conductivity in a classification model, the specificity was not ideal for daily practice. In addition, permittivity and conductivity values at different frequencies may represent different dimensions of information. Considering only average values as the classification criteria appears to be somewhat wasteful. Consequently, XGBoost was used to identify a more accurate solution for differential diagnosis.

Each LN was considered as an individual, and the total samples were randomly divided into two groups, a training set (163 samples) and a test set (41 samples). There were no statistically significant differences in any of the characteristics examined between these two groups (P>0.05; Tables 4,5). The XGBoost model was trained using the training set of 163 LNs, with each LN described by permittivity and conductivity values at all 4,000 frequencies (8,000 features in total). Figure 4 shows the outcomes obtained with XGBoost. The test set was then imported into the trained model. XGBoost achieved an accuracy of 87.80% and an AUC of 0.968 (95% CI: 0.918 to 1.000; P<0.001; Figure 4A,4C), which were satisfactory. However, the training set contained only 28 malignant LNs and 135 benign LNs. Thus, the SMOTE algorithm was used to balance the training samples in an attempt to achieve more accurate results. This provided a new training set for the final model, which achieved an accuracy of 92.68% and an AUC of 0.954 (95% CI: 0.883 to 1.000; P<0.001; Figure 4B,4D).

Table 4

Characteristics of the training set and the test set in which each patient was considered as an individual

Characteristics Training set Test set P value
Total number of patients 45 22
Age (years), median [range] 62 [30–82] 62 [40–78] 0.440
Gender, n (%) 0.649
   Male 35 (77.8) 16 (72.7)
   Female 10 (22.2) 6 (27.3)
Time spent on measuring dielectric properties (minutes), mean ± SD 3.78±0.33 3.83±0.28 0.840
Time spent on rapid FS examination (minutes), mean ± SD 44.79±4.65 44.89±4.98 0.680
PPS diagnosis, n (%) 0.114
   Squamous 24 (53.3) 6 (27.3)
   Adenocarcinoma 19 (42.2) 15 (68.2)
   ASC 1 (2.2) 1 (4.5)
   MC 1 (2.2) 0 (0.0)
Pathological stage, n (%) 0.349
   Stage 0 2 (4.4) 0 (0.0)
   Stage I 18 (40.0) 7 (31.8)
   Stage II 10 (22.2) 3 (13.6)
   Stage III 15 (33.3) 11 (50.0)
   Stage IV 0 (0.0) 1 (4.5)
Ki-67, n (%) 21 11 0.654
   ≤30% 8 (38.1) 4 (36.4)
   >30% 13 (61.9) 7 (63.6)

SD, standard deviation; FS, frozen section; PPS, paraffin pathology section; ASC, adenosquamous carcinoma; MC, mucoepidermoid carcinoma.

Table 5

Characteristics of the training set and the test set when each LN was considered as an individual

Characteristics Training set Test set P value
Total number of LNs 163 41
Age (years), median [range] 59 [30–82] 60 [40–78] 0.489
Gender, n (%) 0.463
   Male 128 (78.5) 30 (73.2)
   Female 35 (21.5) 11 (26.8)
Time spent on measuring dielectric properties (minutes), mean ± SD 3.81±0.32 3.83±0.27 0.774
Time spent on rapid FS examination (minutes), mean ± SD 44.68±5.84 44.26±4.95 0.651
PPS diagnosis, n (%) 0.083
   Squamous 80 (49.1) 12 (29.3)
   Adenocarcinoma 78 (47.9) 27 (65.9)
   ASC 3 (1.8) 1 (2.4)
   MC 2 (1.2) 1 (2.4)
Pathological stage, n (%) 0.429
   Stage 0 3 (1.8) 1 (2.4)
   Stage I 67 (41.1) 15 (36.6)
   Stage II 35 (21.5) 6 (14.6)
   Stage III 57 (35.0) 18 (43.9)
   Stage IV 1 (0.6) 1 (2.4)
Ki-67, n (%) 79 19 0.768
   ≤30% 26 (32.9) 6 (31.6)
   >30% 53 (67.1) 13 (68.4)

LNs, lymph nodes; SD, standard deviation; FS, frozen section; PPS, paraffin pathology section; ASC, adenosquamous carcinoma; MC, mucoepidermoid carcinoma.

Figure 4 The outcomes achieved after the application of XGBoost. (A) The AUC value of the test set processed by XGBoost. (B) The AUC value of the test set processed by XGBoost after using the SMOTE algorithm to balance the training set. (C) The outcome acquired with XGBoost. (D) The outcome acquired with XGBoost after using the SMOTE algorithm to balance the training set. XGBoost, extreme gradient boosting; AUC, area under the curve; SMOTE, Synthetic Minority Oversampling Technique.

Owing to the differences between the line charts (Figure 3A,3B), the 4,000 frequencies were divided into four groups: 1–1,000, 1,001–2,000, 2,001–3,000, and 3,001–4,000 MHz. Each group had 1,000 data points associated with permittivity and conductivity. The data were then processed in the same way as before. Figure 5 shows that the accuracy ranged from 78.05% to 85.37%, and the AUCs ranged from 0.886 to 0.941, with these values being lower than those obtained with the full-frequency raw data. After the SMOTE algorithm was applied, the accuracy ranged from 82.93% to 85.37%, and the AUCs ranged from 0.864 to 0.936. Again, these values were lower than those obtained with all frequencies.
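Splitting the features by frequency band amounts to slicing each sample's feature vector. The sketch below assumes a hypothetical layout in which the first 4,000 entries of a sample hold permittivity at 1 to 4,000 MHz and the next 4,000 hold conductivity at the same frequencies; the study does not state how the features were ordered, so the layout and the function name are assumptions for illustration.

```python
def band_features(sample, low_mhz, high_mhz, n_freq=4000):
    """Select permittivity and conductivity values within one band.

    Assumes a hypothetical layout: sample[0:n_freq] holds permittivity
    at 1..n_freq MHz and sample[n_freq:2*n_freq] holds conductivity.
    """
    start, stop = low_mhz - 1, high_mhz  # 1 MHz steps, 1-based frequencies
    permittivity = sample[start:stop]
    conductivity = sample[n_freq + start:n_freq + stop]
    return permittivity + conductivity

BANDS = [(1, 1000), (1001, 2000), (2001, 3000), (3001, 4000)]

# Toy sample whose entries encode their own position, for checking slices.
sample = list(range(8000))
subsets = [band_features(sample, lo, hi) for lo, hi in BANDS]
```

Under this layout each band yields 2,000 features per LN (1,000 permittivity values and 1,000 conductivity values), one quarter of the full feature set.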

Figure 5 Outcomes relating to dielectric properties at different frequencies. The accuracy and AUC of the test set processed with XGBoost, and the AUC value of the test set processed with XGBoost after using the SMOTE algorithm to balance the training set, using dielectric property data from the frequency of 1–1,000 MHz (A-C), 1,001–2,000 MHz (D-F), 2,001–3,000 MHz (G-I), and 3,001–4,000 MHz (J-L). XGBoost, extreme gradient boosting; SMOTE, Synthetic Minority Oversampling Technique; AUC, the area under the curve.

Discussion

This study demonstrated that dielectric properties of LNs and XGBoost together represent a novel and effective method to discriminate between benign and malignant LNs. Both the permittivity and conductivity, either individually or in combination, could be used to discriminate LNs. Most importantly, this method was time effective and showed relatively higher accuracy than rapid FS examinations.

This investigation showed that dielectric properties have substantial value in the classification of LNs. This method has several advantages. First, in a previous study, Joines et al. (26) reported that larger values of the dielectric properties were associated with malignant pulmonary LNs, in agreement with our present results. Meanwhile, the number of samples and the range of dielectric properties in our study were larger than those in previous studies (5,21,26,27), thus making the present outcomes more convincing and accurate. At the same time, some abnormal data were observed and documented in our data, as shown in Figure 3A. Although these abnormal data did not influence the overall outcomes, they may have arisen from random measurement error. Thus, more standardized testing guidelines and training should be developed. Second, the patients enrolled in our study had been diagnosed with NSCLC. Therefore, our data accurately reflected the differences between benign and malignant LNs in patients with NSCLC and helped to build a superior model for patients with NSCLC. However, since only four histological types of NSCLC were included in this study, related studies on other thoracic malignancies are needed before the method can be extended to them. Furthermore, we considered that a specific frequency range might exist in which the difference between malignant LNs and benign LNs is the most significant, and that better outcomes may be reached by using this range. Future research should further investigate the specific frequency range.

Several studies have reported that XGBoost is superior to other machine learning models in clinical practice (28-30). Studies using XGBoost as a predictive model are becoming increasingly common and are achieving good results (14-17). Compared with other machine learning models (31,32), models with regularization terms and column sampling show improved robustness. Furthermore, when each tree selects a split point, this technique adopts a parallelization strategy that significantly improves the speed of the model. Moreover, this strategy has low equipment requirements. In the present study, each sample had 8,000 characteristics. This value was far higher than those in other studies, and training the model remained challenging. However, this large dataset also facilitated a more accurate prediction of the outcomes.

In terms of diagnostic efficacy, an AUC less than 0.85 indicates that a prediction model has poor predictive value. When the value of the AUC is between 0.85 and 0.95, the predictive model is considered satisfactory. The AUC of our model using dielectric properties fell within this range, suggesting a strong correlation between dielectric properties and LNs associated with NSCLC. However, the mean value did not fully represent the dielectric properties of each LN. This issue was resolved by incorporating XGBoost to establish a classification model. Finally, both the AUC values achieved from the raw data and after application of the SMOTE algorithm were superior to that obtained from the mean values of the dielectric properties. Furthermore, when only XGBoost was used to process the data, the sensitivity was 58.33%, and the specificity was 100%. After the SMOTE algorithm was used to balance the data of the training set, the accuracy (92.68%) was better than that derived from the raw data (87.80%) processed by XGBoost. In addition, the sensitivity was 83.33%, and the specificity was 96.55%. As shown in Figure 5, increasing the amount of dielectric data can achieve better accuracy and improve the AUC values. After using the SMOTE algorithm, the number of false positives and false negatives was low, thus decreasing the risk of making an incorrect surgical choice.
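The reported sensitivity, specificity, and accuracy can be cross-checked with simple arithmetic. The test set contains 12 malignant and 29 benign LNs (40 − 28 and 164 − 135, from the totals and the training-set counts given in the Results). The confusion-matrix counts used below are inferred from the reported percentages and are not stated in the article, so they should be read as a consistency check rather than as reported data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return round(sensitivity, 2), round(specificity, 2), round(accuracy, 2)

# Counts inferred from the reported percentages, not stated in the text:
# the test set holds 12 malignant and 29 benign LNs.
raw = diagnostic_metrics(tp=7, fn=5, tn=29, fp=0)
smote = diagnostic_metrics(tp=10, fn=2, tn=28, fp=1)
```

Under these inferred counts, balancing the training set converts three missed malignant LNs into correct detections at the cost of one false positive, which matches the reported change from 87.80% to 92.68% accuracy.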

Although dielectric properties were used to investigate malignant and benign tissues in 1994, very few studies have considered dielectric properties for LNs. Previous studies did not acquire sufficient amounts of data and did not use appropriate tools for classification. The acquisition of sufficient amounts of data and the use of an appropriate classification system can enable the detection of differences between LNs. The addition of XGBoost significantly improved our model for the discrimination of pulmonary LNs. XGBoost has an ability to learn, and the greater the data input, the more accurate the model becomes.

A previous study has shown that there is no significant difference between intraoperative and postoperative complications of segmentectomy and lobectomy patients (33). With the advantage of better pulmonary function preservation than lobectomy (34,35), segmentectomy has become a popular option for surgery. However, before performing this type of surgery, physicians must ensure that there is no metastasis among certain types of LNs, such as mediastinal LNs, hilar LNs, and adjacent lobar-segmental LNs (36,37). Examining such tissues with FS diagnosis may require half an hour or more, while analyzing dielectric properties and using XGBoost software may only require a few minutes during the surgery.

This study has some limitations. First, our data were collected from a single clinical center, which may limit the generalizability of the results. Future studies should aim to collect large datasets from multiple institutions. Second, there is no standard criterion for dielectric property measurement, and different pathologists may have different opinions on the measurement area. This makes it difficult to promote this novel technique in other medical centers, and a standard measurement criterion needs to be established. Third, the number of samples in the test set was relatively small, and benign samples outnumbered malignant samples. Although we applied the SMOTE algorithm, future studies should balance the numbers of malignant and benign samples during data collection. Finally, although the current results confirmed the feasibility of the XGBoost model, applying XGBoost in the clinic remains difficult, primarily because of the lack of specific clinical application scenarios, standard databases, industry standards or expert consensus, and legal considerations.
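The oversampling step referred to in the third limitation can be illustrated with a simplified sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. This is not the implementation used in the study; the function name `smote_oversample`, the parameter `k`, and the feature values are hypothetical and serve only to show the mechanism.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# HYPOTHETICAL (permittivity, conductivity) pairs; not data from the study.
malignant = [(60.1, 0.95), (62.4, 1.02), (58.7, 0.91), (61.3, 0.99)]
benign_count = 10  # hypothetical size of the majority class

new_points = smote_oversample(malignant, n_new=benign_count - len(malignant))
balanced_malignant = malignant + new_points
print(len(balanced_malignant), "malignant samples after oversampling")
```

Because every synthetic point lies on a segment between two real minority samples, oversampling balances the training set without adding new measurements, which is why collecting more malignant samples remains preferable.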


Acknowledgments

The authors appreciate the academic support from the AME Thoracic Surgery Collaborative Group.

Funding: This work was supported by the Nanfang Thoracic Surgery Collaborative Project (No. NFTS-T-0201); the Science and Technology Planning Project of Guangdong Province of China (No. 2018B090906001); the Dean Research Funding of Nanfang Hospital, Southern Medical University, China (No. 2020B011); the Medical Scientific Research Foundation of Guangdong Province, China (No. C2021049); and the National Natural Science Foundation of China (Grant Nos. 61929101, 61671229).


Footnote

Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-92/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-92/dss

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-92/coif). YM receives grants from National Cancer Center Research and Development Fund, Grant-in-Aid for Scientific Research on Innovative Areas and Hitachi, Ltd.; honoraria for lectures from Olympus, AstraZeneca, Novartis, COOK, AMCO Inc., Thermo Fisher Scientific, Erbe Elektromedizin GmbH, Fujifilm, Chugai and Eli Lilly. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This study was approved by the ethics committee of the Nanfang Hospital, Southern Medical University, Guangzhou, China (No. NFEC-2017-070). All patients provided written informed consent.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
  2. Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin 2016;66:115-32.
  3. Darling GE, Maziak DE, Inculet RI, et al. Positron emission tomography-computed tomography compared with invasive mediastinal staging in non-small cell lung cancer: results of mediastinal staging in the early lung positron emission tomography trial. J Thorac Oncol 2011;6:1367-72.
  4. Liu S, Wang R, Zhang Y, et al. Precise Diagnosis of Intraoperative Frozen Section Is an Effective Method to Guide Resection Strategy for Peripheral Small-Sized Lung Adenocarcinoma. J Clin Oncol 2016;34:307-13.
  5. Cameron TR, Okoniewski M, Fear EC, et al. A preliminary study of the electrical properties of healthy and diseased lymph nodes. In: 2010 14th International Symposium on Antenna Technology and Applied Electromagnetics & the American Electromagnetics Conference. IEEE, 2010:1-3.
  6. Wang Y, Shao Q, Van de Moortele PF, et al. Mapping electrical properties heterogeneity of tumor using boundary informed electrical properties tomography (BIEPT) at 7T. Magn Reson Med 2019;81:393-409.
  7. Choi JW, Cho J, Lee Y, et al. Microwave detection of metastasized breast cancer cells in the lymph node; potential application for sentinel lymphadenectomy. Breast Cancer Res Treat 2004;86:107-15.
  8. Mehta P, Chand K, Narayanswamy D, et al. Microwave reflectometry as a novel diagnostic tool for detection of skin cancers. IEEE Trans Instrum Meas 2006;55:1309-16.
  9. Lu D, Yu H, Wang Z, et al. Classification of Metastatic and Non-Metastatic Thoracic Lymph Nodes in Lung Cancer Patients Based on Dielectric Properties Using Adaptive Probabilistic Neural Networks. Front Oncol 2021;11:640804.
  10. Yu X, Sun Y, Cai K, et al. Dielectric Properties of Normal and Metastatic Lymph Nodes Ex Vivo From Lung Cancer Surgeries. Bioelectromagnetics 2020;41:148-55.
  11. Surowiec AJ, Stuchly SS, Barr JB, et al. Dielectric properties of breast carcinoma and the surrounding tissues. IEEE Trans Biomed Eng 1988;35:257-63.
  12. Li Z, Deng G, Li Z, et al. A large-scale measurement of dielectric properties of normal and malignant colorectal tissues obtained from cancer surgeries at Larmor frequencies. Med Phys 2016;43:5991.
  13. Guardiola M, Buitrago S, Fernández-Esparrach G, et al. Dielectric properties of colon polyps, cancer, and normal mucosa: Ex vivo measurements from 0.5 to 20 GHz. Med Phys 2018; Epub ahead of print.
  14. Yu B, Qiu W, Chen C, et al. SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting. Bioinformatics 2020;36:1074-81.
  15. Mao B, Zhang L, Ning P, et al. Preoperative prediction for pathological grade of hepatocellular carcinoma via machine learning-based radiomics. Eur Radiol 2020;30:6924-32.
  16. Chen T, Li X, Li Y, et al. Prediction and Risk Stratification of Kidney Outcomes in IgA Nephropathy. Am J Kidney Dis 2019;74:300-9.
  17. Wang K, Zuo P, Liu Y, et al. Clinical and Laboratory Predictors of In-hospital Mortality in Patients With Coronavirus Disease-2019: A Cohort Study in Wuhan, China. Clin Infect Dis 2020;71:2079-88.
  18. Nashef SA, Kakadellis JG, Hasleton PS, et al. Histological examination of peroperative frozen sections in suspected lung cancer. Thorax 1993;48:388-9.
  19. Howington JA, Blum MG, Chang AC, et al. Treatment of stage I and II non-small cell lung cancer: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e278S-313S.
  20. Stogryn A. Equations for calculating the dielectric constant of saline water (correspondence). IEEE Trans Microw Theory Tech 1971;19:733-6.
  21. Popovic D, McCartney L, Beasley C, et al. Precision open-ended coaxial probes for in vivo and ex vivo dielectric spectroscopy of biological tissues at microwave frequencies. IEEE Trans Microw Theory Tech 2005;53:1713-22.
  22. Schepps JL, Foster KR. The UHF and microwave dielectric properties of normal and tumour tissues: variation in dielectric properties with tissue water content. Phys Med Biol 1980;25:1149-59.
  23. Cheng X, Zheng D, Li Y, et al. Tumor histology predicts mediastinal nodal status and may be used to guide limited lymphadenectomy in patients with clinical stage I non-small cell lung cancer. J Thorac Cardiovasc Surg 2018;155:2648-56.e2.
  24. Jilani MT, Wen WP, Cheong LY, et al. A Microwave Ring-Resonator Sensor for Non-Invasive Assessment of Meat Aging. Sensors (Basel) 2016;16:52.
  25. Bobowski JS, Johnson T. Permittivity measurements of biological samples by an open-ended coaxial line. Prog Electromagn Res B 2012;40:159-83.
  26. Joines WT, Zhang Y, Li C, et al. The measured electrical properties of normal and malignant human tissues from 50 to 900 MHz. Med Phys 1994;21:547-50.
  27. Li Z, Wang W, Cai Z, et al. Variation in the dielectric properties of freshly excised colorectal cancerous tissues at different tumor stages. Bioelectromagnetics 2017;38:522-32.
  28. Chen X, Wang ZX, Pan XM. HIV-1 tropism prediction by the XGboost and HMM methods. Sci Rep 2019;9:9997.
  29. Polano M, Chierici M, Dal Bo M, et al. A Pan-Cancer Approach to Predict Responsiveness to Immune Checkpoint Inhibitors by Machine Learning. Cancers (Basel) 2019;11:1562.
  30. Ruan Y, Bellot A, Moysova Z, et al. Predicting the Risk of Inpatient Hypoglycemia With Machine Learning Using Electronic Health Records. Diabetes Care 2020;43:1504-11.
  31. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: Association for Computing Machinery, 2016:785-94.
  32. Nielsen D. Tree boosting with XGBoost: why does XGBoost win "every" machine learning competition? Trondheim: Norwegian University of Science and Technology, 2016.
  33. Suzuki K, Saji H, Aokage K, et al. Comparison of pulmonary segmentectomy and lobectomy: Safety results of a randomized trial. J Thorac Cardiovasc Surg 2019;158:895-907.
  34. Charloux A, Quoix E. Lung segmentectomy: does it offer a real functional benefit over lobectomy? Eur Respir Rev 2017;26:170079.
  35. Nomori H, Cong Y, Sugimura H. Systemic and regional pulmonary function after segmentectomy. J Thorac Cardiovasc Surg 2016;152:747-53.
  36. Nomori H. Segmentectomy for c-T1N0M0 non-small cell lung cancer. Surg Today 2014;44:812-9.
  37. Zhu E, Xie H, Dai C, et al. Intraoperatively measured tumor size and frozen section results should be considered jointly to predict the final pathology for lung adenocarcinoma. Mod Pathol 2018;31:1391-9.
Cite this article as: Lu D, Peng J, Wang Z, Sun Y, Zhai J, Wang Z, Chen Z, Matsumoto Y, Wang L, Xin SX, Cai K. Dielectric property measurements for the rapid differentiation of thoracic lymph nodes using XGBoost in patients with non-small cell lung cancer: a self-control clinical trial. Transl Lung Cancer Res 2022;11(3):342-356. doi: 10.21037/tlcr-22-92
