Mass Spectrometry to Classify Non-Small-Cell Lung Cancer Patients for Clinical Outcome After Treatment With Epidermal Growth Factor Receptor Tyrosine Kinase Inhibitors: A Multicohort Cross-Institutional Study
J. Natl. Cancer Inst. Taguchi et al.
99: 838
Supplementary Data
Files in this Data Supplement:
- Supplementary Table 1 -
Mass spectral features used in the classification algorithm
- Supplementary Fig. 1 -
Spectral data from clinical subgroups. A) Gelplot of mass spectral data from patients in the training set. The gelplot combines all processed spectra from the training sets (Italian A, Japan A, and Japan B), grouped according to response class. The grayscale corresponds to intensity (darker indicates higher intensity). Note the strongly varying intensity around m/z = 11500. Arrows below the graph indicate the positions of some of the markers used; the blue arrow marks the m/z 11500 position. B) Inset showing enlarged, group-averaged mass spectra near m/z 11500. The spectra fall naturally into three groups: SD Long, SD Short/PR, and PD Early/PD. SD Long and PD Early were used to define the classification.
- Supplementary Fig. 2 -
Cross-institutional variability in mass spectral features. A) The feature at m/z = 11446. B) The feature at m/z = 11903. Feature values (integrated spectral intensities, hence dimensionless) from spectra obtained at UCDHSC and VU are shown as scatterplots. Note the logarithmic scale, which shows that the “good” values cluster around zero, whereas the “poor” values range over three orders of magnitude. The R-values for a linear dependence are 0.96 for (A) and 0.98 for (B).
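An R-value for a linear dependence between two sets of paired feature values can be computed as a Pearson correlation coefficient. The following is a minimal sketch under that assumption. The function name `pearson_r` and its inputs are hypothetical, and the caption does not state whether the reported R-values were computed on raw or log-transformed feature values; the sketch uses whatever values are passed in.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired values x and y.

    x and y are hypothetical inputs, e.g. the same feature measured on the
    same samples at two institutions.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Values close to 1 indicate that the two sets of measurements follow a nearly linear relationship.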
- Supplementary Fig. 3 -
Scatterplots of the correlation between serum and plasma in the intensities of the eight classification algorithm features. Data are from the 73 patients in the ECOG validation cohort for whom both serum and plasma samples were available. The values are integrated intensities (feature values) and are dimensionless.
- Supplementary Fig. 4 -
Results of a permutation analysis assessing the significance of the negative results from the Italian C control set. To estimate the probability of obtaining the P values observed in Italian C by chance, we randomly sampled 500,000 sets of size 32 from the Italian B set. The left panel shows the distribution of the log-rank statistic; the right panel shows the distribution of P values. The red lines indicate the corresponding values from the analysis of the 32 samples from the Italian C control set (P value: .42, statistic: 0.652). The probability of obtaining these values by chance is given by the integrated tails (to the left for the statistic and to the right for the P values) and is 6.64%. The spike at zero in the log-rank distribution is due to discretization effects when the expected number of deaths is equal to the observed number of deaths.
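The resampling procedure described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes a standard two-group log-rank chi-square statistic with one degree of freedom, which is consistent with the reported pair (a statistic of 0.652 corresponds to a P value of about .42), and it assumes that subsets are drawn without replacement. All function names and the (time, event, group) data layout are hypothetical.

```python
import math
import random


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1/True if death observed, 0/False if
    censored; groups: 0 or 1 (e.g. classifier label "poor" or "good").
    """
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)   # total number at risk at time t
        n1 = sum(at_risk)  # number at risk in group 1
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if e and ti == t and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    # Degenerate subsets (e.g. only one group present) give a statistic of 0.
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def chi2_p_value(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def tail_probability(cohort, observed_chi2, subset_size=32, n_draws=1000, seed=0):
    """Fraction of random subsets whose statistic is at most observed_chi2.

    cohort: list of (time, event, group) tuples from the larger set.
    A statistic at or below the observed one is equivalent to a P value at
    or above the observed one, so a single tail covers both panels.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        subset = rng.sample(cohort, subset_size)  # assumption: no replacement
        times, events, groups = zip(*subset)
        if logrank_chi2(times, events, groups) <= observed_chi2:
            hits += 1
    return hits / n_draws
```

With the larger cohort supplied as a list of tuples, a call such as `tail_probability(cohort, 0.652, subset_size=32, n_draws=500_000)` would mirror the number of draws described in the caption. The pure-Python loop is slow at that scale and is intended only to make the logic explicit.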