JNCI Journal of the National Cancer Institute 2005 97(4):307-309; doi:10.1093/jnci/dji008

© 2005 Oxford University Press

BRIEF COMMUNICATION

Signal in Noise: Evaluating Reported Reproducibility of Serum Proteomic Tests for Ovarian Cancer

Keith A. Baggerly, Jeffrey S. Morris, Sarah R. Edmonson, Kevin R. Coombes

Affiliations of authors: Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX (KAB, JSM, KRC); Department of Family and Community Medicine, Baylor College of Medicine, Houston, TX (SRE)

Correspondence to: Keith A. Baggerly, Department of Biostatistics, 1515 Holcombe Blvd., Box 447, Houston, TX 77030–4009 (e-mail: kabagg{at}mdanderson.org).


    ABSTRACT
Proteomic profiling of serum initially appeared to be dramatically effective for diagnosis of early-stage ovarian cancer, but these results have proven difficult to reproduce. A recent publication reported good classification in one dataset using results from training on a much earlier dataset, but the authors have since reported that they did not perform the analysis as described. We examined the reproducibility of the proteomic patterns across datasets in more detail. Our analysis reveals that the pattern that enabled successful classification is biologically implausible and that the method, properly applied, does not classify the data accurately. We show that the method used in previously published studies does not establish reproducibility and performs no better than chance for classifying the second dataset, in part because the second dataset is easy to classify correctly. We conclude that the reproducibility of the proteomic profiling approach has yet to be established.


Dramatic results (1) from proteomic profiling of serum by use of mass spectrometry have triggered hopes that this technology will provide a diagnostic test for ovarian cancer (2,3). In this approach, proteomic patterns—i.e., the joint intensities of several spectral peaks—are used to distinguish samples obtained from patients with ovarian cancer or from healthy individuals (1).

Diagnostic application of this approach requires that patterns from previous studies suffice to classify new spectra. Proteomic patterns in general, however, have been difficult to reproduce (4). Zhu et al. (5) reported that, by use of a new method, a pattern derived from one ovarian cancer dataset accurately classified a second, blinded dataset produced many months later. These results are in contrast with suggestions that a systematic measurement offset between these particular datasets precludes reproducibility (6).

However, a programming error precluded the classification across datasets as reported. According to the reported method (5), the first dataset was split into training and test sets. A separating pattern of 18 peaks (at m/z values 167.8031, 321.42, 322.42, 359.63, 385.57, 413.17, 433.91, 434.69, 444.47, 445.26, 1222.18, 1528.34, 3345.80, 3349.15, 3473.31, 3528.53, 6101.63, and 6123.52) was identified in the training set and confirmed in the test set. Spectra from the second dataset were classified according to the identity of the five nearest neighbors [by Mahalanobis distance (7)] among the training set spectra from the first dataset. In reality, however, "the spectra in the second dataset were classified using a jack-knife approach where distances were computed between each spectrum and all of the other spectra in the second dataset, and the spectrum was classified according to the status of its five nearest neighbors in this set of spectra. Only the peak locations (m/z values) were retained across datasets, and these served to define the points at which the distances were computed. Further, the validation simulations used training sets drawn from the second dataset" (Wei Zhu, personal communication).

This creates a problem for the claim of reproducibility across datasets, because classification of the second dataset used knowledge of the status of spectra in the second dataset. Nonetheless, the separation achieved suggests that these 18 peak m/z values may be "important" for classification.

In this study, we investigate the biological plausibility of the reported m/z values for cancer diagnosis, and then we compute classification rates obtained with their reported values or with newly generated patterns. We then replicate the jack-knife approach, as described (Wei Zhu, personal communication), to classify the second dataset by use of the published m/z values listed above. Finally, we calculate the probability that the reported classification could occur by chance.

Zhu et al. (5) analyzed two publicly available datasets (8). The first dataset [Ovarian Cancer Dataset 4–3–02 (8)] contains 216 spectra, obtained from the serum of 100 cancer patients and 116 "unaffected" patients; 100 of the latter were obtained from healthy control subjects, and the remaining 16 were from patients with benign ovarian disease. The second dataset [Ovarian Cancer Dataset 8–7–02 (8)] contains 253 spectra, obtained from the serum of 162 cancer patients and 91 healthy control subjects. The Matlab code for all analyses is available (http://bioinformatics.mdanderson.org).

To address the biological plausibility of the 18 identified peaks, we computed two-sample t statistics by comparing all control samples with all cancer samples at each peak separately for the two datasets. If cancer-induced changes in protein expression are measurable at those m/z values, then the t statistics should have the same sign in both datasets.
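The sign comparison described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' Matlab code. The Welch (unequal-variance) form of the t statistic, the function names, and the toy intensities are all assumptions made for the example, since the text states only that two-sample t statistics were computed at each peak.

```python
import math
from statistics import mean, variance


def t_statistic(cancer, control):
    """Welch two-sample t statistic (cancer minus control) at one peak.

    The Welch form is an assumption; the article does not say which
    two-sample t statistic was used.
    """
    se = math.sqrt(variance(cancer) / len(cancer)
                   + variance(control) / len(control))
    return (mean(cancer) - mean(control)) / se


def count_sign_changes(t_first, t_second):
    """Count peaks whose t statistic has opposite sign in the two datasets."""
    return sum(1 for a, b in zip(t_first, t_second) if a * b < 0)


# Toy intensities, invented for illustration: two peaks, three spectra per group.
first = {"cancer": [[5.0, 6.0, 7.0], [1.0, 2.0, 3.0]],
         "control": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
second = {"cancer": [[1.0, 2.0, 3.0], [1.0, 1.5, 2.0]],
          "control": [[5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]}

t_first = [t_statistic(c, n)
           for c, n in zip(first["cancer"], first["control"])]
t_second = [t_statistic(c, n)
            for c, n in zip(second["cancer"], second["control"])]

# Peak 1 reverses direction between datasets; peak 2 does not.
flips = count_sign_changes(t_first, t_second)
print(flips)
```

A peak that reflects a real cancer-related change in expression should contribute nothing to this count, so a large count across the 18 peaks would argue against biological plausibility.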

To test the published classification method for the second dataset, we drew training sets of 50 cancer sample spectra and 50 unaffected control sample spectra from the first dataset. For each of the 18 reported peaks, we classified each spectrum in the second dataset as cancer or control by use of the 5 nearest neighbors in the training set. We repeated this procedure 1000 times.
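The cross-dataset test above rests on a nearest-neighbor vote under Mahalanobis distance. The following is a hedged sketch of that idea in plain Python, not the authors' Matlab implementation. It assumes the covariance matrix is estimated from the pooled training spectra (the article does not say how it was estimated), uses two invented peak intensities per spectrum rather than 18, and all function names are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance (u - v)^T cov^{-1} (u - v)."""
    diff = [a - b for a, b in zip(u, v)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def knn_classify(x, train_x, train_y, cov, k=5):
    """Majority label among the k training spectra nearest to x."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: mahalanobis_sq(x, train_x[i], cov))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


# Toy training set, invented for illustration (stands in for the first dataset).
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
           [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5], [5.2, 5.8]]
train_y = ["cancer"] * 6 + ["control"] * 6
cov = covariance(train_x)  # assumption: pooled covariance of the training spectra

# A new spectrum (stands in for one spectrum from the second dataset).
label = knn_classify([5.4, 5.6], train_x, train_y, cov)
print(label)
```

In a test of reproducibility, the labels of the spectra being classified never enter `train_y`; only spectra from the earlier dataset vote.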

Next, we used randomly chosen training sets from the first dataset to define peak sets by the approach described by Zhu et al. (5) and then used these patterns to classify the second dataset as above. As before, we repeated this procedure 1000 times.

Finally, after replicating the jack-knife classification of the spectra in the second dataset using the published m/z values (derived from the first dataset) listed above, we randomly chose sets of 18 peaks and classified the second dataset spectra by this jack-knife approach. We repeated this procedure 1000 times. We then repeated the jack-knife process, first with random peaks chosen from m/z values of less than 6000 and second with m/z values of less than 1000, to reflect the range of most of the original 18 peaks found by Zhu et al.
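The random-peak jack-knife procedure can be sketched as a leave-one-out nearest-neighbor classifier run on randomly drawn column subsets. This Python sketch is illustrative only: it substitutes squared Euclidean distance for whatever distance the original analysis used, the function names are hypothetical, and the toy spectra are invented so that the two groups differ at every point.

```python
import random


def loo_knn_accuracy(spectra, labels, peak_idx, k=5):
    """Leave-one-out k-NN accuracy using only the columns in peak_idx.

    Squared Euclidean distance is an assumption made for brevity.
    """
    correct = 0
    for i, x in enumerate(spectra):
        others = [j for j in range(len(spectra)) if j != i]
        others.sort(key=lambda j: sum((x[p] - spectra[j][p]) ** 2
                                      for p in peak_idx))
        votes = [labels[j] for j in others[:k]]
        if max(set(votes), key=votes.count) == labels[i]:
            correct += 1
    return correct / len(spectra)


def random_peak_accuracies(spectra, labels, n_peaks, n_reps, rng):
    """Repeat the leave-one-out classification with random peak subsets."""
    n_points = len(spectra[0])
    return [loo_knn_accuracy(spectra, labels,
                             rng.sample(range(n_points), n_peaks))
            for _ in range(n_reps)]


# Toy spectra, invented for illustration: 20 points each, with an offset
# between groups at every point (a pervasive, spectrum-wide difference).
cancer = [[1.0 + 0.01 * i] * 20 for i in range(6)]
control = [[0.0 + 0.01 * i] * 20 for i in range(6)]
spectra = cancer + control
labels = ["cancer"] * 6 + ["control"] * 6

accuracies = random_peak_accuracies(spectra, labels, n_peaks=3, n_reps=10,
                                    rng=random.Random(0))
print(min(accuracies))
```

Because each held-out spectrum is classified by neighbors from its own dataset, a spectrum-wide offset between groups lets almost any set of points separate them, which is why this design cannot by itself show that a pattern carries over from an earlier dataset.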

The signs of the t statistics changed between datasets for 13 of the 18 peaks (Fig. 1). A change in sign indicates that protein intensities at that point are higher in cancer spectra in one dataset and in control spectra for the other dataset. This reversal is not consistent with a persistent difference in protein expression between cancer samples and control samples.



Fig. 1. Summary of t statistics at 18 published peaks. Peaks have m/z values as indicated in the text. The t statistics represent the difference in spectral intensity between cancer and unaffected spectra for the 18 reported m/z values. Solid line = t statistic values from the first dataset; dashed line = t statistic values from the second dataset. The magnitude and sign of the t statistics correspond to the relative protein expression of cancer and normal spectra for the two datasets; a change in sign indicates that the average spectral intensity at that m/z value was greater in cancer spectra for one dataset and for control spectra in the other.

 
Results of simulations (Fig. 2) using the nearest neighbor method, as described by Zhu et al. (5), showed lower accuracy than was reported in that publication. In 893 of 1000 simulations using the published pattern derived from the first dataset, all 253 spectra were classified as cancer. This corresponds to a test with 100% sensitivity but 0% specificity. The highest overall accuracy observed (200 of the 253 spectra) was less than 80%. In 667 of 1000 simulations using patterns newly generated from random training sets, all 253 spectra were classified as control, and in another 218 of the 1000 simulations, all 253 spectra were classified as cancer. The highest overall accuracy observed (172 of the 253 spectra) was less than 70%.
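The sensitivity, specificity, and accuracy figures quoted above follow directly from the composition of the second dataset (162 cancer spectra and 91 control spectra). The short Python check below is a worked illustration of that arithmetic; the function name is hypothetical.

```python
def summarize(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Every spectrum called "cancer": all 162 cancers right, all 91 controls wrong.
all_cancer = summarize(tp=162, fn=0, tn=0, fp=91)

# Every spectrum called "control": all 91 controls right, all 162 cancers wrong.
all_control = summarize(tp=0, fn=162, tn=91, fp=0)

print(all_cancer)                 # sensitivity 1.0, specificity 0.0
print(round(all_cancer[2], 4))    # 0.6403
print(round(all_control[2], 4))   # 0.3597
print(round(200 / 253, 4))        # 0.7905, best run with the published pattern
print(round(172 / 253, 4))        # 0.6798, best run with newly generated patterns
```

A classifier that labels every spectrum as cancer therefore reaches about 64% accuracy on this dataset without using any spectral information.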



Fig. 2. Classification accuracies observed in simulations. Box plots show the median and quartile accuracies observed for each simulation approach. Each simulation involved 1000 repetitions. Simulation methods are as follows: Method 1) Training sets randomly chosen from the first dataset were used to classify the second dataset according to the published method using the 18 m/z values listed in the text. The arrow indicates the median line, also the first and third quartiles, which coincide with the observed accuracy when all samples are classified as "cancer." Method 2) Training sets randomly chosen from the first dataset were used to generate new sets of m/z values, and these values were used to classify the second dataset according to the published method (5). The arrow points to the median line, also the first quartile, which coincides with the observed accuracy when all samples are classified as "control." Method 3) The second dataset was classified by use of the jack-knife approach; 18 m/z values were randomly chosen from the entire spectrum. Method 4) The second dataset was classified by use of the jack-knife approach; 18 m/z values were randomly chosen from values of less than 6000. Of the originally reported m/z values, 16 of the 18 values were less than 6000. Method 5) The second dataset was classified by use of the jack-knife approach; 18 m/z values were randomly chosen from values of less than 1000. Of the originally reported m/z values, 10 of the 18 values were less than 1000.

 
In our hands, application of the jack-knife approach to the second dataset, using the published m/z values (5), resulted in correct classification of 249 (98.42%) of the 253 spectra. Classification accuracy using the jack-knife approach with randomly chosen patterns was quite high; in fact, random values met or exceeded 98.4% classification accuracy 6% of the time using the whole spectrum, 14.8% of the time with random m/z values of less than 6000, and 56.2% of the time with random m/z values of less than 1000.
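The comparison with chance reported above amounts to asking how often a random pattern does at least as well as the published one. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the simulated counts are invented placeholders arranged to mirror the reported 6% figure for the whole spectrum; they are not the authors' simulation output.

```python
def fraction_at_least(simulated_correct, observed_correct):
    """Fraction of simulated runs that match or beat the observed count."""
    hits = sum(1 for c in simulated_correct if c >= observed_correct)
    return hits / len(simulated_correct)


observed = 249  # spectra correctly classified with the published m/z values

# Placeholder results for 1000 random-peak runs (invented for illustration):
# 60 runs reach 250 correct and the remaining 940 reach 240 correct.
simulated = [250] * 60 + [240] * 940

chance_rate = fraction_at_least(simulated, observed)
print(chance_rate)
```

When this fraction is large, as it was for random m/z values of less than 1000, the observed accuracy gives little evidence that the particular peaks chosen matter.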

The pattern of protein expression is inconsistent between the datasets at the reported m/z values. Thus, these values apparently do not represent biologically important changes in cancer patients.

Further, neither the method outlined in Zhu et al. (5) nor the jack-knife method demonstrates reproducibility in these datasets. The former, theoretically a valid test of reproducibility, results in unacceptably poor classification. The latter does not diagnose new cases on the basis of the previous data only and results in classifications that are no better than chance.

The excellent classification achieved in the second dataset using random patterns suggests pervasive differences between cancer and control spectra. Changes in protein expression associated with cancer should affect only a few specific peaks, not the entire spectrum. Systematic differences in spectra are more likely associated with procedural bias, such as incomplete randomization, that confounds our ability to recognize potentially reproducible biological factors. Hence, reproduction of proteomic patterns across experiments remains an open question that, in our assessment, has not been answered with the two datasets investigated.


    NOTES
The authors thank Wei Zhu for useful discussions. JSM was supported in part by NIH-NCI R01 CA 107304.


    REFERENCES

(1) Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572–7.

(2) Correlogic Systems: Patterns for Life. http://www.correlogic.com. [Last accessed: November 22, 2004.]

(3) Pollack A. A new cancer test stirs hope and concern. New York Times February 3, 2004.

(4) Rogers MA, Clarke P, Noble J, Munro NP, Paul A, Selby PJ, et al. Proteomic profiling of urinary proteins in renal cancer by surface enhanced laser desorption ionization and neural-network analysis: identification of key issues affecting potential clinical utility. Cancer Res 2003;63:6971–83.

(5) Zhu W, Wang X, Ma Y, Rao M, Glimm J, Kovach JS. Detection of cancer-specific markers amid massive mass spectral data. Proc Natl Acad Sci U S A 2003;100:14666–71.

(6) Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing data sets from different experiments. Bioinformatics 2004;20:777–85.

(7) Mardia K, Kent J, Bibby J. Multivariate analysis. New York (NY): Academic Press, Harcourt Brace Jovanovich; 1979.

(8) Clinical Proteomics Program Databank. http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp. [Last accessed: October 26, 2004.]

Manuscript received June 3, 2004; revised October 7, 2004; accepted November 1, 2004.



Related Commentaries in JNCI

Importance of Communication Between Producers and Consumers of Publicly Available Experimental Data
Lance A. Liotta, Mark Lowenthal, Arpita Mehta, Thomas P. Conrads, Timothy D. Veenstra, David A. Fishman, and Emanuel F. Petricoin, III
J Natl Cancer Inst 2005 97: 310-314.

Lessons from Controversy: Ovarian Cancer Screening and Serum Proteomics
David F. Ransohoff
J Natl Cancer Inst 2005 97: 315-319.




