Journal of the National Cancer Institute Advance Access originally published online on November 13, 2007
JNCI Journal of the National Cancer Institute 2007 99(22):1715-1723; doi:10.1093/jnci/djm216

© The Author 2007. Published by Oxford University Press.

ARTICLES

Challenges in Projecting Clustering Results Across Gene Expression–Profiling Datasets

Lara Lusa, Lisa M. McShane, James F. Reid, Loris De Cecco, Federico Ambrogi, Elia Biganzoli, Manuela Gariboldi, Marco A. Pierotti

Affiliations of authors: Department of Experimental Oncology (LL, JFR, LDC, MG, MAP) and Unit of Medical Statistics and Biometry (EB), Fondazione IRCCS (Istituti di ricovero e cura a carattere scientifico) Istituto Nazionale dei Tumori, Milano, Italy; Molecular Genetics of Cancer Group, IFOM Fondazione Istituto FIRC (Fondazione Italiana per la Ricerca sul Cancro) di Oncologia Molecolare, Milano, Italy (LL, JFR, LDC, MG, MAP); Biometric Research Branch, National Cancer Institute, Bethesda, MD (LMM); Institute of Medical Statistics and Biometry, Università degli Studi di Milano, Milano, Italy (FA)

Correspondence to: Lara Lusa, PhD, IFOM Fondazione Istituto FIRC di Oncologia Molecolare, Via Adamello, 16 I-20139 Milano, Italy (e-mail: lara.lusa{at}ifom-ieo-campus.it).

Background: Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes.
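As an illustration of the classification rule described above, the following is a minimal sketch of a method-of-centroids assignment. It assumes Pearson correlation as the similarity measure, which the abstract does not specify; the function name `assign_to_centroids` and the toy profiles are hypothetical and are not taken from the study.

```python
import numpy as np

def assign_to_centroids(samples, centroids):
    """Assign each sample (row) to the most similar centroid (row).

    Similarity is the Pearson correlation across genes (an assumption).
    Returns the index of the best-matching centroid for each sample
    and the full sample-by-centroid correlation matrix.
    """
    samples = np.asarray(samples, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Center and scale each profile across genes so that a dot product
    # between two profiles equals their Pearson correlation.
    s = samples - samples.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    corr = s @ c.T  # shape: (n_samples, n_centroids)
    return corr.argmax(axis=1), corr

# Toy data: two subtype centroids over four genes, two new samples.
toy_centroids = [[1.0, 2.0, 3.0, 4.0],
                 [4.0, 3.0, 2.0, 1.0]]
toy_samples = [[2.0, 4.0, 6.0, 9.0],
               [5.0, 3.0, 2.0, 0.0]]
labels, corr = assign_to_centroids(toy_samples, toy_centroids)
print(labels)
```

Each new profile is given the label of the centroid it resembles most, so the assignment depends only on the stored centroids and on how the new data are preprocessed before comparison.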

Methods: Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor–positive (ER+) samples, to the five subtypes based on microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression is equal to 0) on subtype assignment by the method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method-of-centroids classifications of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and in which the proportions of ER+ and ER– samples were systematically varied.
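Mean centering as defined above can be sketched in a few lines. The example below is illustrative only: the function name `mean_center_genes` and the single-gene toy values are hypothetical. It shows that the centered value of a sample depends on which other samples are in the dataset.

```python
import numpy as np

def mean_center_genes(X):
    """X has shape (samples, genes). Subtract each gene's mean, computed
    over the samples in X, so that every gene has mean 0 within X."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0, keepdims=True)

# One hypothetical gene that is high in ER+ samples and low in ER- samples.
# Rows 0-2 play the role of ER+ samples, rows 3-4 of ER- samples.
expr = np.array([[8.0], [9.0], [10.0], [2.0], [3.0]])

centered_all = mean_center_genes(expr)           # centered within the full set
centered_er_pos = mean_center_genes(expr[:3])    # centered within ER+ only

print(centered_all[:3].ravel())   # ER+ samples, full-set centering
print(centered_er_pos.ravel())    # same samples, ER+-only centering
```

Centered within the mixed set, all three ER+ samples keep positive values for the gene. Centered within the ER+ subset alone, the same samples are spread around 0, so the gene no longer distinguishes them from an ER– pattern.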

Results: When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered.
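The dependence on class prevalence can be reproduced qualitatively with simulated data. The sketch below is not the authors' analysis: the expression model (20 genes, class means of +1 and -1, unit-variance noise), the use of Euclidean distance, and all function names are assumptions chosen for illustration. It compares nearest-centroid accuracy with and without mean centering as the proportion of ER+ samples in the test set changes.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES = 20  # hypothetical number of discriminating genes

def simulate(n_pos, n_neg):
    """Toy expression data: genes average +1 in ER+ and -1 in ER- samples."""
    pos = rng.normal(+1.0, 1.0, size=(n_pos, N_GENES))
    neg = rng.normal(-1.0, 1.0, size=(n_neg, N_GENES))
    X = np.vstack([pos, neg])
    y = np.array([1] * n_pos + [0] * n_neg)  # 1 = ER+, 0 = ER-
    return X, y

def nearest_centroid(X, centroids):
    """Index of the closest centroid (Euclidean distance) for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def accuracy(prop_pos, center, n_test=500):
    """Train on a balanced set, test on a set with the given ER+ proportion."""
    X_tr, y_tr = simulate(100, 100)
    n_pos = int(round(prop_pos * n_test))
    X_te, y_te = simulate(n_pos, n_test - n_pos)
    if center:
        # Each dataset is centered using its own gene means.
        X_tr = X_tr - X_tr.mean(axis=0)
        X_te = X_te - X_te.mean(axis=0)
    # Row 0 is the ER- centroid and row 1 the ER+ centroid, matching the labels.
    centroids = np.vstack([X_tr[y_tr == 0].mean(axis=0),
                           X_tr[y_tr == 1].mean(axis=0)])
    return float((nearest_centroid(X_te, centroids) == y_te).mean())

results = {(p, c): accuracy(p, c)
           for p in (0.1, 0.5, 0.9) for c in (False, True)}
for (p, c), acc in sorted(results.items()):
    print(f"ER+ proportion {p:.1f}, mean centered={c}: accuracy {acc:.2f}")
```

In this toy setting, centering a test set whose class mix differs from that of the training set shifts every gene toward the majority class, so part of the majority class is pushed across the decision boundary. Without centering, the test profiles stay on the scale of the training centroids, which is only appropriate here because the simulation contains no platform or batch effect.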

Conclusions: Simple corrections, such as mean centering of genes, that are intended to remove microarray platform or batch effects can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.



CONTEXT AND CAVEATS

Prior knowledge

Microarray data on the expression of multiple genes in a given sample have been used to classify breast and other cancers into subtypes that are associated with different clinical outcomes. A method had been proposed (the method of centroids) for assigning new samples to these subtypes based on the similarity of their expression profile to the mean expression profile of the previously identified subtypes.

Study design

New samples for which there was prior information on estrogen receptor status were assigned to previously identified breast cancer subtypes using the method of centroids, and the effect of subtype prevalence and systematic differences across datasets on assignment was assessed.

Contribution

This study identified a number of factors that can influence the accuracy of assignment of patient samples to previously identified cancer subtypes.

Implications

Careful consideration must be given to the comparability of patient populations and datasets in assigning samples to previously identified subtypes.

Limitations

A robust classification rule for assigning new samples that are not part of the original dataset from which the clusters were derived remains elusive.

 
Manuscript received December 5, 2006; revised September 7, 2007; accepted October 1, 2007.

