Journal of the National Cancer Institute Advance Access originally published online on July 8, 2008
JNCI Journal of the National Cancer Institute 2008 100(14):983-987; doi:10.1093/jnci/djn248

© Oxford University Press 2008.

NEWS

CSI: BIOINFORMATICS

Forensic Bioinformatician Aims To Solve Mysteries of Biomarker Studies

Liz Savage

Keith Baggerly, Ph.D., is not a detective in the traditional sense—he doesn’t solve mysteries or fight crime. But his job does involve digging through data to get to the truth. Instead of reconstructing crime scenes, Baggerly spends part of his time reconstructing scientific analyses, a new field that many call forensic bioinformatics.


Figure 1. Keith Baggerly, Ph.D.

In recent years, the new "-omics" sciences, including proteomics and genomics, have excited researchers with their potential for revealing the secrets of cancer. Yet they have also brought a new level of complexity to the cancer field, and with it concerns about validation and reproducibility. And that's where Baggerly comes in.

To correctly apply the new genomic findings to the patients at hand, the analyses that produced these findings must be understood in detail. Thus, for the last few years Baggerly and his colleagues at the University of Texas M. D. Anderson Cancer Center in Houston have tackled the difficult job of reproducing some complex analyses. Using the raw data and results from a study, they try to reconstruct how the original researchers conducted their analysis and produced the published findings. The task is not always straightforward. Often there is not enough information in the original report to figure out how the researchers got their results, and sometimes, even after intermediate results are obtained, it is still unclear how the researchers put them all together.

"We’re trying to piece together whether [the results] make sense, and if they don’t make sense, [we ask] how they might nonetheless have come about. That's something that we’ve had a good deal of trouble figuring out in some instances," Baggerly said. "And every once in a while we get a situation where we look at the data [after reproduction] and say, ‘Actually, we’re not convinced that this does work.’"

Baggerly's most publicized case involved an article published in 2002 in The Lancet. The authors had presented a new blood test that they said could determine with nearly 100% accuracy whether a woman had ovarian cancer by analyzing a pattern of protein expression. The prospect of a noninvasive ovarian cancer screening test elicited much excitement because there is currently no way to detect ovarian cancer in its early stages. Nonetheless, there was concern over the reproducibility of the results, and the case encouraged debate about the validation of proteomic studies more generally.

M. D. Anderson investigators, like many others, wanted to try out the new test for themselves, so they enlisted Baggerly and his colleagues to show them how to reproduce the results. But they couldn’t, and ultimately the team determined that the apparent difference between cancerous and normal tissues had nothing to do with the biology of the samples. Rather, they concluded, it was due to problems in the trial design: All the cancerous specimens were run on one day and all the controls on another, rather than in random order. "There was a screw-up in experimental design, and that was one of the things that was driving the final results," Baggerly said.
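
The design problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of the original study or of Baggerly's reanalysis: the sample counts, the size of the run-day offset, and the function names are all hypothetical. Every simulated sample is drawn from the same distribution, so there is no biological difference between groups, and the only systematic effect is an offset added on one of the two run days.

```python
import random
from statistics import mean

def measure(design, day_shift=5.0, seed=0):
    """Return mean(cancer) - mean(control) for one simulated protein peak.

    `design` is a list of (group, run_day) pairs. There is no true group
    effect: each value is noise plus an offset that depends only on run day.
    All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    by_group = {"cancer": [], "control": []}
    for group, day in design:
        by_group[group].append(rng.gauss(0.0, 1.0) + day_shift * day)
    return mean(by_group["cancer"]) - mean(by_group["control"])

def confounded_design(n=50):
    # All cancer specimens run on day 1, all controls on day 0.
    return [("cancer", 1)] * n + [("control", 0)] * n

def randomized_design(n=50, seed=1):
    # Within each group, half the samples go to each day, in shuffled order.
    # Balancing the days is an assumption made here for illustration.
    rng = random.Random(seed)
    design = []
    for group in ("cancer", "control"):
        days = [0, 1] * (n // 2)
        rng.shuffle(days)
        design += [(group, d) for d in days]
    return design

gap_confounded = measure(confounded_design())
gap_randomized = measure(randomized_design())
print(f"confounded design: apparent group difference = {gap_confounded:.2f}")
print(f"randomized design: apparent group difference = {gap_randomized:.2f}")
```

In the confounded design the run-day offset shows up as an apparent difference between cancer and control, even though none was simulated. When each group is spread across both days, the offset cancels and only noise remains.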

Such examples have highlighted the danger of acting on results that have not been validated. "A lot of these findings that he's reported on ... are things that people are ready to put into clinical trials if they believe the results," said Lisa McShane, Ph.D., a statistician at the National Cancer Institute. "I think that it's important for people to realize that incorrect molecular profiling results [for example] can be as dangerous as bad treatments. It's so critical that someone really take a close look at these things."

Although Baggerly's colleagues are quick to praise him, they also point out that, in an ideal world, he wouldn’t be a forensic statistician—no one would. His work, though important, points to a greater problem in how science is conducted and presented. "I think Keith is doing a wonderful and needed job because he's uncovering potential problems that need to be anticipated and avoided," said David Ransohoff, M.D., professor of medicine at the University of North Carolina at Chapel Hill. "But the fact that we need people like him means that our journals are failing us. The kinds of things that Keith spends time finding out—what did [the researchers] actually do—that's what methods and results are supposed to be for in journals. ... We have to figure out how to do science without needing people like Keith."

Some of the problems that Baggerly regularly encounters, those barriers to reproducibility such as incomplete methods sections, are caused by the limited space in journals and could be resolved by publishing an expanded methods section online, for example. But other problems are not so easily addressed. The high-throughput microarray and proteomic studies that Baggerly analyzes are substantially more complex than the typical single-gene or -protein study. These complex analyses require a multidisciplinary approach, with researchers collaborating across fields and long distances. Unfortunately, few reviewers have the expertise to understand and critically examine the validity of such a wide-ranging study in its entirety. "There is this tendency to say, ‘Well, I understand the analysis so the biology is probably OK,’ or ‘The biology sounds plausible so the analysis must be OK,’" Baggerly said. "In many cases the full steps of the analyses are rarely understood by one person. And as a result a whole lot of confusion and ambiguity can slip in."

However, the reanalyses that Baggerly and his colleagues perform require something that few reviewers or readers have: time. "To actually go into the detail that Keith does is not something that the average reviewer has time to do," McShane said. "So we’ve sort of been forced into a situation where, unfortunately, if the results are artifacts and won’t hold up, we don’t find out until the paper's already published and someone like Keith—and I honestly don’t know of too many other people like Keith—does the reanalysis."

To address this time crunch faced by even the most diligent reviewer, Baggerly and his colleagues have been making checklists for authors and reviewers that address the most common problems. "There are a whole bunch of really basic things, but we’re writing them down explicitly because those are the questions that we’re finding that we’re really needing to ask over and over and over," he said.

What is the number-one problem that he encounters? Bookkeeping. "It's not sexy, it's not higher mathematics. It's bookkeeping ... keeping track of the labels and keeping track of what goes where," Baggerly said. "The thing that we have found repeatedly in our own analyses is that it actually is one of the most difficult steps in performing some of these analyses."

Just a simple mistake such as mislabeling a cell line as sensitive to a certain drug instead of resistant could have serious consequences. "I’m not really worried about particularly esoteric mathematics. Most of the stuff that I’m worried about is very clearly understanding what was done at each of the steps involved."
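
The kind of bookkeeping check described above might look like the following. This is a minimal sketch, assuming that labels are kept as simple mappings from sample ID to label; the cell line names, the labels, and the function `label_mismatches` are hypothetical and are not taken from Baggerly's checklists.

```python
def label_mismatches(reference, analysis):
    """Compare two {sample_id: label} mappings.

    Returns a sorted list of (sample_id, reference_label, analysis_label)
    for every sample whose labels disagree. A label of None means the
    sample is missing from that mapping.
    """
    problems = []
    for sample_id in sorted(set(reference) | set(analysis)):
        ref_label = reference.get(sample_id)
        used_label = analysis.get(sample_id)
        if ref_label != used_label:
            problems.append((sample_id, ref_label, used_label))
    return problems

# Hypothetical labels: the source record versus what the analysis used.
reference = {"line_A": "sensitive", "line_B": "resistant", "line_C": "sensitive"}
analysis = {"line_A": "sensitive", "line_B": "sensitive", "line_C": "sensitive"}

problems = label_mismatches(reference, analysis)
for sample_id, ref_label, used_label in problems:
    print(f"{sample_id}: reference says {ref_label}, analysis used {used_label}")
```

Running a comparison like this at each step where labels are merged or reordered makes a swapped sensitive/resistant label visible before it reaches the final result.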

Baggerly hopes that others take his message seriously, but his influence is already apparent to some. "One thing I’ve learned [from him] is to document, document, document every step of the analysis so that I wouldn’t feel nervous if someone handed data that I had analyzed to Keith Baggerly. He may disagree with the analysis method I chose, but he can reproduce it," McShane said.

As Baggerly said, "Only if the data are reproducible can you talk about whether they’re right."

