
JNCI Journal of the National Cancer Institute 2007 99(9):664-668; doi:10.1093/jnci/djk189

© Oxford University Press 2007.

NEWS

Statisticians Set Sights on Observational Studies

Rabiya S. Tuma

The reliability of results from observational studies has been called into question many times in the recent past, with several analyses showing that well over half of the reported findings are subsequently refuted. In an effort to improve the quality of epidemiological studies and their design, an international group is developing publication guidelines for observational trials, similar to the CONSORT guidelines that were adopted for randomized trials.

The proposed guidelines, called STROBE (STrengthening the Reporting of OBservational studies in Epidemiology), will provide a checklist that authors and journal editors can use to ensure that the necessary information has been included in a manuscript. The checklist will include both basic study design information, such as stating the specific objective and prespecified hypotheses, and more complex elements regarding the handling of quantitative variables and statistical methods.

"The [current] standards for reporting epidemiological studies are not that high," said Stuart Pocock, Ph.D., professor of medical statistics at the London School of Hygiene and Tropical Medicine, who has been involved in the STROBE effort. "Previously, there was no document that has had international recognition to say, ‘These are the issues that you need to get right when you are trying to publish an epidemiological study.’"

Study authors don't describe with enough detail what they did and why, he said. They also don't report the results in a way that clearly separates their primary intent from subsequent analyses. "By tightening up the reporting, it will also feed back and tighten up how they design their studies in the first place." The hope is that better trial design will improve other researchers’ ability to reproduce the results.


Figure 1
 
With widely used methods, Peter Austin, Ph.D., found that Leos were more likely to have gastrointestinal bleeding, while Sagittarians were more likely to be hospitalized for a broken arm. But when the proper P value was used, the associations faded away.

 
The STROBE contributors—epidemiologists, journal editors, and medical statisticians—are still working on the final version of the statement but hope to publish it in several journals later this year. The current draft is available for comment at http://www.strobe-statement.org.

"I think CONSORT has improved things," Pocock said. "What made it possible is most of the leading medical journals have bought into CONSORT. That, plus the general scientific community's recognition that standards were needed, has allowed CONSORT to make an impact. We are optimistic that STROBE will go a similar way."

State of Observational Studies

Such improvements are needed. John Ioannidis, M.D., professor and chair of the University of Ioannina School of Medicine in Greece, and colleagues have published a series of papers looking at the quality of the scientific literature, including observational studies. In one analysis, Ioannidis examined which factors determine whether a study's findings are reproducible. On the basis of that information, he estimates that the findings of only about 20% of adequately powered epidemiology studies aimed at uncovering previously unknown associations and generating new hypotheses are likely to be true, even when they show a statistically significant result. In a separate analysis, his team found that the main findings of only one of six highly cited observational studies could be replicated; the conclusions from the other five were refuted.

"The combination of selective reporting and multiplicity of analysis undermines the credibility of epidemiological studies," Ioannidis said at the annual meeting of the American Association for the Advancement of Science in San Francisco.

Selective reporting, also called publication bias, comes about partly because so many epidemiological studies are designed to generate new ideas (exploratory trials), not test specific hypotheses. The authors analyze the data and then decide what to publish, focusing on the statistically significant results—without mentioning other factors that were tested but that failed to show a significant association. This type of data mining can uncover potentially interesting associations, but the authors need to fully report how they discovered the relationship and explicitly state that the results need to be reproduced in a study designed to examine that hypothesis, Pocock said.

"The epidemiological literature by itself is a literature that practically has ubiquitous statistically significant findings," Ioannidis said. In a survey of 389 papers reporting the results of observational studies, his research group found that 88% included at least one statistically significant positive association in the abstract, while only 43% reported a nonsignificant association. This is despite the fact that the great majority of tested associations would not be expected to be statistically significant. That means many nonsignificant associations are being tested but never reported.

"In the meta-analysis literature, where one pools findings across studies, the problem is called the file drawer problem," said Peter Austin, Ph.D., a senior scientist at the Institute for Clinical Evaluative Sciences in Toronto. "Researchers try to estimate how many unpublished studies are sitting in people's file drawers—or hard drives, now—that are nulls. The assumption is that there are a bunch of unpublished studies out there."

The problem is that the average reader of scientific literature or the popular press is unlikely to consider such issues when reading a title proclaiming that a certain food is associated with a given disease. The burden is thus on researchers and journal editors to publish studies that do not find a sexy new association as well as those that do. "The journals can have a central role in improving the accuracy and transparency of the reported information and the avoidance of selective reporting biases," Ioannidis said. "It is often debated whether it is the fault of the journal editors, peer reviewers, or authors. I think this is a pseudodebate. We, as scientists, are the editors, peer reviewers, and authors, wearing different hats on different occasions. So, it is our own responsibility to make things better."

The Problem of P values

Another factor that often leads to false-positive results is that too many hypotheses are being tested simultaneously. To illustrate the problem, Austin looked for an association between an individual's astrological sign and the likelihood of hospitalization for a particular medical problem. And though the question is biologically implausible, his approach parallels that used in many exploratory studies.

First, Austin split his sample of more than 10 million individuals in the Ontario government health databases into a test set and a validation set. Then he looked in the test set for two ailments per astrological sign that occurred significantly more often for individuals born under that sign than those born under the 11 others. When he retested those 24 hypotheses against his validation set, two were statistically significant when the standard P-value cutoff of .05 was used: Leos were significantly more likely to suffer from gastrointestinal bleeding than others (P = .048), and Sagittarians were more likely to be hospitalized because of a broken arm than others (P = .0125).

"If you keep looking, eventually you will find an association," Austin said.
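Austin's point can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from his study: it assumes the 24 tests are independent and that no real association exists, so each P value is uniformly distributed between 0 and 1. The function names, the number of simulated runs, and the random seed are all hypothetical choices made for this example.

```python
import random


def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positive_runs(num_tests, alpha=0.05, num_runs=20000, seed=0):
    """Estimate the same chance by simulation.

    Assumption: under the null hypothesis each P value is uniform on [0, 1),
    so a single test falls below alpha with probability alpha.
    """
    rng = random.Random(seed)
    runs_with_hit = 0
    for _ in range(num_runs):
        # One "study": num_tests null hypotheses, each tested at the alpha cutoff.
        if any(rng.random() < alpha for _ in range(num_tests)):
            runs_with_hit += 1
    return runs_with_hit / num_runs


if __name__ == "__main__":
    print("analytic :", round(family_wise_error_rate(24), 3))
    print("simulated:", round(simulate_false_positive_runs(24), 3))
```

Under these assumptions, a study that tests 24 unrelated hypotheses at the .05 cutoff has roughly a 70% chance of turning up at least one "significant" association by chance alone.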

But Austin's analysis wasn't quite right. If the researcher wants to maintain an overall false-positive rate of just 5% (which is what a P-value cutoff of .05 signifies in a single-hypothesis test), then the number of hypotheses being tested needs to be taken into account when setting the P-value cutoff for significance. When 24 hypotheses are tested, an association would have to have a P value less than .00213 to be statistically significant. Not surprisingly, with that boundary none of the 24 associations in the astrological test remained statistically significant.
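The article does not say which correction produced the .00213 boundary. The figure is consistent with the Sidak adjustment for independent tests, which is slightly less strict than the more familiar Bonferroni adjustment; treating it as Sidak is an assumption made here. The sketch below computes both cutoffs for 24 tests and applies them to the two P values reported above. The function names are hypothetical.

```python
def bonferroni_cutoff(alpha, num_tests):
    """Per-test cutoff that keeps the overall false-positive rate at or below alpha."""
    return alpha / num_tests


def sidak_cutoff(alpha, num_tests):
    """Per-test cutoff giving an overall rate of exactly alpha for independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / num_tests)


# P values quoted in the article for the two associations that passed the .05 cutoff.
reported_p_values = {
    "Leo / gastrointestinal bleeding": 0.048,
    "Sagittarius / broken arm": 0.0125,
}

if __name__ == "__main__":
    num_tests = 24
    cutoff = sidak_cutoff(0.05, num_tests)
    print("Bonferroni cutoff:", round(bonferroni_cutoff(0.05, num_tests), 5))
    print("Sidak cutoff     :", round(cutoff, 5))
    for label, p_value in reported_p_values.items():
        verdict = "significant" if p_value < cutoff else "not significant"
        print(label, "P =", p_value, "->", verdict)
```

Both reported P values sit well above either adjusted cutoff, which matches the article's account that the astrological associations did not survive the correction.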

Adjusting the significance boundary only partially solves the problem, though. If too many hypotheses are tested, adjusting the P value is likely to wipe out all the associations, even those that have a true biological effect. A better approach is for researchers to look for biological plausibility before the study begins and resources are on the line, Pocock said.

"Sharpening the mind rather than doing a post-hoc adjustment of P values is a better way to go," he said. He hopes that STROBE will encourage such forethought and transparency.




