When Are “Positive” Clinical Trials in Oncology Truly Positive?

Ian F. Tannock
Affiliations of authors: Medical Oncology Department-AECC unit, Albacete University Hospital, Albacete, Spain (AO); Division of Medical Oncology and Hematology, Princess Margaret Hospital and University of Toronto, Toronto, ON, Canada (IFT)
Correspondence to: Ian F. Tannock, MD, PhD, Division of Medical Oncology and Hematology, Princess Margaret Hospital, 610 University Ave, Toronto, ON, Canada M5G 2M9 (e-mail: ian.tannock@uhn.on.ca).
  • Received April 21, 2010.
  • Revision received October 8, 2010.
  • Accepted October 22, 2010.

Abstract

The approval of a new drug for cancer treatment by the regulatory authorities, such as the United States Food and Drug Administration or European Medicines Agency, is usually based on the positive results of one or more randomized phase III clinical trials comparing the investigational treatment with the standard treatment. A clinical trial is presented as positive if the new drug tested on an experimental group shows a statistically significant difference with the control group (P < .05) in the primary endpoint, which is usually a time-to-event endpoint (overall survival or progression-free survival). Presenting such trials as positive disregards whether the observed difference in the primary endpoint between the experimental and control groups (δ) meets the value that was predefined in the protocol. Currently, the trend is to design large trials that may detect statistically significant, but often trivial, differences in survival endpoints. However, recent appeals have been made in the oncology literature for the design of smaller clinical trials to detect or exclude only larger, clinically important, values of δ. Here, we have evaluated 18 randomized phase III clinical trials that were used for the approval of molecular-targeted anticancer drugs by the United States Food and Drug Administration. Results showed that in some of the articles the magnitude of the reported value of δ was lower than the value predefined in the protocol. We suggest that trials should not be declared positive based only on a statistically significant P value, but should also require detection of a difference in survival outcome that equals or exceeds a clinically important value that is specified in the protocol.

Recent commentaries in the oncology literature have argued that randomized phase III clinical trials should be designed to detect only substantial clinically important differences in endpoints such as overall survival (OS) or progression-free survival (PFS) between the experimental and control groups (1). We support this argument, and also question the use of the word “positive” to describe a clinical trial that failed to demonstrate a difference in OS or PFS between the two groups that was specified in the protocol, regardless of the statistical significance of the result. Here, we review published articles that report randomized phase III clinical trials whose results were used to support the approval of new molecular-targeted anticancer drugs for treatment of solid tumors between January 1, 2000, and March 31, 2010. We have determined whether or not these trials detected a difference in outcome between the experimental and control groups that was equal to or greater than the value predefined in the protocol.

Identification of Clinical Trials Used for Approval of New Drugs

We reviewed the United States Food and Drug Administration (FDA) website (http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm?fuseaction=Search.Addlsearch_drug_name) to identify all new molecular-targeted drugs approved by the FDA between January 1, 2000, and March 31, 2010, for the treatment of metastatic adult solid tumors. We used the information in the drug label to identify the clinical trials used for approval of a particular drug and selected only those drugs whose approval was based on phase III randomized clinical trials. We identified 10 new approved molecular-targeted agents and 18 randomized phase III clinical trials (2–19). We excluded trials in the adjuvant setting because the endpoints and benefits of adjuvant drug treatments are evaluated in a different manner. We only included trials that evaluated time-to-event endpoints such as OS, PFS, or time to progression. We reviewed each published trial to obtain information about the main results of the study as well as the endpoints used, including the predefined difference in primary endpoint (expressed usually as a hazard ratio [HR]) that the trial was designed to detect or exclude (Table 1). If the article did not provide information about the predefined difference in outcome between the experimental and control groups, we contacted the authors to request this information.

Table 1

Characteristics of phase III randomized clinical trials used by the United States Food and Drug Administration for the approval of new molecular-targeted drugs since 2000*

Selection of Appropriate Endpoints

The approval of new drugs by regulatory authorities, such as the FDA or European Medicines Agency (EMEA), is based on positive results reported from randomized phase III clinical trials, which usually means a statistically significant difference in the primary endpoint between the experimental and control groups (20). The trials are designed to demonstrate an increase in efficacy of the investigational drug in the experimental group compared with the standard treatment in the control group. The measure of efficacy of a drug in a randomized phase III clinical trial should reflect patient benefit, essentially either increased duration of OS or a measure of the quality of life during the period of survival. Other primary outcomes, such as PFS, could also be used to measure efficacy of the drug 1) if PFS is shown to be a valid surrogate for OS; 2) in trials, especially placebo-controlled trials, in which there is crossover because of early demonstration of activity of the new treatment and lack of alternatives; and/or 3) when there is long survival after disease progression during the trial, with the opportunity for multiple interim treatments, which makes the detection of a difference in OS difficult (21–23).

Defining the Magnitude of Clinical Benefit

When designing a randomized phase III clinical trial, the investigators must specify in the protocol the difference (δ) in the primary endpoint between experimental and control groups that they aim to detect or exclude (24). The number of patients to be recruited and the duration of the study will depend on the value of δ; increasing the sample size will allow the detection or exclusion of smaller values of δ. Ideally, trials should be designed such that δ represents the minimum clinically important difference, taking into account the tolerability and toxicity of the new treatment, that would persuade oncologists to adopt the new treatment in place of the standard treatment. Of course, the opinions of oncologists as to what constitutes a minimal important value of δ will vary, but a reasonable consensus can be reached by seeking the opinions of oncologists who manage a given type of cancer. For example, an increase in median survival by less than 1 month for patients with advanced-stage cancer would not be regarded by most as clinically important, unless the new agent had less toxicity than standard treatment, whereas an improvement of median survival by greater than 3 months for a drug that was reasonably well tolerated would usually be accepted as clinically important. Moreover, questions to practitioners to determine the value of δ that should be used should be framed in terms of absolute differences (eg, differences in median survival) that are easy to comprehend. Previous research has shown that apparently large differences in hazard ratios may lead to adoption of a new treatment, whereas smaller differences in median survival do not, even when they are different expressions of the same clinical result (25).
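
The dependence of trial size on δ can be made concrete with a rough calculation. The sketch below uses Schoenfeld's approximation for the number of events needed in a two-arm trial with 1:1 randomization analyzed by the log-rank test. The approximation, the two-sided α of .05, the 80% power, and the function name `required_events` are assumptions chosen for illustration; none of them is taken from the protocols of the trials discussed here.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate events needed to detect `hazard_ratio` (Schoenfeld, 1:1 allocation)."""
    if not 0 < hazard_ratio < 1:
        raise ValueError("hazard_ratio must lie between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Smaller target effects (hazard ratios closer to 1) need many more events.
for hr in (0.67, 0.75, 0.90):
    print(f"HR {hr:.2f}: about {required_events(hr)} events")
```

Under these assumptions, moving the target hazard ratio from 0.75 to 0.90 raises the required number of events several-fold, which is why trials aimed at small values of δ must be so large.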

The opinion of oncologists on what constitutes a meaningful benefit should be placed in the context of the opinion of patients and families about what is meaningful for them, and in the context of what can be reasonably afforded. Many patients who have late-stage incurable cancer may accept a lower clinical benefit of a new treatment, for a given risk of side effects and toxicity, compared with patients who have earlier stage disease and other options for treatment (26). From the societal perspective, all new targeted drugs are very expensive, and for most of them the cost per life-year gained exceeds the estimated maximum (typically approximately US $100 000) that can reasonably be supported by Western economies (27,28). Publicly financed health care will generally support only larger levels of clinical benefit that fall within this upper limit of cost-effectiveness.
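
The arithmetic behind cost per life-year gained is simple enough to sketch. In the example below, the extra drug cost of US $50 000 per patient is a hypothetical figure, and the calculation treats the gain in survival as the life-years gained, ignoring discounting and quality-of-life adjustment; only the threshold of roughly US $100 000 comes from the text.

```python
def cost_per_life_year(extra_cost_usd, survival_gain_months):
    """Incremental cost divided by survival gained, expressed per life-year."""
    if survival_gain_months <= 0:
        raise ValueError("survival gain must be positive")
    return extra_cost_usd / (survival_gain_months / 12)


THRESHOLD_USD = 100_000  # approximate upper limit quoted in the text

# Hypothetical drug: US $50 000 extra cost per patient, not taken from any trial.
for gain_months in (1, 3, 6):
    cost = cost_per_life_year(50_000, gain_months)
    verdict = "within" if cost <= THRESHOLD_USD else "above"
    print(f"{gain_months} months gained: ${cost:,.0f} per life-year ({verdict} threshold)")
```

For a fixed drug cost, the cost per life-year rises steeply as the survival gain shrinks, so small values of δ are the ones most likely to fall outside the affordability limit.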

How Important Are Statistically Significant P Values?

Regardless of the value of δ that is set for a given clinical trial, results can be statistically significant (P ≤ .05) even when the best estimate of the difference in the primary endpoint between the experimental and control groups of the trial is smaller than the predefined value of δ. Estimating sample size is always difficult and depends on several assumptions. For example, an apparently statistically significant difference smaller than the predefined value of δ may occur if the variability in outcome (OS or PFS) is less than that observed in the phase II clinical trials that are often used to estimate the required sample size for randomized phase III clinical trials.
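
There is also a structural reason for this. A trial powered at 80% to detect a given δ reaches P < .05 for any observed difference beyond a threshold that lies closer to no effect than δ itself. The sketch below uses the common large-sample approximation that the standard error of the log hazard ratio is 2/√d for d events with 1:1 allocation. This approximation, the event count of 380 (roughly what Schoenfeld's formula gives for a target HR of 0.75 at 80% power), and the function name are illustrative assumptions.

```python
from math import exp, sqrt
from statistics import NormalDist


def critical_hazard_ratio(events, alpha=0.05):
    """Observed HR (< 1) at which the two-sided P value equals alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Approximation: SE of log(HR) is 2 / sqrt(events) with 1:1 allocation.
    return exp(-z_alpha * 2 / sqrt(events))


designed_hr = 0.75
events = 380  # assumed event count for a trial designed around HR = 0.75
threshold = critical_hazard_ratio(events)
print(f"designed HR {designed_hr}, significant if observed HR < {threshold:.2f}")
```

With these assumed numbers, any observed hazard ratio below about 0.82 would be statistically significant, even though the trial was designed around 0.75.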

In Table 1, we have listed the recent randomized phase III clinical trials that have evaluated molecular-targeted drugs for treatment of metastatic cancer and provided the major evidence necessary to support an FDA recommendation for approval of the drug. We indicated the sample size, primary endpoint, the predefined value of δ (if it was provided in the publication or obtained by contacting the authors), as well as the best estimate of the difference in the primary endpoint between the experimental and control groups obtained from the results of the trial.

As shown in Table 1, several trials showed a statistically significant difference in a major outcome measure between the experimental and control groups, but the difference in outcome was of lower magnitude (eg, hazard ratio was closer to one) than that specified in the protocol. For example, the clinical trial that led to approval of erlotinib for treatment of pancreatic cancer was designed to detect a relative risk reduction of 25% (HR ≤ 0.75), but the best estimate of hazard ratio from the trial showed a relative risk reduction of 18% (HR = 0.82, 95% confidence interval = 0.69 to 0.99). The difference was statistically significant (P = .038), but the median survival differed by only 10 days (10). Likewise, the randomized phase III clinical trial evaluating sorafenib in metastatic renal cell carcinoma was designed to detect a 33% risk reduction (HR = 0.67), but the reported hazard ratio showed a 28% risk reduction (HR = 0.72, 95% confidence interval = 0.54 to 0.94, P = .02) (15). Similar discrepancies can be found in the randomized phase III clinical trial of sorafenib for treatment of advanced hepatocellular carcinoma (11), in the randomized phase III clinical trial evaluating letrozole (with or without lapatinib) in hormone receptor–positive and HER2-positive breast cancer (3), and in the randomized phase III clinical trial evaluating cetuximab in refractory metastatic colon cancer (9). Of note, crossover was permitted only in the phase III trial evaluating sorafenib for the treatment of metastatic renal cell carcinoma (15).

The futility of relying on P values alone can be seen by recognizing that any difference between the control and experimental groups, however trivial, can be made statistically significant by doing a large enough trial. Some studies had their design changed while they were in progress: for example, the original design of the study of bevacizumab as first-line treatment for metastatic lung cancer was modified to increase the sample size in the light of incoming results, thereby allowing the predefined relative risk reduction of 20% to reach statistical significance (14).
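
The effect of trial size on the P value can be illustrated by holding a small observed effect fixed and letting the number of events grow. This is a minimal sketch that assumes 1:1 allocation and the approximation that the standard error of the log hazard ratio is 2/√d for d events; the hazard ratio of 0.95 and the event counts are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def two_sided_p(observed_hr, events):
    """Approximate two-sided P value for an observed HR with `events` events."""
    z = abs(log(observed_hr)) * sqrt(events) / 2  # SE of log(HR) ~ 2 / sqrt(events)
    return 2 * (1 - NormalDist().cdf(z))


# A 5% relative risk reduction, held fixed while the trial grows.
for events in (500, 2_000, 10_000):
    print(f"{events:>6} events: P = {two_sided_p(0.95, events):.3f}")
```

The same 5% relative risk reduction is far from significant with 500 events and clearly significant with 10 000, although its clinical meaning has not changed.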

Minimum Difference in OS or PFS and Clinical Benefit to Patients

Because almost all phase III trials designed to evaluate new drugs are sponsored by pharmaceutical companies, the companies usually make the final decision about setting the δ value for any given trial. As part of the business strategy to maximize financial gains, the value of δ that pharmaceutical companies establish is not usually the minimal difference in OS or PFS that is clinically important, but more likely the minimal difference that is feasible to detect, considering the limits on the sample size and hence the cost of the trial. The difference will be as close as possible to the smallest difference in outcome that the FDA and other regulatory authorities are likely to accept to allow a company to register the new drug and make a profit by marketing it. Historically, any difference in OS that is statistically significant has been accepted by the FDA for this purpose (20).

Consistent with a recent commentary suggesting the need to increase the value of δ in future clinical trials (1), we provide an estimate of δ that would be generally accepted as representing a minimum clinically important difference in the primary endpoint: an increase of approximately 3 months in median OS for patients with advanced metastatic solid tumors (usually corresponding to a hazard ratio of approximately 0.75). A discussion on whether PFS is an appropriate primary endpoint for drug registration in any particular trial is beyond the scope of this commentary, but for trials in which PFS was the primary endpoint, we have suggested a more conservative minimal clinically important difference (4–6 months or an HR of approximately 0.5) for PFS. We also recognize that what are regarded as clinically significant treatment differences may differ among investigators and clinicians, and can also change over time.
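
The link between a hazard ratio and a gain in median survival depends on survival in the control group. As a rough guide, and assuming exponential survival in both groups (a simplification that real trials often violate), the ratio of median survival times is the inverse of the hazard ratio. The function name and the control medians below are hypothetical.

```python
def median_gain_months(control_median_months, hazard_ratio):
    """Gain in median survival if both arms follow exponential survival."""
    # With constant hazards, median = ln(2) / hazard, so the ratio of medians
    # (experimental / control) equals 1 / HR.
    return control_median_months / hazard_ratio - control_median_months


for control_median in (6, 9, 12):
    gain = median_gain_months(control_median, 0.75)
    print(f"control median {control_median} months: gain of {gain:.1f} months at HR 0.75")
```

Under this assumption, a hazard ratio of 0.75 corresponds to a 3-month gain when median survival in the control group is about 9 months, and to a smaller absolute gain when control survival is shorter.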

From Table 1, it is evident that some, but not all, of the trials used to register new agents need a revised definition of being positive, which means showing a statistically significant difference in OS or PFS that is also clinically important. At least for some clinical trials, the reported hazard ratio was higher than the predefined value, and several studies did not even report the predefined difference in the primary endpoint between the control and experimental groups (ie, the predefined value of δ) (Table 1).

Finally, it is important to ask whether the toxic effects from use of a new drug may counterbalance improvement in time-to-event endpoints, especially for trials that show an improvement in PFS, but not in overall survival, and where there is no documentation of improved quality of life. For example, lapatinib was associated with a 4-month increase in PFS for women with HER2-positive metastatic breast cancer with no evaluation of overall quality of life and with increased diarrhea and rash in the investigational group (2).

Incorporation of validated predictive markers in the design of clinical trials could help determine the subgroup of patients most likely to benefit from a specific treatment, allowing detection of larger differences in outcome with fewer patients (29).

What Constitutes a Positive Clinical Trial in Oncology?

We would define a positive trial as one in which the predefined value of δ represents a clinically important difference in an endpoint that directly reflects benefit (mainly OS or quality of life) to patients and for which the results provide a best estimate of the difference that equals or exceeds that predefined value of δ. Although it is important that trials be large enough to establish that the difference between the control and experimental groups meets conventional levels of statistical significance, and to provide reasonable confidence intervals around the difference detected, we argue that a statistically significant P value alone would not establish the positivity of a trial.
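
This definition can be written as an explicit check. The sketch below is one possible reading, with δ expressed as a hazard ratio. The hazard ratios and P values are those quoted earlier for the erlotinib and sorafenib trials; the class, the function names, and the flag stating that each predefined δ is clinically important are assumptions made for illustration, since that judgment is a clinical one.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    name: str
    predefined_hr: float   # δ expressed as the hazard ratio in the protocol
    observed_hr: float     # best estimate from the trial
    p_value: float
    delta_clinically_important: bool  # a clinical judgment, supplied by the user


def is_significant(trial, alpha=0.05):
    """Conventional criterion: statistical significance only."""
    return trial.p_value < alpha


def is_positive(trial, alpha=0.05):
    """Proposed criterion: significant, δ clinically important, and δ achieved."""
    return (
        is_significant(trial, alpha)
        and trial.delta_clinically_important
        and trial.observed_hr <= trial.predefined_hr
    )


# Hazard ratios and P values as quoted in the text; the True flags are assumptions.
trials = [
    TrialResult("erlotinib, pancreatic cancer", 0.75, 0.82, 0.038, True),
    TrialResult("sorafenib, renal cell carcinoma", 0.67, 0.72, 0.02, True),
]
for t in trials:
    print(f"{t.name}: significant={is_significant(t)}, positive={is_positive(t)}")
```

Both trials pass the conventional test of statistical significance, but neither meets the proposed definition, because the observed hazard ratio is closer to one than the value predefined in the protocol.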

The primary focus of pharmaceutical companies is to generate profits, so they cannot be expected to modify the design or presentation of the trials as long as current criteria allow them to register new drugs with the FDA or EMEA and market them. However, regulatory authorities such as the FDA and EMEA could modify their criteria for drug registration, which would influence how clinical trials are designed. We suggest that they should define what constitutes a positive trial based on the concept of establishing a meaningful clinical benefit for patients similar to those included in any given trial (1). Establishing a clinically relevant and larger value of δ would have the added advantage that trials would be smaller and cheaper, that fewer patients would be recruited to trials to evaluate drugs that are likely to show trivial improvements in outcome, and that more patients would be available to participate in evaluating other new drugs, some of which might lead to more substantial improvements in outcome. The high cost of new medications adds weight to the requirement that studies should be designed to detect only important clinical benefits. A positive study should be based on a meaningful increase in survival or in quality of life for the patient, or both, and not on a statistically significant P value alone.

Funding

This study was not supported by any research funding.

Footnotes

  • The authors participated equally in the study design, analysis and interpretation of the data, and writing of the manuscript. The authors also read and approved the final manuscript.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.