Journal of the National Cancer Institute Advance Access originally published online on September 25, 2007
JNCI Journal of the National Cancer Institute 2007 99(19):1422-1423; doi:10.1093/jnci/djm167
Published by Oxford University Press 2007.
EDITORIALS
Early Average Change in Tumor Size in a Phase 2 Trial: Efficient Endpoint or False Promise?
Affiliations of authors: Biometric Research Branch (LVR, ELK) and Cancer Therapy Evaluation Program (JED, MAS, JJW), National Cancer Institute, Bethesda, MD
Correspondence to: Larry V. Rubinstein, PhD, National Cancer Institute, National Institutes of Health, Biometric Research Branch, Executive Plaza North, Rm. 8130, MSC-7434, Bethesda, MD 20892-7434 (rubinsteinl@ctep.nci.nih.gov).
Advances in our understanding of tumor biology, in pharmaceutical development, and in imaging technology have led to vigorous discussions about the most appropriate designs and endpoints for phase 2 trials (1). The need for these discussions is occasioned in part by novel agents whose predominant effect is to delay tumor progression and by new imaging technologies that have increased the accuracy with which tumor size can be assessed and have introduced new ways of assessing tumor metabolism, perfusion, and necrosis. A further reason to reassess trial design is the long-acknowledged fact that objective tumor response, as defined by traditional response criteria, may sometimes be a poor surrogate for patient benefit.
The proposal by Karrison et al. (2) in this issue of the Journal is a novel and intriguing solution to some of the problems facing investigators designing phase 2 trials involving new anticancer agents in this current environment. The authors propose the use of a randomized design with average change in tumor size (at 8 weeks, in their central example) as the endpoint and describe a trial that they will undertake using this design. As the authors state, it is now relatively common for new, molecularly targeted agents to be tested early in their development in combination with established chemotherapy regimens. The effects of such agents may be primarily cytostatic, which presents investigators with new challenges that make the standard one-armed phase 2 design—which uses an objective response endpoint—inadequate (3,4). That is, the new agent may not be expected to increase the objective response rate associated with the standard chemotherapy but may, nevertheless, increase progression-free or overall survival. The authors suggest that an early small difference in average change in tumor size between the treatment arms with and without the experimental agent may be predictive of a clinically meaningful difference in progression-free or overall survival. Furthermore, they assert, and we agree, that such a difference in change in tumor size cannot be assessed adequately using historical control subjects. Finally, the authors correctly point out the statistical efficiency of using a continuous endpoint as opposed to the dichotomous tumor response endpoint, noting that tumor size changes that are clinically unimportant at the individual level can, with moderate sample sizes, yield statistically significant differences between treatment groups.
As do Karrison et al. (2), we see potential problems with their proposed approach, and some additional factors relevant to the eventual utility of the design should be considered. First, small differences in early tumor size changes (modestly increased shrinkage or decreased expansion) offer no clinical benefit of their own and, therefore, must be shown to be predictive of the clinically meaningful outcomes of progression-free and overall survival before such differences can be considered a useful endpoint. The trials that the authors evaluated to design the trial they propose had a range of results for early tumor size change and clinical benefit. The variability in the results may reflect the extrapolations that were required for the authors to estimate mean change in tumor size in these studies; they may also reflect variability in the mechanisms of action of the agents studied in the trials, in the proportions of patients with sensitive versus insensitive disease, and in the overall level of efficacy of the agents in patients. These results do not directly address how early changes in tumor size may be predictive of clinical benefit because clinical benefit to the patient or study population likely relates to the type and magnitude of the antitumor effect as well as to its duration. Defining a potentially clinically meaningful difference in the early change in tumor size would require analysis of many more studies over many more disease subgroups and different classes of agents because the endpoint may prove to be predictive only in certain settings. Even then, this endpoint would be most useful in situations in which median progression-free survival is relatively long (at least 6–12 months); otherwise, the progression-free survival endpoint in a randomized trial would be both feasible and preferable as an endpoint, as described below. In any event, when early tumor size changes are small, it may be useful to incorporate some measure of their durability. 
Adapting a suggestion by the authors, we propose that tumor size changes be averaged over some period of time.
Second, substantial care is needed to avoid bias in assessing tumor size because even a small bias could result in a statistically significant, but spurious, apparent treatment-related difference. Avoiding this bias might require blinded assessments or external review of assessments.
Third, the approach does not adequately accommodate detection of new disease. It is not sufficient to add the size of a new lesion to the sum total of the size of existing lesions because the implications of a new lesion are generally worse than those of a comparable increase in existing lesions. Indeed, the Response Evaluation Criteria in Solid Tumors guidelines (5) treat any new disease as unequivocal evidence of progression. It might be possible to accommodate new disease in the same way that the authors propose to accommodate early death, that is, by making the comparison nonparametric and treating new disease, like death, as a worse outcome than any observed tumor size increase.
Finally, our most serious concern is that use of this endpoint threatens to "lower the bar" for phase 2 trials to be called positive. It is not clear what difference in mean tumor size in a patient population on a phase 2 trial will predict a successful phase 3 trial result and thus true patient benefit. However, the difference in tumor size change between treatments that is sufficient to achieve statistical significance will be a function of the sample size and, therefore, may be quite small, even for moderately sized trials. For example, the authors propose a three-arm trial with 50 patients per arm, in which the difference at 8 weeks between an overall "expected" increase in tumor size of 5% for the control arm and an overall expected decrease of 12% for the high-dose arm will be sufficient to result in statistical significance (at the one-sided .10 level) with roughly 80% power for the comparison between these two arms. This power calculation implies that an "observed" 5% increase for the control arm and an observed 5% decrease for the high-dose arm should be sufficient to achieve statistical significance. Without further evidence of the agent's efficacy, it would be risky to judge such an outcome sufficient to mount a phase 3 study because applying this approach to judge success in phase 2 trials may substantially increase the percentage of negative phase 3 studies.
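The step from the power calculation to the "observed" difference can be made explicit with a short sketch. It assumes a normal approximation for the difference in mean percent change between two arms with a common standard error; that simplification is ours and is not necessarily the calculation used by Karrison et al. (2). The function name and variable names are hypothetical.

```python
from statistics import NormalDist


def critical_difference(expected_diff, alpha, power):
    """Return (threshold, standard_error) under a normal approximation.

    expected_diff: difference in mean percent change the trial is powered for
    alpha: one-sided type I error rate
    power: 1 minus the type II error rate
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    # The power requirement fixes the standard error of the difference:
    # expected_diff = (z_alpha + z_beta) * standard_error
    standard_error = expected_diff / (z_alpha + z_beta)
    # Any observed difference above z_alpha * standard_error is significant.
    return z_alpha * standard_error, standard_error


# Expected +5% (control) versus -12% (high dose): a 17-point difference,
# tested at the one-sided .10 level with 80% power.
threshold, se = critical_difference(expected_diff=17.0, alpha=0.10, power=0.80)
print(f"standard error of the difference: {se:.1f} percentage points")
print(f"smallest significant observed difference: {threshold:.1f} percentage points")
```

Under these assumptions the threshold comes out at about 10 percentage points, which is roughly the gap between an observed 5% increase and an observed 5% decrease, and well below the 17-point difference for which the trial is powered.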
Karrison et al. (2) present a false dichotomy in describing an either/or choice between single-arm phase 2 studies and randomized phase 2 studies using average change in tumor size as an endpoint. In fact, it is often feasible to mount a randomized phase 2 trial using the standard endpoint of progression-free survival. For example, we recently described (4) an approach to nondefinitive randomized comparisons of experimental regimens with standard treatments by carefully adjusting the false-positive error rates (α or type I error) and false-negative error rates (β or type II error), and in that report we reviewed previous approaches proposed for randomized phase 2 designs. The central example given by Karrison et al. (2) exemplifies the feasibility of randomized phase 2 studies using a progression-free survival endpoint. Shepherd et al. (6) reported a median progression-free survival of 2.2 months using erlotinib alone in patients previously treated for non–small-cell lung cancer. A 150-patient three-arm randomized trial in this population comparing erlotinib alone with erlotinib plus each of two doses of sorafenib, as proposed by Karrison et al. (2), would be feasible using a progression-free survival endpoint (4). Such a trial would have 90% power to detect a 75% increase in median progression-free survival (3.85 versus 2.2 months) for either sorafenib arm relative to that in the erlotinib-alone arm (at the one-sided .10 significance level, so long as at least 84 progressions were observed for the two arms compared). A more economic design would exclude the lower-dose sorafenib arm and would have the same power to compare the two other arms, with only 100 patients. In situations in which the progression-free survival for the control group is greater than 2.2 months, it may be appropriate to target a smaller percentage increase in median progression-free survival for the experimental arm, which would increase the number of patients required. However, the extent of increase required could be minimized by modest relaxation of the type I and II error rates. For example, relaxing the type II error rate to .20 would make it possible to detect a 50% increase in median progression-free survival, so long as the number of progressions observed for the two arms was increased to 110.
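The event counts quoted above can be reproduced with a widely used approximation for the log-rank test (often attributed to Schoenfeld). The sketch below assumes 1:1 allocation and exponentially distributed progression times, so that the ratio of medians equals the hazard ratio being targeted. The text does not state which formula was used, so this is an illustration rather than the authors' own calculation, and the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(median_ratio, alpha, power):
    """Approximate total events for a two-arm log-rank comparison.

    median_ratio: experimental / control median progression-free survival
    alpha: one-sided type I error rate
    power: 1 minus the type II error rate
    Assumes 1:1 randomization and exponential progression times.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(median_ratio) ** 2


# 75% increase in median (3.85 versus 2.2 months), alpha = .10, power = 90%
events_75 = ceil(required_events(1.75, alpha=0.10, power=0.90))
# 50% increase in median, alpha = .10, type II error relaxed to .20
events_50 = ceil(required_events(1.50, alpha=0.10, power=0.80))

print(events_75)  # 84
print(events_50)  # 110
```

Because the required number of events scales with the inverse square of the log ratio of medians, moving the target from a 75% to a 50% increase roughly doubles the events needed at fixed error rates; relaxing the type II error rate to .20 offsets part of that increase.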
Furthermore, it is often feasible to mount a randomized phase 2 trial using both progression-free survival and objective tumor response as primary endpoints. Recently, we have proposed such designs to investigators mounting trials under National Cancer Institute sponsorship, understanding that an increase in either progression-free survival or tumor response associated with a new agent would be encouraging and wishing to have a prospective design that accommodates this. Such a design would also be possible for the central example in Karrison et al. (2). Shepherd et al. (6) reported an objective response rate of 9%. A 100-patient two-arm trial would have 90% power to detect a doubling in median progression-free survival (4.4 versus 2.2 months) associated with the addition of sorafenib to erlotinib (at the one-sided .025 significance level, so long as at least 84 progressions were observed for the two arms in total) while simultaneously having 80% power to detect a difference in tumor response of 30% versus 10% (at the one-sided .075 significance level). (Note that type I error is split judiciously between the two comparisons so that the total does not exceed .10. In fact, because the two endpoints are almost surely positively correlated, total type I error will be less than .10.)
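The bound on total type I error in the parenthetical note follows from the union (Bonferroni) inequality. As a brief sketch, write $R_{\mathrm{PFS}}$ and $R_{\mathrm{RR}}$ (our notation, not the original's) for the events that the progression-free survival test and the response-rate test, respectively, falsely reject when the added agent has no effect:

```latex
\[
P\left(R_{\mathrm{PFS}} \cup R_{\mathrm{RR}}\right)
  = P\left(R_{\mathrm{PFS}}\right) + P\left(R_{\mathrm{RR}}\right)
    - P\left(R_{\mathrm{PFS}} \cap R_{\mathrm{RR}}\right)
  \le .025 + .075 = .10
\]
```

When the two endpoints are positively correlated, the probability that both tests falsely reject is greater than zero, so the inequality is strict and the total type I error falls below .10.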
In conclusion, we feel that the endpoint proposed by Karrison et al. (2) deserves further study. If validated and used properly, this endpoint could provide an efficient early indication of potential benefit in situations in which neither objective tumor response nor progression-free survival is a feasible endpoint, and its use could facilitate randomized comparisons for subpopulations that are difficult to accrue. However, caution is required because the approach risks conflating statistical significance with clinical significance/benefit, thereby inappropriately lowering the threshold for claiming success in phase 2 clinical trials and increasing the number of negative phase 3 studies.
REFERENCES
(1) Benjamin RS, Choi H, Macapinlac HA, Burgess MA, Patel SR, Chen LL, et al. We should desist using RECIST, at least in GIST. J Clin Oncol (2007) 25:1760–4.
(2) Karrison TG, Maitland ML, Stadler WM, Ratain MJ. Design of phase II cancer trials using a continuous endpoint of change in tumor size: application to a study of sorafenib and erlotinib in non–small-cell lung cancer. J Natl Cancer Inst (2007).
(3) Korn EL, Arbuck SG, Pluda JM, Simon R, Kaplan RS, Christian MC. Clinical trial designs for cytostatic agents: are new approaches needed? J Clin Oncol (2001) 19:265–72.
(4) Rubinstein LV, Korn EL, Freidlin B, Hunsberger SA, Ivy SP, Smith MA. Randomized phase 2 design issues and a proposal for phase 2 screening trials. J Clin Oncol (2005) 23:7199–206.
(5) Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst (2000) 92:205–16.
(6) Shepherd FA, Rodrigues Pereira J, Ciuleanu T, Tan EH, Hirsh V, Thongprasert S, et al. Erlotinib in previously treated non-small-cell lung cancer. N Engl J Med (2005) 353:123–32.