© The Author 2007. Published by Oxford University Press.
COMMENTARY
When You Look Matters: The Effect of Assessment Schedule on Progression-Free Survival
Affiliations of authors: Departments of Epidemiology and Biostatistics (KSP, LBP, DS) and Medicine (MND, PBC, DS), Memorial Sloan-Kettering Cancer Center, New York, NY
Correspondence to: Katherine S. Panageas, DrPH, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 East 63rd St, 3rd Fl, New York, NY 10021 (e-mail: panageak@mskcc.org).
ABSTRACT
Progression-free survival (PFS) is increasingly used as an endpoint for cancer clinical trials. Disease progression is typically assessed on the basis of radiologic testing at scheduled time points or after a fixed number of treatment cycles. The date of the radiologic evaluation at which progression is first evident is used as a proxy for the true progression time. The true progression time actually lies somewhere within the interval between two assessments, a situation that results in interval-censored data. An analysis that ignores this interval censoring and uses the detection date as the date of progression unavoidably overestimates median PFS. This overestimation can lead to a finding being described as clinically significant when a longer median PFS may simply be a consequence of the length of the surveillance interval. Furthermore, if surveillance intervals are heterogeneous within a disease group, comparisons of median PFS across studies may not be meaningful. The decision to use PFS as a primary endpoint should be made carefully when designing clinical trials, and investigators focused on a particular disease should develop consensus standards and strive for consistent surveillance intervals.
Progression-free survival (PFS) is increasingly used as an endpoint for phase II and III clinical trials in oncology (1). It is inevitable that this endpoint will be used more frequently as the availability of targeted therapies increases and overall survival is improved. PFS is often seen as a desirable outcome measure because it is available earlier than overall survival and can thereby shorten drug development time, resulting in more rapid availability of efficacious therapies. Furthermore, unlike overall survival, PFS is not influenced by second-line treatment choices. But in contrast to overall survival, disease progression is subject to measurement error; it is also influenced by the timing of scheduled reassessments. Concerns about measurement bias and variation in radiologists' interpretation of imaging studies have led to initiatives such as the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines to facilitate consistency in tumor measurement and interpretation (2). However, the RECIST criteria do not address the timing of reassessment for progression, which is an additional source of bias that may complicate interpretation of PFS in clinical trials. Here we consider how variation in surveillance intervals may affect the estimation of PFS and propose strategies that clinical trialists and statisticians might use to mitigate the shortcomings of this outcome in study design and reporting.
PFS is defined as the time from the start of treatment to the date of disease progression or death. When progression is evident from clinical symptoms, dates of progression typically coincide with appointment dates, and progression is thus a function of the frequency with which clinician assessments are scheduled. Increasingly, however, cancer clinical trials focus on therapy for disease that is not detectable on the basis of physical examination or symptoms, and radiographic scans have become the primary criteria for assessing progression. Treatment protocols require radiographic scans at prespecified intervals or after a fixed number of treatment cycles to assess tumor lesions. Unless assessments are made daily, the exact progression date is not known; hence, the true progression time lies somewhere within the time interval between two assessments. For example, as illustrated in Fig. 1, if a patient is assessed with radiographic scans every 8 weeks, the progression will be detected at multiples of 8 weeks (except among patients who become symptomatic, prompting unscheduled evaluation). In this example, the true progression time is at 18 weeks, but the investigator can tell only that it lies somewhere within the interval from 16 to 24 weeks. Standard survival analysis methods (such as the Kaplan-Meier method) assume that the exact event time is known, but whereas a calendar date can be specified for survival outcomes, the exact time of a progression event is not known. In statistical terms, these data are described as interval censored.
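The 8-week example can be written out as a small calculation. The following is a minimal sketch, assuming scans occur exactly on schedule and no symptom-driven visits; the function name `bracketing_interval` is illustrative and does not come from the article.

```python
import math

def bracketing_interval(true_time, scan_interval):
    """Return (last negative scan, first positive scan) for a true progression time.

    Assumes scans happen exactly at multiples of scan_interval (same time unit
    as true_time) and that progression is seen at the first scan on or after it.
    """
    if true_time <= 0 or scan_interval <= 0:
        raise ValueError("times must be positive")
    # First scheduled scan at or after the true progression time.
    upper = math.ceil(true_time / scan_interval) * scan_interval
    lower = upper - scan_interval
    return lower, upper

# True progression at 18 weeks, scans every 8 weeks.
print(bracketing_interval(18, 8))  # (16, 24)
```

The investigator observes only the pair returned by the function, never `true_time` itself; that pair is the interval-censored observation.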
To see how often PFS is reported as a trial endpoint and to determine if there was consistency across trials for a specific cancer with regard to tumor assessment schedule, we performed a literature review of phase II and III clinical trials in breast cancer that were published in 2005. We identified studies by using the following algorithm to search titles and abstracts in the EmBase and Medline databases: breast cancer AND (phase II OR phase III) AND (time to progression OR progression-free survival). The search yielded 67 reports, of which 24 were excluded because they were not original reports of breast cancer trials. Of the 43 remaining studies (list is available online as Supplemental Data), PFS or time to progression was listed as the primary endpoint by 10 studies (23%) and as the secondary endpoint by 19 studies (44%); 14 studies (33%) did not specify PFS as a stated trial objective. The assessment schedule was generally defined as a multiple of the treatment cycle, usually every two cycles but in some cases every three cycles. However, treatment cycle lengths varied; cycles were repeated every 6 weeks in 20 studies, every 8 weeks in six studies, every 9 weeks in seven studies, and at intervals ranging from 10 weeks to 3 months in six studies. Four studies did not specify assessment intervals.
Although these differences in cycle length may seem modest, there is a close relationship between cycle length and progression date that can lead to bias. To see how these biases might originate, consider results from a recent phase II clinical trial evaluating PFS. Thirty-seven women with metastatic breast cancer were treated with erlotinib and bevacizumab (BV) (3). The protocol specified that a computed tomography scan be performed every 9 weeks to determine disease progression. In accordance with common practice, the date of progression was defined as the scan date at which progression was first evident. In statistical parlance, this definition means that interval censoring was ignored and that the "upper limit" of the true progression interval was used to determine PFS. Figure 2 displays PFS as determined by Kaplan-Meier analysis using this common approach. It is clear from the graph that the progression events mostly occurred at the scheduled assessment times, i.e., at a multiple of 9 weeks (areas marked with a circle). The few progression events detected at time points other than the 9-week intervals were due to patients experiencing symptoms that triggered earlier reassessments. Using the upper limit of the progression interval as the progression date, the estimated median PFS was 10 weeks (95% confidence interval [CI] = 8 to 17 weeks; Fig. 2).
However, another way to analyze progression would be to use the other end of the progression interval so that progression would be defined as occurring on the date of the scan before the one at which progression was identified, namely the "lower limit" of the progression interval. Using the lower limit, the estimated median PFS from Kaplan-Meier analysis is 0 weeks (95% CI = 0 to 9 weeks; Fig. 3). This estimate of median PFS is markedly different from the 10-week estimate obtained by the use of the upper limit and is likely to be an underestimate of the true median PFS. A possible alternative approach would be to use the midpoint of the interval between scan dates as the progression date. This approach would result in an estimated median PFS of 5 weeks (95% CI = 4 to 13 weeks; Fig. 3). There is no way to evaluate which of the three median PFS estimates given by these three approaches most accurately represents the true underlying median. In each case, we are ignoring the interval censoring by arbitrarily assigning the progression date to a single point defined by the reassessment schedule.
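To make the three conventions concrete, the sketch below applies them to a small hypothetical cohort. The scan weeks are invented for illustration and are not the trial data; the cohort has no censoring, so the Kaplan-Meier median reduces to the lower sample median.

```python
import statistics

def imputed_times(last_negative_scan, first_positive_scan):
    """Return the three candidate progression times for one patient (weeks)."""
    lower = last_negative_scan
    upper = first_positive_scan
    return {"lower": lower, "midpoint": (lower + upper) / 2, "upper": upper}

# Hypothetical cohort (not the trial data): week of the first scan showing
# progression, with scans every 9 weeks and no censored patients.
first_positive_scans = [9, 9, 9, 18, 18, 27]
SCAN_INTERVAL = 9

patients = [imputed_times(s - SCAN_INTERVAL, s) for s in first_positive_scans]

# Without censoring, the Kaplan-Meier median (first time survival <= 0.5)
# equals the lower sample median.
medians = {
    rule: statistics.median_low([p[rule] for p in patients])
    for rule in ("lower", "midpoint", "upper")
}
print(medians)
```

The same patients yield three different medians, one full scan interval apart at the extremes, which mirrors the spread between the lower-limit and upper-limit estimates described above.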
To better understand the effect of interval censoring on Kaplan-Meier estimates of median PFS and to illustrate the extent to which variation in surveillance intervals might bias study results, we conducted a simulation study (Table 1). Progression times were simulated from an exponential distribution with uniform censoring. One thousand samples of 25 patients (a typical size for a phase II trial in oncology) were generated for different prespecified true median PFS times (3, 6, and 12 months), and a series of assessment schedules was imposed on the simulated progression times. We then calculated the median of the estimated medians and the percent bias of the estimated median (defined as [estimated median PFS minus true median PFS]/[true median PFS] × 100%).
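A simulation of this kind can be sketched as follows for the upper-limit (standard) approach. Several details are assumptions made only for illustration, because the text does not specify them: the width of the uniform censoring window (`max_followup_months`), censoring at the last scan before a patient leaves the study, a conversion of 52/12 weeks per month, and the random seed.

```python
import math
import random
import statistics

WEEKS_PER_MONTH = 52 / 12  # assumed conversion between weeks and months

def km_median(times, events):
    """Smallest time at which the Kaplan-Meier estimate is <= 0.5, else None."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            if surv <= 0.5 + 1e-12:  # small tolerance for rounding error
                return t
        at_risk -= n_removed
    return None

def simulate_upper_limit(true_median_months, scan_weeks, n_patients=25,
                         n_samples=1000, max_followup_months=24.0, seed=0):
    """Median of estimated medians and percent bias, upper-limit approach."""
    rng = random.Random(seed)
    rate = math.log(2) / true_median_months  # exponential hazard
    scan = scan_weeks / WEEKS_PER_MONTH      # scan interval in months
    medians = []
    for _ in range(n_samples):
        times, events = [], []
        for _ in range(n_patients):
            t_true = rng.expovariate(rate)
            t_cens = rng.uniform(0, max_followup_months)
            detected = math.ceil(t_true / scan) * scan  # first scan after progression
            if detected <= t_cens:
                times.append(detected)
                events.append(1)
            else:
                # Assumption: censor at the last scan before leaving the study.
                times.append(math.floor(t_cens / scan) * scan)
                events.append(0)
        m = km_median(times, events)
        if m is not None:
            medians.append(m)
    est = statistics.median_low(medians)
    bias = 100 * (est - true_median_months) / true_median_months
    return est, bias

est, bias = simulate_upper_limit(3, 9, n_samples=200, seed=1)
print(f"median of medians = {est:.1f} months, bias = {bias:.0f}%")
```

Replacing `detected` with the previous scan time, or with the midpoint of the two, would give the lower-limit and midpoint variants. Exact figures depend on the assumed censoring window and will not reproduce Table 1 digit for digit.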
Comparing each specified true median PFS with the Kaplan-Meier estimates generated using the upper limit (standard approach), the interval midpoint, and the lower limit again illustrates that the upper limit method consistently overestimates the true median PFS and the lower limit consistently underestimates it. For example, in the sixth row of Table 1 (true median PFS of 3 months and tumor assessments made every 9 weeks), using the upper limit method results in an estimated median PFS that is 1.2 months greater than the true median (37% bias), and using the lower limit results in an underestimation of almost 1 month (31% bias). In this situation, the midpoint was relatively unbiased (3% bias). However, scanning down the midpoint column shows that this is not always the case. Although reliance on the midpoint is an intuitively attractive compromise, this simulation reveals that it can also result in high biases (e.g., an absolute bias as high as 54% when tumor assessments are made every 12 weeks).
Furthermore, we observed that the estimated median PFS was a multiple of the assessment interval. This situation results because a progression will generally be detected at the time points specified for radiologic assessment unless members of a study cohort become symptomatic, prompting early reevaluation of progression. For example, if the actual median PFS is 3 months and radiologic assessment is done every 4 weeks (Table 1), the use of the upper limit yields an estimated median PFS of 3.7 months. This time corresponds to precisely 16 weeks, four times the assessment interval of 4 weeks. In this block, the bias is lowest when the assessment interval is 7 weeks, for which the use of the upper limit yields an estimated median PFS of 3.2 months, or 14 weeks (twice the assessment interval), which happens to be very close to the true median PFS. On the other hand, the bias is greatest when the assessment is every 11 weeks, which results in an estimated median PFS of 5.1 months, or 22 weeks. The bias does not necessarily increase with an increase in the length of the assessment interval; instead, it depends on the timing of the interval relative to the true median. The magnitude of bias decreased slightly with an increase in the true median PFS (i.e., from 3 months to 6 months and 12 months) for the assessment intervals examined (Table 1).
These observations are not artifacts of the simulation study and will hold true in practice if assessments are obtained at the scheduled assessment times. In a recent report by Kabbinavar et al. (4) of a randomized phase II trial of the addition of BV to fluorouracil (FU) and leucovorin (LV) in patients with metastatic colorectal cancer, the median PFS for the FU/LV/BV arm was 9.2 months and that for the FU/LV/placebo arm was 5.5 months. The assessment schedule was reported as every 8 weeks. Consistent with our results, the median PFS times were multiples of the assessment schedule; 9.2 months is 40 weeks, which corresponds to the fifth assessment time, and 5.5 months is 24 weeks, which corresponds to the third assessment time. The true median PFS in the FU/LV/BV arm must lie between the fourth and fifth assessment times, i.e., between 7.4 and 9.2 months. Similarly, the true median PFS for the FU/LV/placebo arm must lie between the second and third assessment times, i.e., between 3.7 and 5.5 months. Yet, current practice as reflected in the clinical trial literature is to report the estimate corresponding to the last assessment (e.g., 5.5 and 9.2 months in this example). We suggest that interval reporting would facilitate reliable interpretation of clinical trial results. In addition, comparisons across studies should consider the assessment intervals in relation to the risk of progression.
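The arithmetic behind this interval reporting can be sketched as follows. The conversion of 52/12 weeks per month is an assumption (it reproduces the rounded figures in the text), and the function name is illustrative.

```python
WEEKS_PER_MONTH = 52 / 12  # assumed conversion between weeks and months

def median_bracket_months(reported_median_months, scan_weeks):
    """Map a reported median PFS to its assessment number and bracketing interval.

    Returns (assessment number, lower bound in months, upper bound in months),
    assuming the reported median sits on a scheduled assessment time.
    """
    k = round(reported_median_months * WEEKS_PER_MONTH / scan_weeks)
    lower_weeks = (k - 1) * scan_weeks
    upper_weeks = k * scan_weeks
    return k, lower_weeks / WEEKS_PER_MONTH, upper_weeks / WEEKS_PER_MONTH

# Reported medians from the two arms, with assessments every 8 weeks.
for arm, median in (("FU/LV/BV", 9.2), ("FU/LV/placebo", 5.5)):
    k, lo, hi = median_bracket_months(median, 8)
    print(f"{arm}: assessment {k}, true median between {lo:.1f} and {hi:.1f} months")
```

Reporting the pair of bounds rather than the upper bound alone makes the dependence on the 8-week schedule visible to the reader.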
In an ideal world, neither clinicians nor statisticians would make head to head comparisons of treatment efficacy on the basis of the results of phase II clinical trials. However, because resources are scarce and not all possible phase III studies can be conducted, phase II results are inevitably scrutinized to identify clinically meaningful differences and to debate which treatment strategies should be advanced to phase III trials. For this reason, harmonization of the approaches for estimating PFS may provide methodologic rigor and consistency to trial design and interpretation. It is essential to note that overestimation of median PFS can erroneously result in declaring one phase II study superior to another, especially given the strong influence of surveillance interval on median PFS. In addition, heterogeneity in the timing of assessments within a disease group can pose a problem when median PFS is compared across studies and when historic controls are used to inform the design of future studies. In practice, assessment intervals are usually dictated by the length of a cycle of therapy (with patients usually being evaluated after a fixed number of cycles), but it would not be insurmountable to impose consistent standards for particular diseases.
We intend that the results of the simple simulation exercises we have presented be used to encourage clinical trialists and statisticians to consider some simple options when designing and analyzing studies with PFS as an outcome. First, investigators should consider reporting the lower limit, upper limit, and midpoint of PFS. Second, standard radiologic surveillance intervals should be considered for particular disease groups: e.g., every 8 weeks for chemotherapy trials in metastatic pancreatic or metastatic breast cancer, every 10 weeks for metastatic colon cancer trials, and every 12 weeks for metastatic breast cancer hormonal therapy trials.
To accommodate the realities of the clinical setting, there needs to be a reasonable margin for scheduling around these specified time points. When designing a phase II study for a disease with short median PFS, it may be better to use a binary endpoint, such as the 3-month PFS rate, and to use a traditional phase II design, as is common for tumor response endpoints. All patients would then be assessed by this fixed time, allowing for greater uniformity of results across trials.
In addition to recognizing the effect of assessment schedule on median PFS and designing trials to limit the influence of interval-censored data, investigators should consider analyzing these data using methods that are well established in the statistical literature (5-9). One method to properly analyze interval-censored data is a nonparametric extension of the Kaplan-Meier estimator. Figure 4 shows this approach for the breast cancer example (6,7). This result was obtained using the Interval Censored Estimation macro in SAS statistical software (10). Although the macro is straightforward to use, interpretation of the resulting graph is challenging and unfamiliar to most readers of the medical literature. Each line in Fig. 4 represents an interval of time at a given estimated survival probability. For example, from 9 weeks to 16.9 weeks after the start of treatment, the estimated PFS is 24% (95% CI = 1% to 48%). Where there is no line at a given estimated PFS, the corresponding time is a range defined by the interval bracketing the blank region in which that PFS occurs. For example, at 40 weeks, the PFS probability can be estimated within a range of 3.4% (95% CI = 0% to 10%) to 6.8% (95% CI = 0% to 16%). Similarly, the time at which PFS is 70% occurs from 0 to 3.3 weeks. In this example, the median PFS occurs in the interval from 8.9 to 9 weeks. Although this interval is quite small, in other datasets, a point estimate of median PFS may not fall in a region in which PFS can be estimated. The fact that the interpretation of this curve is not straightforward has precluded widespread adoption of this method.
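For readers who want to see what such a nonparametric estimator does, the following is a minimal sketch of a self-consistency (EM) iteration of the kind described by Turnbull (7). It is not the SAS macro used for Fig. 4 and does not compute confidence intervals. It assumes each observation is a half-open interval (left, right], with `math.inf` as the right end for patients who had not progressed at last assessment; the example data are hypothetical.

```python
import math

def turnbull_npmle(intervals, max_iter=1000, tol=1e-10):
    """Estimate probability mass on the innermost intervals of censored data.

    intervals: list of (left, right) pairs meaning the event lies in (left, right].
    Returns a list of (left, right, mass) for each support interval.
    """
    lefts = {l for l, _ in intervals}
    rights = {r for _, r in intervals}
    points = sorted(lefts | rights)
    # Mass can only sit on gaps that start at a left end and stop at a right end.
    support = [(a, b) for a, b in zip(points, points[1:])
               if a in lefts and b in rights]
    # contains[i][j] is True when support interval j lies inside observation i.
    contains = [[l <= a and b <= r for a, b in support] for l, r in intervals]
    n, m = len(intervals), len(support)
    p = [1.0 / m] * m
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in contains:
            denom = sum(pj for pj, inside in zip(p, row) if inside)
            for j, inside in enumerate(row):
                if inside:
                    # Share of this patient attributed to support interval j.
                    new_p[j] += p[j] / denom / n
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return [(a, b, mass) for (a, b), mass in zip(support, p)]

# Hypothetical data in weeks: scans every 9 weeks, one patient still
# progression free at the week-18 scan.
data = [(0, 9), (0, 9), (9, 18), (0, 18), (18, math.inf)]
surv = 1.0
for left, right, mass in turnbull_npmle(data):
    surv -= mass
    print(f"({left}, {right}]: mass {mass:.3f}, PFS after interval {max(surv, 0):.3f}")
```

The estimate is defined only at the ends of the support intervals; inside them the curve is undetermined, which is why the resulting plot shows horizontal segments separated by blank regions.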
An alternative method for analyzing interval-censored data is based on an accelerated failure-time model that assumes that event times are from a specified distribution (8). We fit this model to the breast cancer data by specifying a Weibull distribution for the progression times. This model provides predicted probabilities for each progression time. The resulting curve of these predicted probabilities is shown as the solid black curve in Fig. 5. The predicted median PFS for this patient group using the Weibull model is 7 weeks (95% CI = 3 to 11 weeks). The curve lies between the two curves corresponding to the upper and lower limits from a standard Kaplan-Meier analysis. However, this method has not yet become standard in clinical reports, perhaps because of the model dependence of the method, which requires specification of a distribution.
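The likelihood behind a parametric fit of this kind can be sketched briefly. Each patient contributes the probability that progression falls inside the observed interval, S(left) - S(right), where S is the Weibull survival function. The sketch below maximizes that likelihood by a crude grid search, a stand-in chosen only to keep the example dependency free; it is not the fitting procedure used for Fig. 5, and the data and grid ranges are hypothetical.

```python
import math

def weibull_surv(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    if t == math.inf:
        return 0.0
    return math.exp(-((t / scale) ** shape))

def log_likelihood(intervals, shape, scale):
    """Interval-censored log-likelihood; right censoring uses right = math.inf."""
    total = 0.0
    for left, right in intervals:
        prob = weibull_surv(left, shape, scale) - weibull_surv(right, shape, scale)
        if prob <= 0:
            return -math.inf
        total += math.log(prob)
    return total

def fit_weibull_grid(intervals, shapes, scales):
    """Return (log-likelihood, shape, scale) maximizing over the given grid."""
    return max((log_likelihood(intervals, k, s), k, s)
               for k in shapes for s in scales)

def weibull_median(shape, scale):
    """Time at which the fitted survival function equals 0.5."""
    return scale * math.log(2) ** (1 / shape)

# Hypothetical data in weeks, scans every 9 weeks.
data = [(0, 9), (0, 9), (0, 9), (9, 18), (9, 18), (18, 27), (27, math.inf)]
shapes = [0.5 + 0.1 * i for i in range(26)]  # 0.5 to 3.0
scales = [1.0 + 0.5 * i for i in range(99)]  # 1 to 50 weeks
loglik, shape, scale = fit_weibull_grid(data, shapes, scales)
print(f"shape={shape:.1f}, scale={scale:.1f}, median={weibull_median(shape, scale):.1f} weeks")
```

Unlike the upper-limit, lower-limit, and midpoint conventions, the fitted median is not forced onto a scan date, but it depends on the Weibull assumption being reasonable for the data.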
The issues regarding estimation of median PFS that we have outlined here are applicable to phase III trials and adjuvant studies as well as to phase II trials. In a simulation study that mimicked a single-arm adjuvant trial (including n = 100 or n = 250 patients) with a specified true median PFS time of close to 3 years (3.17, 3.25, or 3.40 years) and assessments every 3, 6, 9, and 12 months, bias in estimation of median PFS ranged from 0% to 26% (data not shown). However, in a two-arm study, comparison of PFS across treatment arms would be valid unless assessment intervals differed across arms or unless only one treatment arm resulted in a consistent delay in assessment schedule due to additional toxicity.
The decision to use PFS as a primary endpoint should be considered carefully in the design phase of a trial. It is our intention to make researchers aware that estimates of PFS are highly dependent on when they look for progression. The clinical research community can address this concern by adopting consistent strategies for interval evaluations in the design phase. The biostatistics community can mitigate this source of bias by increasing the use of methods to analyze interval-censored data.
Until these methods enter the mainstream and become widely available in statistical software packages, we recommend analyzing data using both the lower (the assessment before the detection occurred) and upper endpoints of the assessment intervals. This approach will mimic the extreme scenarios and will bracket the true distribution. In addition, these results could be compared with the parametric results that account for interval censoring. However, in order to conduct such analyses, it is necessary to record not only the date of the last assessment (progression or last follow-up), but also the date of the previous assessment. This may be a change in how the data are recorded and maintained and would need to be accounted for at the beginning of data collection.
Reliance on the final surveillance date to determine progression will necessarily result in inflated estimates of median PFS. This overestimation does not necessarily increase with an increase in length of the assessment interval, but rather, the increase depends on the timing of the interval relative to the true median. As PFS becomes an increasingly important endpoint for evaluation of new treatments, these caveats regarding its interpretation merit greater awareness.
NOTES
The phase II clinical trial used as an example in this article was supported in part by the National Cancer Institute, Genentech, and OSI Pharmaceuticals. None of these sponsors played any role in the development of the concept, design, or writing of the manuscript.
REFERENCES
(1) Johnson JR, Williams G, Pazdur R. (2003) End points and United States Food and Drug Administration approval of oncology drugs. J Clin Oncol 21:1404-11.
(2) Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. (2000) New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst 92:205-16.
(3) Dickler M, Rugo H, Caravelli J, Brogi E, Sachs D, Panageas K, et al. (2004) Phase II trial of erlotinib (OSI-774), an epidermal growth factor receptor (EGFR)-tyrosine kinase inhibitor, and bevacizumab, a recombinant humanized monoclonal antibody to vascular endothelial growth factor (VEGF), in patients (pts) with metastatic breast cancer (MBC). J Clin Oncol 22:127S.
(4) Kabbinavar FF, Schulz J, McCleod M, Patel T, Hamm JT, Hecht JR, et al. (2005) Addition of bevacizumab to bolus fluorouracil and leucovorin in first-line metastatic colorectal cancer: results of a randomized phase II trial. J Clin Oncol 23:3697-705.
(5) Lindsey JC and Ryan LM. (1998) Tutorial in biostatistics methods for interval-censored data. Stat Med 17:219-38.
(6) Peto R. (1973) Experimental survival curves for interval-censored data. Appl Stat 22:86-91.
(7) Turnbull BW. (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B 38:290-5.
(8) Odell PM, Anderson KM, D'Agostino RB. (1992) Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics 48:951-9.
(9) Groeneboom P and Wellner JA. (1992) Information bounds and nonparametric maximum likelihood estimation (Birkhauser, New York (NY)).
(10) SAS Institute Inc. ICE, macro to compute nonparametric survival curves for interval censored data. Jul 13, 1993. Available at: http://ftp.sas.com/techsup/download/stat/ice.html. [Last accessed: January 27, 2005.]
Manuscript received September 25, 2006; revised January 12, 2007; accepted January 29, 2007.