© 2000 by Oxford University Press
Journal of the National Cancer Institute, Vol. 92, No. 3, 205-216,
February 2, 2000
SPECIAL ARTICLE
New Guidelines to Evaluate the Response to Treatment in Solid Tumors
Affiliations of authors: P. Therasse, J. Verweij, M. Van Glabbeke, A. T. van Oosterom, European Organization for Research and Treatment of Cancer, Brussels, Belgium; S. G. Arbuck, R. S. Kaplan, L. Rubinstein, M. C. Christian, National Cancer Institute, Bethesda, MD; E. A. Eisenhauer, National Cancer Institute of Canada Clinical Trials Group, Kingston, ON, Canada; J. Wanders, New Drug Development Office Oncology, Amsterdam, The Netherlands; S. G. Gwyther, East Surrey Healthcare National Health Service Trust, Redhill, U.K.
Correspondence to: Patrick Therasse, M.D., European Organization for Research and Treatment of Cancer Data Center, Avenue Mounier 83/11, 1200 Brussels, Belgium (e-mail: pth{at}eortc.be).
| ABSTRACT |
|---|
Anticancer cytotoxic agents go through a process by which their antitumor activity, on the basis of the amount of tumor shrinkage they could generate, has been investigated. In the late 1970s, the International Union Against Cancer and the World Health Organization introduced specific criteria for the codification of tumor response evaluation. In 1994, several organizations involved in clinical research combined forces to tackle the review of these criteria on the basis of the experience and knowledge acquired since then. After several years of intensive discussions, a new set of guidelines is ready that will supersede the former criteria. In parallel to this initiative, one of the participating groups developed a model by which response rates could be derived from unidimensional measurement of tumor lesions instead of the usual bidimensional approach. This new concept has been largely validated by the Response Evaluation Criteria in Solid Tumors Group and integrated into the present guidelines. This special article also provides some philosophic background to clarify the various purposes of response evaluation. It proposes a model by which a combined assessment of all existing lesions, characterized by target lesions (to be measured) and nontarget lesions, is used to extrapolate an overall response to treatment. Methods of assessing tumor lesions are better codified, briefly within the guidelines and in more detail in Appendix I. All other aspects of response evaluation have been discussed, reviewed, and amended whenever appropriate.
| A. PREAMBLE |
|---|
Early attempts to define the objective response of a tumor to an anticancer agent were made in the early 1960s (1,2). In the mid- to late 1970s, the definitions of objective tumor response were widely disseminated and adopted when it became apparent that a common language would be necessary to report the results of cancer treatment in a consistent manner.
The World Health Organization (WHO) definitions published in the 1979 WHO Handbook (3) and by Miller et al. (4) in 1981 have been the criteria most commonly used by investigators around the globe. However, some problems have developed with the use of WHO criteria: 1) The methods for integrating into response assessments the change in size of measurable and "evaluable" lesions as defined by WHO vary among research groups, 2) the minimum lesion size and number of lesions to be recorded also vary, 3) the definitions of progressive disease are related to change in a single lesion by some and to a change in the overall tumor load (sum of the measurements of all lesions) by others, and 4) the arrival of new technologies (computed tomography [CT] and magnetic resonance imaging [MRI]) has led to some confusion about how to integrate three-dimensional measures into response assessment.
These issues and others have led to a number of different modifications or clarifications to the WHO criteria, resulting in a situation where response criteria are no longer comparable among research organizations, the very circumstance that the WHO publication had set out to avoid. This situation led to an initiative undertaken by representatives of several research groups to review the response definitions in use and to create a revision of the WHO criteria that, as far as possible, addressed areas of conflict and inconsistency.
In so doing, a number of principles were identified:
1) Despite the fact that "novel" therapies are being developed that may work by mechanisms unlikely to cause tumor regression, there remains an important need to continue to describe objective change in tumor size in solid tumors for the foreseeable future. Thus, the four categories of complete response, partial response, stable disease, and progressive disease, as originally categorized in the WHO Handbook (3), should be retained in any new revision.
2) Because of the need to retain some ability to compare favorable results of future therapies with those currently available, it was agreed that no major discrepancy in the meaning and the concept of partial response should exist between the old and the new guidelines, although measurement criteria would be different.
3) In some institutions, the technology now exists to determine changes in tumor volume or changes in tumor metabolism that may herald shrinkage. However, these techniques are not yet widely available, and many have not been validated. Furthermore, it was recognized that the utility of response criteria to date had not been related to precision of measurement. The definition of a partial response, in particular, is an arbitrary convention; there is no inherent meaning for an individual patient of a 50% decrease in overall tumor load. It was not thought that increased precision of measurement of tumor volume was an important goal for its own sake. Rather, standardization and simplification of methodology were desirable. Nevertheless, the guidelines proposed in this document are not meant to discourage the development of new tools that may provide more reliable surrogate end points than objective tumor response for predicting a potential therapeutic benefit for cancer patients.
4) Concerns regarding the ease with which a patient may be considered mistakenly to have disease progression by the current WHO criteria (primarily because of measurement error) have already led some groups such as the Southwest Oncology Group to adopt criteria that require a greater increase in size of the tumor to consider a patient to have progressive disease (5). These concerns have led to a similar change within these revised WHO criteria (see Appendix II).
5) These criteria have not addressed several other areas of recent concern, but it is anticipated that this process will continue and the following will be considered in the future:
- Measures of antitumor activity, other than tumor shrinkage, that may appropriately allow investigation of cytostatic agents in phase II trials;
- Definitions of serum marker response and recommended methodology for their validation; and
- Specific tumors or anatomic sites presenting unique complexities.
| B. BACKGROUND |
|---|
These guidelines are the result of a large, international collaboration. In 1994, the European Organization for Research and Treatment of Cancer (EORTC), the National Cancer Institute (NCI) of the United States, and the National Cancer Institute of Canada Clinical Trials Group set up a task force (see Appendix III) with the main objective of reviewing the existing sets of criteria used to evaluate response to treatment in solid tumors. After 3 years of regular meetings and exchange of ideas within the task force, a draft revised version of the WHO criteria was produced and widely circulated (see Appendix IV). Comments received (response rate, 95%) were compiled and discussed within the task force before a second version of the document integrating relevant comments was issued. This second version of the document was again circulated to external reviewers who were also invited to participate in a consensus meeting (on behalf of the organization that they represented) to discuss and finalize unresolved problems (October 1998). The list of participants in this consensus meeting is shown in Appendix IV and includes representatives from academia, industry, and regulatory authorities. Following the recommendations discussed during the consensus meeting, a third version of the document was produced, presented publicly to the scientific community (American Society of Clinical Oncology, 1999), and submitted to the Journal of the National Cancer Institute in June 1999 for official publication.
Data from collaborative studies, including more than 4000 patients assessed for tumor response, support the simplification of response evaluation through the use of unidimensional measurements and the sum of the longest diameters instead of the conventional method using two measurements and the sum of the products. The results of the different retrospective analyses (comparing both approaches) performed by use of these different databases are described in Appendix V. This new approach, which has been implemented in the following guidelines, is based on the model proposed by James et al. (6).
| C. RESPONSE EVALUATION CRITERIA IN SOLID TUMORS (RECIST) GUIDELINES |
|---|
1. Introduction
The introduction explores the definitions, assumptions, and purposes of tumor response criteria. The guidelines offered below may lead to more uniform reporting of outcomes of clinical trials. Note that, although single investigational agents are discussed, the principles are the same for drug combinations, noninvestigational agents, or approaches that do not involve drugs.
Tumor response associated with the administration of anticancer agents can be evaluated for at least three important purposes that are conceptually distinct:
- Tumor response as a prospective end point in early clinical trials.
In this situation, objective tumor response is employed to determine
whether the agent/regimen demonstrates sufficiently encouraging results
to warrant further testing. These trials are typically phase II trials
of investigational agents/regimens (see section 1.2), and it
is for use in this precise context that these guidelines have been
developed.
- Tumor response as a prospective end point in more definitive
clinical trials designed to provide an estimate of benefit for a
specific cohort of patients. These trials are often randomized
comparative trials or single-arm comparisons of combinations of agents
with historical control subjects. In this setting, objective tumor
response is used as a surrogate end point for other measures of
clinical benefit, including time to event (death or disease
progression) and symptom control (see section 1.3).
- Tumor response as a guide for the clinician and patient or study
subject in decisions about continuation of current therapy. This
purpose is applicable both to clinical trials and to routine practice
(see section 1.1), but use in the context of decisions
regarding continuation of therapy is not the primary focus of this
document.
However, in day-to-day usage, the distinction among these uses of the term "tumor response" can easily be missed, unless an effort is made to be explicit. When these differences are ignored, inappropriate methodology may be used and incorrect conclusions may result.
1.1. Response Outcomes in Daily Clinical Practice of Oncology
The evaluation of tumor response in the daily clinical practice of oncology may not be performed according to predefined criteria. It may, rather, be based on a subjective medical judgment that results from clinical and laboratory data that are used to assess the treatment benefit for the patient. The defined criteria developed further in this document are not necessarily applicable or complete in such a context. It might be appropriate to make a distinction between "clinical improvement" and "objective tumor response" in routine patient management outside the context of a clinical trial.
1.2. Response Outcomes in Uncontrolled Trials as a Guide to Further Testing of a New Therapy
"Observed response rate" is often employed in single-arm studies as a "screen" for new anticancer agents that warrant further testing. Related outcomes, such as response duration or proportion of patients with complete responses, are sometimes employed in a similar fashion. The utilization of a response rate in this way is not encumbered by an implied assumption about the therapeutic benefit of such responses but rather implies some degree of biologic antitumor activity of the investigated agent.
For certain types of agents (i.e., cytotoxic drugs and hormones), experience has demonstrated that objective antitumor responses observed at a rate higher than would have been expected to occur spontaneously can be useful in selecting anticancer agents for further study. Some agents selected in this way have eventually proven to be clinically useful. Furthermore, criteria for "screening" new agents in this way can be modified by accumulated experience and eventually validated in terms of the efficiency by which agents so screened are shown to be of clinical value by later, more definitive, trials.
In most circumstances, however, a new agent achieving a response rate determined a priori to be sufficiently interesting to warrant further testing may not prove to be an effective treatment for the studied disease in subsequent randomized phase III trials. Random variables and selection biases, both known and unknown, can have an overwhelming effect in small, uncontrolled trials. These trials are an efficient and economic step for initial evaluation of the activity of a new agent or combination in a given disease setting. However, many such trials are performed, and the proportion that will provide false-positive results is necessarily substantial. In many circumstances, it would be appropriate to perform a second small confirmatory trial before initiating large resource-intensive phase III trials.
Sometimes, several new therapeutic approaches are studied in a randomized phase II trial. The purpose of randomization in this setting, as in phase III studies, is to minimize the impact of random imbalances in prognostic variables. However, randomized phase II studies are, by definition, not intended to provide an adequately powered comparison between arms (regimens). Rather, the goal is simply to identify one or more arms for further testing, and the sample size is chosen so as to provide reasonable confidence that a truly inferior arm is not likely to be selected. Therefore, reporting the results of such randomized phase II trials should not imply statistical comparisons between treatment arms.
1.3. Response Outcomes in Clinical Trials as a Surrogate for Palliative Effect
1.3.1. Use in nonrandomized clinical trials. The only circumstance in which objective responses in a nonrandomized trial can permit a tentative assumption of a palliative effect (i.e., beyond a purely clinical measure of benefit) is when there is an actual or implied comparison with historical series of similar patients. This assumption is strongest when the prospectively determined statistical analysis plan provides for matching of relevant prognostic variables between case subjects and a defined series of control subjects. Otherwise, there must be, at the very least, prospectively determined statistical criteria that provide a very strong justification for assumptions about the response rate that would have been expected in the appropriate "control" population (untreated or treated with conventional therapy, as fits the clinical setting). However, even under these circumstances, a high rate of observed objective response does not constitute proof or confirmation of clinical therapeutic benefit. Because of unavoidable and nonquantifiable biases inherent in nonrandomized trials, proof of benefit still requires eventual confirmation in a prospectively randomized, controlled trial of adequate size. The appropriate end points of therapeutic benefit for such a trial are survival, progression-free survival, or symptom control (including quality of life).
1.3.2. Use in randomized trials. Even in the context of prospectively randomized phase III comparative trials, "observed response rate" should not be the sole, or major, end point. The trial should be large enough that differences in response rate can be validated by association with more definitive end points reflecting therapeutic benefit, such as survival, progression-free survival, reduction in symptoms, or improvement (or maintenance) of quality of life.
2. Measurability of Tumor Lesions at Baseline
2.1. Definitions
At baseline, tumor lesions will be categorized as follows:
measurable (lesions that can be accurately measured in at least one dimension [longest diameter to be recorded] as ≥20 mm with conventional techniques or as ≥10 mm with spiral CT scan [see section 2.2]) or nonmeasurable (all other lesions, including small lesions [longest diameter <20 mm with conventional techniques or <10 mm with spiral CT scan] and truly nonmeasurable lesions).
The term "evaluable" in reference to measurability is not recommended and will not be used because it does not provide additional meaning or accuracy.
All measurements should be recorded in metric notation by use of a ruler or calipers. All baseline evaluations should be performed as closely as possible to the beginning of treatment and never more than 4 weeks before the beginning of treatment.
Lesions considered to be truly nonmeasurable include the following: bone lesions, leptomeningeal disease, ascites, pleural/pericardial effusion, inflammatory breast disease, lymphangitis cutis/pulmonis, abdominal masses that are not confirmed and followed by imaging techniques, and cystic lesions.
(Note: Tumor lesions that are situated in a previously irradiated area might or might not be considered measurable, and the conditions under which such lesions should be considered must be defined in the protocol when appropriate.)
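The baseline categorization in section 2.1 can be sketched in code. The following is a minimal illustration only: the function name, the technique labels, and the lesion-type strings are hypothetical, and the size cutoffs are inferred from the statement that lesions with a longest diameter <20 mm (conventional techniques) or <10 mm (spiral CT) are nonmeasurable. Lesions in previously irradiated areas are not handled, since the text leaves that decision to the protocol.

```python
# Hypothetical labels for the "truly nonmeasurable" lesion types listed in section 2.1.
TRULY_NONMEASURABLE = {
    "bone lesion",
    "leptomeningeal disease",
    "ascites",
    "pleural/pericardial effusion",
    "inflammatory breast disease",
    "lymphangitis cutis/pulmonis",
    "unconfirmed abdominal mass",
    "cystic lesion",
}

# Minimum longest diameter (mm) for a lesion to count as measurable,
# keyed by an assumed technique label.
MIN_LONGEST_DIAMETER_MM = {"conventional": 20.0, "spiral_ct": 10.0}


def is_measurable(longest_diameter_mm, technique, lesion_type=None):
    """Return True if a baseline lesion would be categorized as measurable."""
    if lesion_type in TRULY_NONMEASURABLE:
        return False  # nonmeasurable regardless of size
    return longest_diameter_mm >= MIN_LONGEST_DIAMETER_MM[technique]
```

For example, a 15-mm lung nodule would be measurable under this sketch if assessed by spiral CT but nonmeasurable if assessed by a conventional technique.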
2.2. Specifications by Methods of Measurements
The same method of assessment and the same technique should be used to characterize each identified and reported lesion at baseline and during follow-up. Imaging-based evaluation is preferred to evaluation by clinical examination when both methods have been used to assess the antitumor effect of a treatment.
2.2.1. Clinical examination. Clinically detected lesions will only be considered measurable when they are superficial (e.g., skin nodules and palpable lymph nodes). For the case of skin lesions, documentation by color photography, including a ruler to estimate the size of the lesion, is recommended.
2.2.2. Chest x-ray. Lesions on chest x-ray are acceptable as measurable lesions when they are clearly defined and surrounded by aerated lung. However, CT is preferable. More details concerning the use of this method of assessment for objective tumor response evaluation are provided in Appendix I.
2.2.3. CT and MRI. CT and MRI are the best currently available and most reproducible methods for measuring target lesions selected for response assessment. Conventional CT and MRI should be performed with contiguous cuts of 10 mm or less in slice thickness. Spiral CT should be performed by use of a 5-mm contiguous reconstruction algorithm; this specification applies to the tumors of the chest, abdomen, and pelvis, while head and neck tumors and those of the extremities usually require specific protocols. More details concerning the use of these methods of assessment for objective tumor response evaluation are provided in Appendix I.
2.2.4. Ultrasound. When the primary end point of the study is objective response evaluation, ultrasound should not be used to measure tumor lesions that are clinically not easily accessible. It may be used as a possible alternative to clinical measurements for superficial palpable lymph nodes, subcutaneous lesions, and thyroid nodules. Ultrasound might also be useful to confirm the complete disappearance of superficial lesions usually assessed by clinical examination. Justifications for not using ultrasound to measure tumor lesions for objective response evaluation are provided in Appendix I.
2.2.5. Endoscopy and laparoscopy. The utilization of these techniques for objective tumor evaluation has not yet been fully or widely validated. Their uses in this specific context require sophisticated equipment and a high level of expertise that may be available only in some centers. Therefore, utilization of such techniques for objective tumor response should be restricted to validation purposes in specialized centers. However, such techniques can be useful in confirming complete histopathologic response when biopsy specimens are obtained.
2.2.6. Tumor markers. Tumor markers alone cannot be used to assess response. However, if markers are initially above the upper normal limit, they must return to normal levels for a patient to be considered in complete clinical response when all tumor lesions have disappeared. Specific additional criteria for standardized usage of prostate-specific antigen and CA (cancer antigen) 125 response in support of clinical trials are being validated.
2.2.7. Cytology and histology. Cytologic and histologic techniques can be used to differentiate between partial response and complete response in rare cases (e.g., after treatment to differentiate between residual benign lesions and residual malignant lesions in tumor types such as germ cell tumors). Cytologic confirmation of the neoplastic nature of any effusion that appears or worsens during treatment is required when the measurable tumor has met criteria for response or stable disease. Under such circumstances, the cytologic examination of the fluid collected will permit differentiation between response or stable disease (an effusion may be a side effect of the treatment) and progressive disease (if the neoplastic origin of the fluid is confirmed). New techniques to better establish objective tumor response will be integrated into these criteria when they are fully validated to be used in the context of tumor response evaluation.
3. Tumor Response Evaluation
3.1. Baseline Evaluation
3.1.1. Assessment of overall tumor burden and measurable disease. To assess objective response, it is necessary to estimate the overall tumor burden at baseline to which subsequent measurements will be compared. Only patients with measurable disease at baseline should be included in protocols where objective tumor response is the primary end point. Measurable disease is defined by the presence of at least one measurable lesion (as defined in section 2.1). If the measurable disease is restricted to a solitary lesion, its neoplastic nature should be confirmed by cytology/histology.
3.1.2. Baseline documentation of "target" and "nontarget" lesions. All measurable lesions up to a maximum of five lesions per organ and 10 lesions in total, representative of all involved organs, should be identified as target lesions and recorded and measured at baseline. Target lesions should be selected on the basis of their size (those with the longest diameter) and their suitability for accurate repeated measurements (either by imaging techniques or clinically). A sum of the longest diameter for all target lesions will be calculated and reported as the baseline sum longest diameter. The baseline sum longest diameter will be used as the reference by which to characterize the objective tumor response.
All other lesions (or sites of disease) should be identified as nontarget lesions and should also be recorded at baseline. Measurements of these lesions are not required, but the presence or absence of each should be noted throughout follow-up.
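As an illustration of the baseline documentation step, the sketch below picks target lesions from a list of measurable lesions and computes the baseline sum longest diameter. It is a simplification under stated assumptions: the dictionary keys and function names are hypothetical, and selection is purely by size with the caps of five per organ and 10 in total. The requirements that target lesions be representative of all involved organs and suitable for accurate repeated measurement call for clinical judgment and are not modeled here.

```python
from collections import defaultdict


def select_target_lesions(measurable_lesions, max_per_organ=5, max_total=10):
    """Greedy, size-only selection of target lesions (a simplification).

    Each lesion is a dict with hypothetical keys 'organ' and
    'longest_diameter_mm'.
    """
    chosen = []
    per_organ = defaultdict(int)
    by_size = sorted(
        measurable_lesions,
        key=lambda lesion: lesion["longest_diameter_mm"],
        reverse=True,
    )
    for lesion in by_size:
        if len(chosen) == max_total:
            break
        if per_organ[lesion["organ"]] < max_per_organ:
            chosen.append(lesion)
            per_organ[lesion["organ"]] += 1
    return chosen


def baseline_sum_longest_diameter(target_lesions):
    """Sum of the longest diameters of all target lesions, in mm."""
    return sum(lesion["longest_diameter_mm"] for lesion in target_lesions)
```

Lesions left over after selection would be recorded as nontarget lesions, for which only presence or absence is followed.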
3.2. Response Criteria
3.2.1. Evaluation of target lesions. This section provides the definitions of the criteria used to determine objective tumor response for target lesions. The criteria have been adapted from the original WHO Handbook (3), taking into account the measurement of the longest diameter only for all target lesions:
- complete response: the disappearance of all target lesions;
- partial response: at least a 30% decrease in the sum of the longest diameter of target lesions, taking as reference the baseline sum longest diameter;
- progressive disease: at least a 20% increase in the sum of the longest diameter of target lesions, taking as reference the smallest sum longest diameter recorded since the treatment started or the appearance of one or more new lesions;
- stable disease: neither sufficient shrinkage to qualify for partial response nor sufficient increase to qualify for progressive disease, taking as reference the smallest sum longest diameter since the treatment started.
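The target-lesion thresholds can be expressed as a short function. This is a sketch rather than a reference implementation: the function name and return codes are hypothetical, the caller is assumed to supply the smallest sum recorded since treatment started, a current sum of zero is taken to mean that all target lesions have disappeared, and progression is checked before response so that regrowth from the smallest sum takes precedence.

```python
def classify_target_response(baseline_sum, smallest_sum, current_sum, new_lesions=False):
    """Classify target-lesion response from sums of longest diameters (mm).

    Returns one of "CR", "PR", "SD", "PD".
    """
    if baseline_sum <= 0:
        raise ValueError("baseline sum must be positive (measurable disease required)")

    # Progressive disease: new lesions, or at least a 20% increase over the
    # smallest sum recorded since treatment started.
    if new_lesions:
        return "PD"
    if smallest_sum == 0:
        if current_sum > 0:
            return "PD"  # regrowth after complete disappearance
    elif (current_sum - smallest_sum) * 5 >= smallest_sum:
        return "PD"

    # Complete response: all target lesions have disappeared.
    if current_sum == 0:
        return "CR"

    # Partial response: at least a 30% decrease relative to baseline.
    if (baseline_sum - current_sum) * 10 >= 3 * baseline_sum:
        return "PR"

    return "SD"
```

Note the two different references: with a baseline sum of 100 mm and a smallest recorded sum of 50 mm, a current sum of 60 mm is a 20% increase from the smallest sum and is classified as progressive disease here, even though it remains 40% below baseline.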
3.2.2. Evaluation of nontarget lesions. This section provides the definitions of the criteria used to determine the objective tumor response for nontarget lesions:
- complete response: the disappearance of all nontarget lesions and normalization of tumor marker level;
- incomplete response/stable disease: the persistence of one or more nontarget lesion(s) and/or the maintenance of tumor marker level above the normal limits; and
- progressive disease: the appearance of one or more new lesions and/or unequivocal progression of existing nontarget lesions (1).
(Note: Although a clear progression of "nontarget" lesions only is exceptional, in such circumstances, the opinion of the treating physician should prevail and the progression status should be confirmed later by the review panel [or study chair]).
3.2.3. Evaluation of best overall response. The best overall response is the best response recorded from the start of treatment until disease progression/recurrence (taking as reference for progressive disease the smallest measurements recorded since the treatment started). In general, the patient's best response assignment will depend on the achievement of both measurement and confirmation criteria (see section 3.3.1). Table 1 provides overall responses for all possible combinations of tumor responses in target and nontarget lesions with or without the appearance of new lesions.
(Notes:
- Patients with a global deterioration of health status requiring
discontinuation of treatment without objective evidence of disease
progression at that time should be classified as having "symptomatic
deterioration." Every effort should be made to document the objective
disease progression, even after discontinuation of treatment.
- Conditions that may define early progression, early death, and
inevaluability are study specific and should be clearly defined in each
protocol (depending on treatment duration and treatment periodicity).
- In some circumstances, it may be difficult to distinguish residual
disease from normal tissue. When the evaluation of complete response
depends on this determination, it is recommended that the residual
lesion be investigated (fine-needle aspiration/biopsy) before
confirming the complete response status.)
3.2.4. Frequency of tumor re-evaluation. Frequency of tumor re-evaluation while on treatment should be protocol specific and adapted to the type and schedule of treatment. However, in the context of phase II studies where the beneficial effect of therapy is not known, follow-up of every other cycle (i.e., 6-8 weeks) seems a reasonable norm. Smaller or greater time intervals than these could be justified in specific regimens or circumstances.
After the end of the treatment, the need for repetitive tumor evaluations depends on whether the phase II trial has, as a goal, the response rate or the time to an event (disease progression/death). If time to an event is the main end point of the study, then routine re-evaluation is warranted of those patients who went off the study for reasons other than the expected event at frequencies to be determined by the protocol. Intervals between evaluations twice as long as on study are often used, but no strict rule can be made.
3.3. Confirmatory Measurement/Duration of Response
3.3.1. Confirmation. The main goal of confirmation of objective response in clinical trials is to avoid overestimating the response rate observed. This aspect of response evaluation is particularly important in nonrandomized trials where response is the primary end point. In this setting, to be assigned a status of partial response or complete response, changes in tumor measurements must be confirmed by repeat assessments that should be performed no less than 4 weeks after the criteria for response are first met. Longer intervals as determined by the study protocol may also be appropriate.
In the case of stable disease, measurements must have met the stable disease criteria at least once after study entry at a minimum interval (in general, not less than 6-8 weeks) that is defined in the study protocol (see section 3.3.3).
(Note: Repeat studies to confirm changes in tumor size may not always be feasible or may not be part of the standard practice in protocols where progression-free survival and overall survival are the key end points. In such cases, patients will not have "confirmed response." This distinction should be made clear when reporting the outcome of such studies.)
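The confirmation rule can be sketched as a check over a patient's dated assessments. The names and data layout below are hypothetical, and the logic is deliberately simple: it only asks whether some later assessment, at least 4 weeks after response criteria were first met, still shows a response. It does not distinguish confirmation of complete response from partial response, and it ignores what happens between the two assessments; a real protocol would specify both.

```python
from datetime import date, timedelta

RESPONSE_STATUSES = {"CR", "PR"}


def is_confirmed_response(assessments, min_interval=timedelta(weeks=4)):
    """Return True if a CR/PR is confirmed by a repeat assessment.

    `assessments` is a list of (date, status) tuples, with status one of
    "CR", "PR", "SD", "PD" (hypothetical codes).
    """
    ordered = sorted(assessments)
    # Date on which response criteria were first met, if ever.
    first_response = next((d for d, s in ordered if s in RESPONSE_STATUSES), None)
    if first_response is None:
        return False
    # A later assessment, no less than `min_interval` afterwards, that
    # still meets response criteria.
    return any(
        s in RESPONSE_STATUSES and d - first_response >= min_interval
        for d, s in ordered
    )
```

The `min_interval` parameter reflects the statement that longer intervals determined by the study protocol may also be appropriate.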
3.3.2. Duration of overall response. The duration of overall response is measured from the time that measurement criteria are met for complete response or partial response (whichever status is recorded first) until the first date that recurrent or progressive disease is objectively documented (taking as reference for progressive disease the smallest measurements recorded since the treatment started). The duration of overall complete response is measured from the time measurement criteria are first met for complete response until the first date that recurrent disease is objectively documented.
3.3.3. Duration of stable disease. Stable disease is measured from the start of the treatment until the criteria for disease progression are met (taking as reference the smallest measurements recorded since the treatment started). The clinical relevance of the duration of stable disease varies for different tumor types and grades. Therefore, it is highly recommended that the protocol specify the minimal time interval required between two measurements for determination of stable disease. This time interval should take into account the expected clinical benefit that such a status may bring to the population under study.
(Note: The duration of response or stable disease as well as the progression-free survival are influenced by the frequency of follow-up after baseline evaluation. It is not in the scope of this guideline to define a standard follow-up frequency that should take into account many parameters, including disease types and stages, treatment periodicity, and standard practice. However, these limitations to the precision of the measured end point should be taken into account if comparisons among trials are to be made.)
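The two duration definitions differ only in their starting point, which the following sketch makes explicit. Function names and the (date, status) layout are hypothetical; patients with no documented progression simply return `None` here, whereas a real analysis would treat them as censored observations.

```python
from datetime import date


def duration_of_overall_response(assessments):
    """Days from the first CR/PR to the first later documented PD, or None."""
    ordered = sorted(assessments)
    start = next((d for d, s in ordered if s in ("CR", "PR")), None)
    if start is None:
        return None  # response criteria never met
    end = next((d for d, s in ordered if s == "PD" and d > start), None)
    return None if end is None else (end - start).days


def duration_of_stable_disease(treatment_start, assessments):
    """Days from the start of treatment to the first documented PD, or None."""
    end = next((d for d, s in sorted(assessments) if s == "PD"), None)
    return None if end is None else (end - treatment_start).days
```

Because both end dates are the dates of scheduled assessments, the computed durations depend directly on how often patients are re-evaluated, which is the limitation the note above describes.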
3.4. Progression-Free Survival/Time to Progression
This document focuses primarily on the use of objective response end points. In some circumstances (e.g., brain tumors or investigation of noncytoreductive anticancer agents), response evaluation may not be the optimal method to assess the potential anticancer activity of new agents/regimens. In such cases, progression-free survival/time to progression can be considered valuable alternatives to provide an initial estimate of biologic effect of new agents that may work by a noncytotoxic mechanism. It is clear though that, in an uncontrolled trial proposing to utilize progression-free survival/time to progression, it will be necessary to document with care the basis for estimating what magnitude of progression-free survival/time to progression would be expected in the absence of a treatment effect. It is also recommended that the analysis be quite conservative in recognition of the likelihood of confounding biases, e.g., with regard to selection and ascertainment. Uncontrolled trials using progression-free survival or time to progression as a primary end point should be considered on a case-by-case basis, and the methodology to be applied should be thoroughly described in the protocol.
4. Response Review
For trials where the response rate is the primary end point, it is strongly recommended that all responses be reviewed by an expert or experts independent of the study at the study's completion. Simultaneous review of the patients' files and radiologic images is the best approach.
(Note: When a review of the radiologic images is to take place, it is also recommended that images be free of marks that might obscure the lesions or bias the evaluation of the reviewer[s].)
5. Reporting of Results
All patients included in the study must be assessed for response to treatment, even if there are major protocol treatment deviations or if they are ineligible. Each patient will be assigned one of the following categories: 1) complete response, 2) partial response, 3) stable disease, 4) progressive disease, 5) early death from malignant disease, 6) early death from toxicity, 7) early death because of other cause, or 9) unknown (not assessable, insufficient data). (Note: By arbitrary convention, category 9 usually designates the "unknown" status of any type of data in a clinical database.)
All of the patients who met the eligibility criteria should be included in the main analysis of the response rate. Patients in response categories 4-9 should be considered as failing to respond to treatment (disease progression). Thus, an incorrect treatment schedule or drug administration does not result in exclusion from the analysis of the response rate. Precise definitions for categories 4-9 will be protocol specific.
All conclusions should be based on all eligible patients.
Subanalyses may then be performed on the basis of a subset of patients, excluding those for whom major protocol deviations have been identified (e.g., early death due to other reasons, early discontinuation of treatment, major protocol violations, etc). However, these subanalyses may not serve as the basis for drawing conclusions concerning treatment efficacy, and the reasons for excluding patients from the analysis should be clearly reported. The 95% confidence intervals should be provided.
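The reporting rules above can be sketched in a few lines of Python. The sketch counts categories 1 and 2 as responses, keeps every eligible patient in the denominator, and attaches a 95% confidence interval. The guidelines do not prescribe a particular interval method; the Wilson score interval used here is an assumption of this example, as are the function and variable names.

```python
import math

# Category codes as listed in section 5; only 1 and 2 count as responses.
RESPONDER_CATEGORIES = {1, 2}  # complete response, partial response


def response_rate_with_ci(categories, z=1.96):
    """Return (rate, lower, upper) over all eligible patients.

    categories: one category code per eligible patient (1-7 or 9).
    The Wilson score interval is an illustrative choice, not a requirement
    of the guidelines; z=1.96 corresponds to a 95% interval.
    """
    n = len(categories)
    if n == 0:
        raise ValueError("no eligible patients")
    responders = sum(1 for c in categories if c in RESPONDER_CATEGORIES)
    p = responders / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# 40 eligible patients: 2 CR, 8 PR, 10 SD, 16 PD, and one each of categories 5, 6, 7, 9.
patients = [1] * 2 + [2] * 8 + [3] * 10 + [4] * 16 + [5, 6, 7, 9]
rate, low, high = response_rate_with_ci(patients)
print(f"response rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Patients with early death or unknown status stay in the denominator, so excluding them cannot inflate the reported rate.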
6. Response Evaluation in Randomized Phase III Trials
Response evaluation in phase III trials may be an indicator of the relative antitumor activity of the treatments evaluated but usually cannot, by itself, predict the real therapeutic benefit for the population studied. If objective response is selected as a primary end point for a phase III study (only in circumstances where a direct relationship between objective tumor response and a real therapeutic benefit can be unambiguously demonstrated for the population studied), the same criteria as those applicable to phase II trials (RECIST guidelines) should be used.
On the other hand, some of the guidelines presented in this special article might not be required in trials, such as phase III trials, in which objective response is not the primary end point. For example, in such trials, it might not be necessary to measure as many as 10 target lesions or to confirm response with a follow-up assessment after 4 weeks or more. Protocols should be written clearly with respect to planned response evaluation and whether confirmation is required so as to avoid post-hoc decisions affecting patient evaluability.
APPENDIX I. SPECIFICATIONS FOR RADIOLOGIC IMAGING
These notes are recommendations for use in clinical studies and, as such, these protocols for computed tomography (CT) and magnetic resonance imaging (MRI) scanning may differ from those employed in clinical practice at various institutions. The use of standardized protocols allows comparability both within and between different studies, irrespective of where the examination has been undertaken.
Specific Notes
For chest x-ray, not only should the film be performed in full inspiration in the posteroanterior projection, but also the film-to-tube distance should remain constant between examinations. However, patients in trials with advanced disease may not be well enough to fulfill these criteria, and such situations should be reported together with the measurements.
Lesions bordering the thoracic wall are not suitable for measurements by chest x-ray, since a slight change in position of the patient can cause considerable differences in the plane in which the lesion is projected and may appear to cause a change that is actually an artifact. These lesions should be followed by a CT or an MRI. Similarly, lesions bordering or involving the mediastinum should be documented on CT or MRI.
CT scans of the thorax, abdomen, and pelvis should be contiguous throughout the anatomic region of interest. As a rule of thumb, the minimum size of the lesion should be no less than double the slice thickness. Lesions smaller than this are subject to substantial "partial volume" effects (i.e., size is underestimated because of the distance of the cut from the longest diameter; such a lesion may appear to have responded or progressed on subsequent examinations when, in fact, it remains the same size [Fig. 1]). This minimum lesion size for a given slice thickness at baseline ensures that any lesion appearing smaller on subsequent examinations will truly be decreasing in size. The longest diameter of each target lesion should be selected in the axial plane only.
The type of CT scanner is important regarding the slice thickness and minimum-sized lesion. For spiral (helical) CT scanners, the minimum size of any given lesion at baseline may be 10 mm, provided the images are reconstructed contiguously at 5-mm intervals. For conventional CT scanners, the minimum-sized lesion should be 20 mm by use of a contiguous slice thickness of 10 mm.
The fundamental difference between spiral and conventional CT is that conventional CT acquires the information only for the particular slice thickness scanned, which is then expressed as a two-dimensional representation of that thickness or volume as a gray scale image. The next slice thickness needs to be scanned before it can be imaged and so on. Spiral CT acquires the data for the whole volume imaged, typically the whole of the thorax or upper abdomen in a single breath hold of about 20-30 seconds. To view the images, a suitable reconstruction algorithm is selected by the machine so that the data are appropriately imaged. As suggested above, for spiral CT, 5-mm reconstructions can be made, thereby allowing a minimum-sized lesion of 10 mm.
Spiral CT is now the standard in most hospitals involved in cancer management in the United States, Europe, and Japan, so the above comments related to spiral CT are pertinent. However, some institutions involved in clinical trials will have conventional CT, but the number of these scanners will decline as they are replaced by spiral CT.
For other body parts where CT scans are of a different slice thickness (such as the neck, which is typically scanned at 5-mm thickness), or in the young pediatric population, where the slice thickness may also differ, the minimum-sized lesion allowable for measurability may be different. However, it should still be double the slice thickness. The slice thickness and the minimum-sized lesion should be specified in the study protocol.
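The rule of thumb relating slice thickness to the minimum lesion size can be written as a short check. This is a minimal Python sketch with hypothetical function names; it encodes only the "double the slice thickness" rule and none of the other measurability requirements.

```python
def minimum_measurable_lesion_mm(slice_thickness_mm):
    """Minimum lesion size at baseline: double the slice thickness."""
    if slice_thickness_mm <= 0:
        raise ValueError("slice thickness must be positive")
    return 2 * slice_thickness_mm


def is_measurable_at_baseline(longest_diameter_mm, slice_thickness_mm):
    """True if the lesion is at least twice the slice thickness."""
    return longest_diameter_mm >= minimum_measurable_lesion_mm(slice_thickness_mm)


# Spiral CT reconstructed at 5-mm intervals versus conventional CT with 10-mm slices.
print(minimum_measurable_lesion_mm(5))        # 10
print(minimum_measurable_lesion_mm(10))       # 20
print(is_measurable_at_baseline(15, 10))      # False
```

A 15-mm lesion is therefore measurable on a 5-mm spiral reconstruction but not on a conventional scan with 10-mm slices.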
In patients in whom the abdomen and pelvis have been imaged, oral contrast agents should be given to accentuate the bowel against other soft-tissue masses. This procedure is almost universally undertaken on a routine basis.
Intravenous contrast agents should also be given, unless contraindicated for medical reasons such as allergy. This is to distinguish vascular structures from adjacent lymph node masses and to help enhance liver and other visceral metastases. Although its use may add little in clinical practice, in a clinical study where the objective response rate based on measurable disease is the end point, a substantial number of otherwise measurable lesions will not be measurable unless an intravenous contrast agent is given. The use of intravenous contrast agents may sometimes seem unnecessary to monitor the evolution of specific disease sites (e.g., in patients in whom the disease is apparently restricted to the periphery of the lungs). However, the aim of a clinical study is to ensure that lesions are truly resolving and that there is no evidence of new disease at other sites scanned (e.g., small metastases in the liver). Such new disease may be more easily demonstrated with an intravenous contrast agent, whose use should, therefore, also be considered in this context.
The method of administration of intravenous contrast agents is variable. Rather than try to institute rigid rules regarding methods for administering contrast agents and the volume injected, it is appropriate to suggest that an adequate volume of a suitable contrast agent should be given so that the metastases are demonstrated to best effect and a consistent method is used on subsequent examinations for any given patient.
All images from each examination should be included and not "selected" images of the apparent lesion. This distinction is intended to ensure that, if a review is undertaken, the reviewer can satisfy himself/herself that no other abnormalities coexist. All window settings should be included, particularly in the thorax, where the lung and soft-tissue windows should be considered.
Lesions should be measured on the same window setting on each examination. It is not acceptable to measure a lesion on lung windows on one examination and on soft-tissue settings on the next (Fig. 2). In the lung, it does not really matter whether lung or soft-tissue windows are used for intraparenchymal lesions, provided a thorough assessment of nodal and parenchymal disease has been undertaken and the target lesions are measured as appropriate by use of the same window settings for repeated examinations throughout the study.
Use of MRI is a complex issue. MRI is entirely acceptable and capable of providing images in different anatomic planes. It is, therefore, important that, when MRI is used, lesions be measured in the same anatomic plane by use of the same imaging sequences on subsequent examinations. MRI scanners vary in the images produced. Some of the factors involved include the magnet strength (high-field magnets require shorter scan times, typically 2-5 minutes), the coil design, and patient cooperation. Wherever possible, the same scanner should be used. For instance, the images provided by a 1.5-Tesla scanner will differ from those provided by a 0.5-Tesla scanner. Although comparisons can be made between images from different scanners, such comparisons are not ideal. Moreover, many patients with advanced malignancy are in pain, so their ability to remain still for the duration of a scan sequence (on the order of 2-5 minutes) is limited. Any movement during the scan time leads to motion artifacts and degradation of image quality, so that the examination will probably be useless. For these reasons, CT is, at this point in time, the imaging modality of choice.
Ultrasound examinations should not be used in clinical trials to measure tumor regression or progression of lesions that are not superficial because the examination is necessarily subjective. Entire examinations cannot be reproduced for independent review at a later date, and it must be assumed, whether or not it is the case, that the hard-copy films available represent a true and accurate reflection of events (Fig. 3). Furthermore, if, for example, the only measurable lesion is in the para-aortic region of the abdomen and if gas in the bowel overlies the lesion, the lesion will not be detected because the ultrasound beam cannot penetrate the gas. Accordingly, the disease staging (or restaging for treatment evaluation) for this patient will not be accurate.
The same imaging modality must be used throughout the study to measure disease. Different imaging techniques have differing sensitivities, so any given lesion may have different dimensions at any given time if measured with different modalities. It is, therefore, not acceptable to interchange different modalities throughout a trial and use these measurements. It must be the same technique throughout.
It is desirable to try to standardize the imaging modalities without adding undue constraints so that patients are not unnecessarily excluded from clinical trials.
APPENDIX II. RELATIONSHIP BETWEEN CHANGE IN DIAMETER, PRODUCT, AND VOLUME
APPENDIX III. RESPONSE EVALUATION CRITERIA IN SOLID TUMORS (RECIST) WORKING GROUP AND SPECIAL ACKNOWLEDGMENTS
RECIST Working Group
P. Therasse (Chair), J. Verweij, M. Van Glabbeke, A. T. van Oosterom, European Organization for Research and Treatment of Cancer (Brussels, Belgium); S. G. Arbuck, R. S. Kaplan, M. C. Christian, National Cancer Institute, United States (Bethesda, MD); E. Eisenhauer, National Cancer Institute of Canada Clinical Trials Group (Kingston); S. Gwyther, East Surrey Hospital (Redhill, U.K.); and J. Wanders, New Drug Development Office Oncology (Amsterdam, The Netherlands).
Retrospective Analyses
L. A. Rubinstein, National Cancer Institute, United States; B. K. James, A. Muldal, W. Walsh, National Cancer Institute of Canada Clinical Trials Group; S. Green, Southwest Oncology Group (Seattle, WA); M. Terenziani, National Cancer Institute (Milan, Italy); D. Vena, Emmes Corporation (Rockville, MD); R. Canetta, J. Burroughs, Bristol-Myers Squibb (Wallingford, CT); A. Riva, M. Murawsky, Rhone-Poulenc Rorer Pharmaceuticals Inc. (Paris, France).
APPENDIX IV. PARTICIPANTS IN THE OCTOBER 1998 WORKSHOP TO DEVELOP THE FINAL RESPONSE EVALUATION CRITERIA IN SOLID TUMORS (RECIST) DOCUMENT AND FURTHER ACKNOWLEDGMENTS
Participants
S. C. S. Kao, Children's Cancer Study Group (Iowa City, IA); D. Grinblatt, Cancer and Leukemia Group B (CALGB) (Chicago, IL); B. Giantonio, Eastern Cooperative Oncology Group (ECOG) (Philadelphia, PA); F. B. Stehman, Gynecologic Oncology Group (GOG) (Indianapolis, IN); A. Trotti, Radiation Therapy Oncology Group (Tampa, FL); C. A. Coltman, Southwest Oncology Group (SWOG) (San Antonio, TX); R. E. Smith, National Surgical Adjuvant Breast and Bowel Project (Pittsburgh, PA); J. Zalcberg, Peter MacCallum Cancer Institute (Melbourne, Australia); N. Saijo, National Cancer Center Hospital (Tokyo, Japan); Y. Fujiwara, National Institute of Health Sciences (Tokyo); G. Schwartsmann, Hospital de Clinicas de Porto Alegre (Brazil); A. Klein, Health Canada, Bureau of Pharmaceutical Assessment (Ottawa, ON); B. Weinerman, National Cancer Institute of Canada Clinical Trials Group (Kingston, ON); D. Warr, Ontario Cancer Institute/Princess Margaret Hospital (Toronto); P. Liati, South Europe New Drugs Organization (Milan, Italy); S. Einstein, Bio-Imaging Technologies (West Trenton, NJ); S. Négrier, L. Ollivier, Fédération Nationale des Centres de Lutte contre le Cancer (Paris, France); M. Marty, International Cancer Cooperative Group/French Drug Agency (Paris); H. Anderson, A. R. Hanauske, European Organization for Research and Treatment of Cancer (EORTC) (Brussels, Belgium); M. R. Mirza, Odense University Hospital (Denmark); J. Ersboll, The European Agency for the Evaluation of Medicinal Products (Bronshoj, Denmark); C. Pagonis, Cancer Research Campaign (London, U.K.); S. Hatty, Eli Lilly and Co. (Surrey, U.K.); A. Riva, Rhone-Poulenc Rorer Pharmaceuticals Inc. (Paris); C. Royce, GlaxoWellcome (Middlesex, U.K.); G. Burke, Novartis Pharma AG (Basel, Switzerland); I. Horak, Janssen Research Foundation (Beerse, Belgium); G. Hoctin-Boes, Zeneca (Macclesfield Cheshire, U.K.); C. Weil, Bristol-Myers Squibb (Waterloo, Belgium); M. G. Zurlo, Pharmacia & Upjohn (Milan); S. Z. Fields, SmithKline Beecham Pharmaceuticals (Collegeville, PA); B. Osterwalder, Hoffmann-La Roche Inc. (Basel); Y. Shimamura, Taiho Pharmaceutical Co. Ltd. (Tokyo); and M. Okabe, Kyowa-Hakko-Kogyo Co. Ltd. (Tokyo).
Additional comments were received from the following:
A. Hamilton, R. De Wit, E. Van Cutsem, J. Wils, J.-L. Lefèbvre, I. Vergote, M. S. Aapro, J.-F. Bosset, M. Hernandez-Bronchud, D. Lacombe, H. J. Schmoll, E. Van Limbergen, P. Fumoleau, A. Bowman, U. Bruntsch, EORTC (Brussels); B. Escudier, P. Thiesse, N. Tournemaine, P. Troufleau, C. Lasset, F. Gomez, Fédération Nationale des Centres de Lutte contre le Cancer (Paris); G. Rustin, Mount Vernon Hospital (Northwood Middlesex, U.K.); S. B. Kaye, Western Infirmary (Glasgow, U.K.); A. Goldhirsch, F. Nolè, G. Zampino, F. De Braud, M. Colleoni, E. Munzone, T. De Pas, International Breast Cancer Study Group and Istituto Europeo di Oncologia (Milan); M. Castiglione, J. F. Delaloye, A. Roth, C. Sessa, D. Hess, B. Thürlimann, C. Böhme, T. Cerny, U. Hess, Schweizer Arbeitsgemeinschaft für Klinische Krebsforschung (Bern, Switzerland); H. J. Stewart, Scottish Cancer Therapy Network (Edinburgh, U.K.); A. Howell, J. F. R. Robertson, United Kingdom Coordinating Committee on Cancer Research (Nottingham); K. Noever, Bio-Imaging Technologies (Monheim, Germany); M. Kurihara, Toyosu Hospital, SHOWA University (Tokyo); L. Seymour, J. Pater, J. Rusthoven, F. Shepherd, J. Maroun, G. Cairncross, D. Stewart, K. Pritchard, National Cancer Institute of Canada Clinical Trials Group (Kingston); T. Uscinowicz, Health Canada, Bureau of Pharmaceutical Assessment (Ottawa); I. Tannock, Princess Margaret Hospital (Toronto); M. Azab, QLT Phototherapeutics (Vancouver, Canada); V. H. C. Bramwell, Canadian Sarcoma Group (London); P. O'Dwyer, ECOG (Philadelphia); A. Martin, S. Ellenberg, U.S. Food and Drug Administration (Rockville, MD); C. Chow, D. Sullivan, A. Murgo, A. Dwyer, J. Tatum, National Cancer Institute (Bethesda, MD); R. Schilsky, CALGB (Chicago, IL); J. Crowley, S. Green, SWOG (Seattle, WA); R. Park, GOG (Philadelphia, PA); V. Land, B. D. Fletcher, Pediatric Oncology Group (Chicago, IL); B. Hillman, University of Virginia (Charlottesville); F. Muggia, New York University Medical Center (New York); C. Erlichman, Mayo Clinic (Rochester, MN); L. H. Schwartz, Memorial Sloan-Kettering Cancer Center (New York, NY); S. P. Balcerzak, Ohio State University Health Sciences Center (Columbus); G. Fleming, CALGB (Chicago); G. Sorensen, Harvard University (Cambridge, MA); H. Levy, Thomas Jefferson University (Philadelphia); N. Patz, Duke University (Durham, NC); C. Visseren-Grul, Eli Lilly Nederland BV (Nieuwegein, The Netherlands)/J. Walling, Lilly Research Laboratories (Indianapolis); P. Hellemans, Janssen Research Foundation (Beerse, Belgium); L. Finke, Merck (Darmstadt, Germany); A. Man, N. Barbet, Novartis Pharma AG (Basel); G. Massimini, Pharmacia & Upjohn (Milan); J. Jimeno, Pharma Mar (Madrid, Spain); I. Hudson, SmithKline Beecham Pharmaceuticals (Essex, U.K.); and J. Krebs, R. A. Beckman, S. Lane, D. Fitts, SmithKline Beecham Pharmaceuticals (Collegeville).
APPENDIX V. RETROSPECTIVE COMPARISON OF RESPONSE/DISEASE PROGRESSION RATES OBTAINED WITH THE WORLD HEALTH ORGANIZATION (WHO)/SOUTHWEST ONCOLOGY GROUP CRITERIA AND THE NEW RESPONSE EVALUATION CRITERIA IN SOLID TUMORS (RECIST) CRITERIA
To evaluate the hypothesis that unidimensional measurement of tumor lesions may substitute for the usual bidimensional approach, a number of retrospective analyses have been undertaken. The results of these analyses are given below in this section.
1. Comparison of Response and Disease Progression Rates by Use of WHO (or Modified WHO) or RECIST Methods
1.1. Trials Evaluated
No specific selection criteria were employed except that trial data had to include serial (repeated) records of tumor measurements. Several groups evaluated their own data on one or more such studies (National Cancer Institute of Canada Clinical Trials Group, Kingston, ON; U.S. National Cancer Institute, Bethesda, MD; and Rhone-Poulenc Rorer Pharmaceuticals Inc., Paris, France) or made data available for evaluation to the U.S. National Cancer Institute (Southwest Oncology Group and Bristol-Myers Squibb, Wallingford, CT).
1.2. Response Criteria Evaluated
Not all databases were assessed for all response outcomes. At the outset of this process, the most interest was in the assessment of complete plus partial response rate comparisons by both the WHO and new RECIST criteria. Once these data suggested no impact of using the new criteria on the response rate, several more databases were analyzed for the impact of the use of the new criteria not only on complete response plus partial response but also on stable disease and progressive disease rates (see Appendix V, Table 4) and on time to disease progression (see Appendix V, Table 5).
1.3. Methods of Comparison
For each patient in each study, baseline sums were calculated (sum of products of the two longest diameters in perpendicular dimensions for WHO and sum of longest diameters for RECIST). After each assessment, when new tumor measures were available, the sums were recalculated. Patients were assigned complete response, partial response, stable disease, and progressive disease as their "best" response on the basis of achieving the measurement criteria as indicated in Appendix V, Table 3.
(Note: For WHO progressive disease, as is the convention in most groups, an increase in sums of products was required, not an increase in only one lesion.)
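The two sums described above, and the way a single assessment might be classified from them, can be sketched in Python as follows. The thresholds are assumptions stated for illustration: a 50% decrease (partial response) and 25% increase (progression) in the sum of products for WHO, and a 30% decrease and 20% increase in the sum of longest diameters for RECIST. New lesions, nontarget lesions, and confirmation of response are ignored, and all names are hypothetical.

```python
def who_sum(lesions):
    """Sum of products of the two longest perpendicular diameters (mm x mm)."""
    return sum(a * b for a, b in lesions)


def recist_sum(lesions):
    """Sum of the longest diameter of each lesion (mm)."""
    return sum(max(a, b) for a, b in lesions)


def classify(baseline, nadir, current, pr_decrease, pd_increase):
    """Classify one assessment from measurement sums only.

    Progression is judged against the smallest sum recorded (nadir) and
    partial response against the baseline sum. Thresholds are passed in
    because they are assumptions of this sketch.
    """
    if current == 0:
        return "CR"
    if current >= nadir * (1 + pd_increase):
        return "PD"
    if current <= baseline * (1 - pr_decrease):
        return "PR"
    return "SD"


# Two lesions, each given as (longest diameter, perpendicular diameter) in mm.
baseline_lesions = [(40, 30), (20, 20)]
followup_lesions = [(28, 20), (12, 12)]

who_base, who_now = who_sum(baseline_lesions), who_sum(followup_lesions)
rec_base, rec_now = recist_sum(baseline_lesions), recist_sum(followup_lesions)

print(classify(who_base, who_base, who_now, pr_decrease=0.50, pd_increase=0.25))
print(classify(rec_base, rec_base, rec_now, pr_decrease=0.30, pd_increase=0.20))
```

A patient's "best" response would then be taken over all assessments, which this sketch leaves to the caller.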
1.4. Results
2. Evaluation of Time to Disease Progression
Time to disease progression was evaluated, comparing WHO criteria with RECIST in a dataset provided by the Southwest Oncology Group (SWOG). Since the SWOG criteria (5) for disease progression are a 50% increase in the sum of the products, or new disease, or an absolute increase of 10 cm² in the sum of the products, this dataset provided the means of assessing the impact on time to disease progression of the difference between a 25% increase in the sum of the products and a 20% increase in the sum of the longest diameters (equivalent to approximately a 44% increase in the product sum).
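The equivalence quoted above follows from simple geometry: if a lesion keeps its shape, a relative change in diameter is squared for the product of two diameters and cubed for the volume. The Python sketch below works through that arithmetic under this uniform-scaling assumption; the function names are hypothetical.

```python
import math


def product_change(diameter_change):
    """Relative change in the bidimensional product for a given diameter change."""
    return (1 + diameter_change) ** 2 - 1


def volume_change(diameter_change):
    """Relative change in volume for a given diameter change."""
    return (1 + diameter_change) ** 3 - 1


def diameter_change_from_product(change_in_product):
    """Relative diameter change that corresponds to a given product change."""
    return math.sqrt(1 + change_in_product) - 1


for label, change in [("20% increase in diameter", 0.20),
                      ("30% decrease in diameter", -0.30)]:
    print(f"{label}: product {product_change(change):+.1%}, "
          f"volume {volume_change(change):+.1%}")

# A 25% increase in the product corresponds to a smaller diameter increase.
print(f"{diameter_change_from_product(0.25):+.1%}")
```

Under this assumption a 20% increase in diameter gives a 44% increase in the product, while a 25% increase in the product corresponds to a diameter increase of only about 12%, so the unidimensional progression threshold is the more demanding of the two.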
2.1. Dataset Evaluated
The dataset includes 234 patients with progressive disease as defined by the SWOG (5). All patients had baseline measurable disease followed by the same technique(s) until disease progression. The tumor types included were melanoma and colorectal, lung, and breast cancers.
REFERENCES
1 Zubrod CG, Schneiderman SM, Frei E III, Brindley C, Gold GL, Schnider B, et al. Appraisal of methods for the study of chemotherapy of cancer in man: comparative therapeutic trial of nitrogen mustard and thio phosphoamide. J Chronic Dis 1960;11:7-33.
2 Gehan E, Schneidermann M. Historical and methodological developments in clinical trials at the National Cancer Institute. Stat Med 1990;9:871-80.
3 WHO handbook for reporting results of cancer treatment. Geneva (Switzerland): World Health Organization Offset Publication No. 48; 1979.
4 Miller AB, Hogestraeten B, Staquet M, Winkler A. Reporting results of cancer treatment. Cancer 1981;47:207-14.
5 Green S, Weiss GR. Southwest Oncology Group standard response criteria, endpoint definitions and toxicity criteria. Invest New Drugs 1992;10:239-53.
6 James K, Eisenhauer E, Christian M, Terenziani M, Vena D, Muldal A, et al. Measuring response in solid tumors: unidimensional versus bidimensional measurement. J Natl Cancer Inst 1999;91:523-8.
Manuscript received June 24, 1999; revised November 30, 1999; accepted December 3, 1999.