Journal of the National Cancer Institute Advance Access originally published online on July 24, 2007
JNCI Journal of the National Cancer Institute 2007 99(15):1141-1143; doi:10.1093/jnci/djm079
© The Author 2007. Published by Oxford University Press.
EDITORIALS
The More Eyes, the Better to See? From Double to Quadruple Reading of Screening Mammograms
Affiliations of authors: Department of Medicine, University of Washington School of Medicine, Seattle, WA (JGE); Department of Radiology, University of California, San Francisco School of Medicine, San Francisco, CA (RJB)
Correspondence to: Joann G. Elmore, MD, MPH, 325 9th Street, Division of General Internal Medicine, Harborview Medical Center, Box 359780, Seattle, WA 98104 (e-mail: jelmore{at}u.washington.edu).
Clinically important visible abnormalities may go undetected during mammography screening for two main reasons: they are either overlooked or misinterpreted (1). Among women with interval breast cancers that are diagnosed between routine screening mammographic examinations, 10%–20% have lesions that were visible but overlooked at their previous examination, a similar percentage have lesions that were misinterpreted at the previous examination, and the remainder have lesions that were mammographically occult (2,3). For these women, the interval cancers represent lost opportunities for an early diagnosis; for the radiologists who interpreted the mammograms, they may have legal ramifications. Indeed, a delay in the diagnosis of breast cancer was the most common reason for malpractice lawsuits against American physicians in a study of claims resolved between 1995 and 2001 (4).
In an attempt to maximize the number of breast cancers detected, investigators have been striving for years to understand the reasons why some cancers are missed on mammograms. In sequential studies of mammogram review time and radiologist eye position, Nodine et al. (5,6) found that most lesions were detected by a given observer within 25 seconds of image review and that faulty visual review patterns account for most missed cancers. Increased viewing time beyond 25 seconds tended to decrease specificity more than it increased sensitivity (6). It has been suggested (7) that readers who review a critical volume of mammograms (defined as 2500 cases per year) have the best visual review patterns. Thus, if a given individual's visual review pattern and ability to detect breast cancer are inherently limited, then the breast cancer detection rate should be improved by the addition of other readers who have different and, presumably, complementary visual search patterns, regardless of whether the other readers are radiologists or technologists, as suggested by results of a study by Duijm et al. (8) in this issue of the Journal.
Double reading of mammograms has been advocated to reduce the proportion of missed cancers and to at least partly offset the wide variability in radiologists' interpretations (9). Variability in clinical decision making is common in all areas of medicine, from the interpretation of findings on physical examination to the interpretation of radiographic images (10). Increasing the number of individuals who interpret a given imaging modality has been shown to increase the cancer detection rate for many organ systems, from breast cancer detection in mammography to lung cancer detection in chest radiographs and computed tomography scans (11,12).
The community-based study by Duijm et al. (8) illustrates the potential to increase cancer detection when the standard double reading of screening mammograms by radiologists is augmented with secondary interpretations performed by nonphysician technologists. In this study, of the 61251 screening mammograms that were double read by radiologists, the technologists noted an abnormality on 446 examinations that had been interpreted as normal, prompting the radiologists to rereview the images. This rereview led to 80 referrals that resulted in the detection of 22 additional cancers (8). Although the additional interpretation by technologists improved the cancer detection rate, the recall rate was also modestly increased. Thus, an optimal balance is needed among several variables, including cancer detection rates, costs, manpower, compliance, and legal repercussions.
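The counts quoted above can be turned into rates with a few lines of arithmetic. The sketch below is only an illustration: the four counts are those given in this editorial, while the derived quantities (rates per 1000 screens, the share of referrals that yielded a cancer, and the number of technologist flags per additional cancer) and all variable names are our own summary choices, not figures reported by Duijm et al. (8).

```python
# Counts as quoted in this editorial from Duijm et al. (8).
screens = 61251    # screening mammograms double read by radiologists
flagged = 446      # read as normal by radiologists, flagged by technologists
referrals = 80     # referrals that followed the radiologists' rereview
cancers = 22       # additional cancers detected through those referrals


def per_1000(count, total):
    """Express a count as a rate per 1000 screening examinations."""
    return 1000.0 * count / total


# Derived quantities (illustrative summaries, not reported study outcomes).
rereview_rate = per_1000(flagged, screens)        # extra rereviews per 1000 screens
extra_referral_rate = per_1000(referrals, screens)  # extra referrals per 1000 screens
extra_detection_rate = per_1000(cancers, screens)   # extra cancers per 1000 screens
cancers_per_referral = cancers / referrals          # share of extra referrals with cancer
flags_per_cancer = flagged / cancers                # rereviews needed per extra cancer

print(f"Rereviews per 1000 screens:          {rereview_rate:.2f}")
print(f"Additional referrals per 1000:       {extra_referral_rate:.2f}")
print(f"Additional cancers per 1000:         {extra_detection_rate:.2f}")
print(f"Cancers per additional referral:     {cancers_per_referral:.3f}")
print(f"Technologist flags per extra cancer: {flags_per_cancer:.1f}")
```

Read this way, roughly one in four of the additional referrals yielded a cancer, at a cost of about 20 rereviewed examinations per additional cancer found.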
Rather than assuming that double reading will always be beneficial, Taplin et al. (13) have questioned the absolute value of double reading. There are many different methods of double reading, and the outcomes may depend on how well the reading techniques or patterns of the two readers complement each other. It has been said that if two people were exactly alike, one of them would be unnecessary (14). Having individuals with complementary visual search patterns review mammograms, rather than relying on review by multiple viewers with the same search pattern, may offer the greatest potential for achieving an acceptable balance between recall and cancer detection rates. However, visual search patterns among readers are not easily determined.
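Why complementarity matters can be seen from simple probability. The sketch below assumes a policy in which a woman is recalled if either reader calls the examination positive, and it compares two extremes: readers who flag exactly the same examinations and readers whose calls are statistically independent. The sensitivity and false-positive values are hypothetical numbers chosen for illustration, and the function name is ours; neither comes from any of the cited studies.

```python
def either_positive(p1, p2, p_both=None):
    """Probability that at least one of two readers calls an exam positive.

    p1, p2  -- each reader's probability of a positive call
    p_both  -- probability that both call it positive; if omitted, the
               readers are assumed to be statistically independent
    """
    if p_both is None:
        p_both = p1 * p2          # independence assumption
    return p1 + p2 - p_both       # inclusion-exclusion


# Hypothetical single-reader performance (illustrative values only).
sensitivity = 0.75        # P(positive call | cancer present)
false_pos_rate = 0.05     # P(positive call | no cancer)

# Two readers with identical search patterns flag the same examinations,
# so P(both positive) equals P(one positive) and nothing is gained.
sens_identical = either_positive(sensitivity, sensitivity, p_both=sensitivity)

# Two independent readers: the second can catch what the first misses,
# but false positives accumulate in the same way.
sens_independent = either_positive(sensitivity, sensitivity)
fpr_independent = either_positive(false_pos_rate, false_pos_rate)

print(f"Sensitivity, identical readers:     {sens_identical:.4f}")
print(f"Sensitivity, independent readers:   {sens_independent:.4f}")
print(f"False-positive rate, independent:   {fpr_independent:.4f}")
```

Under these assumptions the second reader adds nothing when the search patterns coincide, whereas independent readers raise sensitivity while nearly doubling the false-positive rate. Real readers fall somewhere between these extremes, and how disagreements are resolved changes the balance further.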
The data presented by Duijm et al. (8) offer insight into the interpretive performance of several possible methods of double reading: 1) double reading by technologists alone; 2) double reading by radiologists alone; 3) double reading by radiologists, with an additional rereview by radiologists of any examinations determined positive by the technologists; or 4) double reading by radiologists and double reading by technologists, with referral for additional workup on any examination with a positive interpretation. Generalizing the results of Duijm et al. to other screening programs is challenging, however, because of international variability in screening programs. For example, most European programs, including the one in The Netherlands, where this study was done, have been constructed to mandate recall rates that are considerably lower than those reported in the United States (15). The recall rates of 1%–2% reported by Duijm et al. (8) contrast sharply with the 10% rate reported in the United States (15). In part, the higher recall rate in the United States may reflect the harsher legal environment that radiologists are exposed to when there is a delayed breast cancer diagnosis (4). The Dutch program also differs from the US system by its ready availability of breast imaging radiologists who have more imaging experience and higher annual volumes of interpretation than those in the United States. Unfortunately, increasingly fewer US radiology residents in training report an interest in breast imaging, and many practicing radiologists no longer wish to interpret mammograms (16,17). Thus, the pool of radiologists who are available to interpret mammograms in the United States is shrinking, while the demand for mammography continues to grow along with the aging population.
Given such restrictions on available personnel, computer-aided detection (CAD) has been advocated as a digital "second reader," with conflicting results thus far reported for its effects on outcome (18,19). Unfortunately, CAD programs mark multiple areas on each screening examination, which leads to higher recall rates; a true-positive CAD mark is issued far less than 1% of the time (20). The high recall rates associated with CAD programs have not inspired enthusiasm or widespread support (20). Nonetheless, CAD has rapidly been incorporated as a billable add-on to screening mammography in many circumstances.
Various methods for double reading of breast images exist, further complicating the generalizability of the study by Duijm et al. (8). In The Netherlands and many other European countries, double reading by radiologists of all screening mammograms is the norm. By contrast, a survey of 45 US mammography facilities found that only about half reported any type of double reading (21). Methods for double reading in the United States include independent reading, fast double-check reading, and CAD. The percentage of examinations that are double read and the double-reading methods employed are also extremely variable. The second radiologist may be blinded to the first interpretation, or not, whereas methods for resolving disagreements may range from consensus to acceptance of the most serious interpretation to inclusion of a third radiologist (who may or may not be blinded to the results of previous readings). As a result, comparing double-reading studies is less like comparing apples to oranges than like comparing apples to asparagus.
Interpreter variability is unavoidable in a field that—unlike laboratory blood tests—is subject to cognitive error. Such variation appears more prevalent in screening mammography than in any other imaging interpretation (22). Increasing the number of readers is a bona fide attempt to improve the accuracy of image interpretation under current conditions. Yet the push for an improved cancer detection rate by double or quadruple reading of all examinations needs to be balanced against the potential for higher recall and false-positive rates. To increase recall rates beyond a threshold of approximately 5% will result in many more biopsy referrals and false positives, with only a modest improvement in cancer detection (23,24). Ultimately, deciding on the number of readers needed to interpret a screening mammogram will depend on how many readers are available and which outcomes we seek.
Funding
Agency for Healthcare Research and Quality (public health service grant R01 HS-010591 to J. G. Elmore); National Cancer Institute (R01 CA-107623 to J. G. Elmore).
NOTES
The authors would like to thank Raymond Harris, PhD, and R. J. Lambert for editorial guidance.
REFERENCES
(1) Vainio H, Bianchini F. Breast cancer screening. International Agency for Research on Cancer (IARC) handbooks of cancer prevention (2002) Lyon (France): IARC Press.
(2) Brenner RJ. False-negative mammograms. Medical, legal, and risk management implications. Radiol Clin North Am (2000) 38:741–57.
(3) Ikeda DM, Andersson I, Wattsgard C, Janzon L, Linell F. Interval carcinomas in the Malmo Mammographic Screening Trial: radiographic appearance and prognostic considerations. AJR Am J Roentgenol (1992) 159:287–94.
(4) Physician Insurers Association of America. Breast cancer study (2002) 3rd ed. Washington (DC): Physician Insurers Association of America.
(5) Nodine CF, Mello-Thoms C, Weinstein SP, Kundel HL, Conant EF, Heller-Savoy RE, et al. Blinded review of retrospectively visible unreported breast cancers: an eye-position analysis. Radiology (2001) 221:122–9.
(6) Nodine CF, Mello-Thoms C, Kundel HL, Weinstein SP. Time course of perception and decision making during mammographic interpretation. AJR Am J Roentgenol (2002) 179:917–23.
(7) Kan L, Olivotto IA, Warren Burhenne LJ, Sickles EA, Coldman AJ. Standardized abnormal interpretation and cancer detection ratios to assess reading volume and reader performance in a breast screening program. Radiology (2000) 215:563–7.
(8) Duijm LEM, Groenewoud JH, Fracheboud J, de Koning HJ. Additional double reading of screening mammograms by radiologic technologists: impact on screening performance parameters. J Natl Cancer Inst (2007) 99:1162–70.
(9) Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Findings from a national sample. Arch Intern Med (1996) 156:209–13.
(10) Elmore J, Feinstein A. A bibliography of publications on observer variability. J Clin Epidemiol (1992) 45:567–80.
(11) Wormanns D, Ludwig K, Beyer F, Heindel W, Diederich S. Detection of pulmonary nodules at multirow-detector CT: effectiveness of double reading to improve sensitivity at standard-dose and low-dose chest CT. Eur Radiol (2005) 15:14–22.
(12) Quekel LG, Goei R, Kessels AG, van Engelshoven JM. Detection of lung cancer on the chest radiograph: impact of previous films, clinical information, double reading, and dual reading. J Clin Epidemiol (2001) 54:1146–50.
(13) Taplin S, Rutter C, Elmore J, Seger D, White D, Brenner R. Accuracy of screening mammography using single versus independent double interpretation. AJR Am J Roentgenol (2000) 174:1257–62.
(14) Brainy quote. Larry Dixon quotes. Available at: http://www.brainyquote.com/quotes/quotes/1/larrydixon185870.html. [Last accessed: June 19, 2007].
(15) Rosenberg RD, Yankaskas BC, Abraham LA, Sickles EA, Lehman CD, Geller BM, et al. Performance benchmarks for screening mammography. Radiology (2006) 241:55–66.
(16) Bassett LW, Monsees BS, Smith RA, Wang L, Hooshi P, Farria DM, et al. Survey of radiology residents: breast imaging training and attitudes. Radiology (2003) 227:862–9.
(17) D'Orsi C, Tu SP, Nakano C, Carney PA, Abraham LA, Taplin SH, et al. Current realities of delivering mammography services in the community: do challenges with staffing and scheduling exist? Radiology (2005) 235:391–5.
(18) Warren Burhenne LJ, Wood SA, D'Orsi CJ, Feig SA, Kopans DB, O'Shaughnessy KF, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology (2000) 215:554–62.
(19) Fenton JJ, Taplin SH, Carney PA, Abraham L, Sickles EA, D'Orsi C, et al. Influence of computer-aided detection on performance of screening mammography. N Engl J Med (2007) 356:1399–409.
(20) Brenner RJ, Ulissey MJ, Wilt RM. Computer-aided detection as evidence in the courtroom: potential implications of an appellate court's ruling. AJR Am J Roentgenol (2006) 186:48–51.
(21) Hendrick RE, Cutter G, Berns E, Nakano C, Egger J, Carney P, et al. Community-based mammography practice: services, charges, and interpretation methods. AJR Am J Roentgenol (2005) 184:433–8.
(22) Soffa DJ, Lewis RS, Sunshine JH, Bhargavan M. Disagreement in interpretation: a method for the development of benchmarks for quality assurance in imaging. J Am Coll Radiol (2004) 1:212–7.
(23) Yankaskas BC, Cleveland RJ, Schell MJ, Kozar R. Association of recall rates with sensitivity and positive predictive values of screening mammography. AJR Am J Roentgenol (2001) 177:543–9.
(24) Schell MJ, Yankaskas BC, Ballard-Barbash R, Qaqish BF, Barlow WE, Rosenberg RD, et al. Evidence-based target recall rates for screening mammography. Radiology (2007) 243:681–9.