
JNCI Journal of the National Cancer Institute 2003 95(4):282-290; doi:10.1093/jnci/95.4.282
© 2003 by Oxford University Press

Journal of the National Cancer Institute, Vol. 95, No. 4, 282-290, February 19, 2003


ARTICLE

Association of Volume and Volume-Independent Factors With Accuracy in Screening Mammogram Interpretation

Craig A. Beam, Emily F. Conant, Edward A. Sickles

Affiliations of authors: C. A. Beam, Department of Radiology, Medical College of Wisconsin, Milwaukee, and the H. Lee Moffitt Cancer Center & Research Institute at the University of South Florida, Tampa; E. F. Conant, Department of Radiology, University of Pennsylvania, Philadelphia; E. A. Sickles, Department of Radiology, University of California, San Francisco.

Correspondence to: Craig Beam, Ph.D., Biostatistics Core, H. Lee Moffitt Cancer Center & Research Institute, University of South Florida, 12902 Magnolia Dr., Tampa, FL 33612-9497 (e-mail: beamca@moffitt.usf.edu).


    ABSTRACT
Background: Early detection of breast cancer is associated with the accurate reading of screening mammograms, but factors that influence reading accuracy are not well understood. We thus investigated whether reading volume and other factors were independently associated with accuracy in reading screening mammograms in a population of U.S. radiologists.
Methods: A random selection of 110 of 292 radiologists who agreed to participate, if selected, interpreted screening mammograms from 148 randomly selected women. Original index mammograms (i.e., mediolateral oblique and craniocaudal views of each breast) were used; comparison original mammograms were provided when available. Radiologist-level and facility-level factors were surveyed. Two standard metrics of screening accuracy, both based on receiver operating characteristic curves, were analyzed. The influence of volume on accuracy after controlling for other factors was assessed with multiple regression analysis.
Results: Current reading volume was not statistically significantly associated with interpretive accuracy. More recently trained radiologists interpreted mammograms more accurately than those trained earlier (-0.76% [95% confidence interval (CI) = -1.75% to -0.02%] reduction in sensitivity per year since residency). Facility-level factors that were statistically significantly and independently associated with better accuracy were the number of diagnostic breast imaging examinations and image-guided breast interventional procedures performed (0.55% [95% CI = 0.11% to 2.40%] increase in accuracy per examination or procedure offered), being classified as a comprehensive breast diagnostic and/or screening center or freestanding mammography center (1.39% [95% CI = 0.15% to 3.82%] higher than a hospital radiology department or multispecialty medical clinic), and being a facility that practiced double reading (1.61% [95% CI = 1.99% to 11.65%] higher than in a facility without such practice).
Conclusions: Individual radiologists’ current reading volume was not statistically significantly associated with accuracy in reading screening mammograms, but several other factors were. Expertise reflects a complex multifactorial process that needs further clarification.



    INTRODUCTION
During the past 15 years, evidence has accumulated that supports the hypothesis that the number of surgeries performed by a surgeon (surgeon volume) is an important determinant of the quality of care and outcome in several areas of surgery and surgical oncology (1–6), including the treatment and management of breast cancer (7–9). However, there is currently conflicting research concerning the relationship between the number of mammograms read by a radiologist (radiologist’s reading volume) and the radiologist’s accuracy in the detection and diagnosis of breast cancer from mammograms. Accurate reading of screening mammograms is important for the early detection of breast cancer, and reading expertise is a process that needs to be better understood.

To the best of our knowledge, six studies (10–15) have investigated the relationship between expertise and reading volume in mammography. In two of these studies (13,14), the relationship observed between volume and expertise was believed by the researchers to be strong enough to proffer health policy recommendations. For example, Esserman et al. (14) recommended establishment of high-volume centers with mammogram interpretations to be made by high-volume experienced and dedicated radiologists. Kan et al. (13) suggested that a yearly minimum of 2500 interpretations is sufficient to ensure high quality. In contrast, findings from the four other published studies (10–12,15) suggest that the relationship between volume and expertise in mammography may be strongly influenced by other factors. For example, three of the studies (10–12) suggest that the quality of feedback given to the radiologist, rather than simply the volume of reading, is an important determinant of the effectiveness of gaining expertise from experience.

This disparity in the published research is important because the implications for improving mammography in each study are so different. To understand how such disparities have occurred, it is important to recognize that each of the four studies (10–12,15) that provide findings that question the unique role of volume controlled for characteristics of the readers, whereas the other two studies (13,14) did not. This difference in study design could account for the different findings because it is possible that the apparent relationship between volume and expertise is confounded with, or altered by, other unrecognized factors.

Unfortunately, the four studies (10–12,15) that question the solitary volume–expertise relationship used small and/or highly selected reader samples. They were thus not able to transect the full range of variability in the U.S. population of radiologists. However, the two studies supporting the solitary importance of volume are equally limited. Kan et al. (13) did not sample U.S. radiologists. Esserman et al. (14) sampled radiologists exclusively from California and achieved only a 30% participation rate. In addition, the study by Esserman et al. has limited external validity for U.S. populations of women being screened for breast cancer because the authors chose to treat the case sample as fixed, thus prohibiting them from making inferences about typical case populations.

To address the deficiencies and inconsistencies in existing research and knowledge, we conducted a multifactor population study to determine whether a radiologist’s reading volume and other factors were associated with accuracy in screening mammography. Our study was designed to be relevant to typical clinical populations and to the population of radiologists interpreting screening mammograms in the United States. Our study was also designed to more successfully capture variability and to control for the possible influence of factors that a priori were thought to be possible confounders or modifiers of the relationship between volume and expertise.

It should be emphasized that our study focuses strictly on accuracy in the interpretation of screening mammograms. It does not analyze factors associated with accuracy in the interpretation of diagnostic mammograms. Ability in screening mammogram interpretation (a task focused on the decision to call women back for further work-up) may or may not imply ability in diagnostic interpretation (a task focused on the interpretation of additional work-up that can culminate in the recommendation for tissue biopsy examination). Hence, the associations found and not found in this study should not be assumed to apply to skill in the diagnostic interpretation of mammograms by U.S. physicians.


    METHODS
Radiologists

Radiologists were recruited to participate in the Variability In Diagnostic Interpretation (VIDI) screening mammography study (16). This study is a research program devoted to the population-based assessment of interpretation variability in diagnostic medicine. Participants for the screening mammography study came from randomly sampled mammography facilities accredited by the U.S. Food and Drug Administration as of January 1, 1998. Stratified random sampling of the 9916 geographically contiguous accredited facilities ensured approximately equal representation across geographic regions (four regions as defined by the U.S. Census) and by minority composition of local screening populations (based on percent minority composition in the ZIP code area of the facility: "less than 50% nonwhite" versus "greater than or equal to 50% nonwhite"). Thus, stratified sampling gave approximately equal numbers of facilities within each of eight strata.
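The stratified design described above can be illustrated with a minimal sketch. It assumes a hypothetical sampling frame in which each facility record carries a Census region and a flag for whether its ZIP code area is at least 50% nonwhite; the field names, the frame itself, and the number drawn per stratum are illustrative assumptions, not values from the study.

```python
import random
from collections import defaultdict


def stratified_sample(facilities, per_stratum, seed=0):
    """Draw up to `per_stratum` facilities at random from each stratum.

    A stratum is the pair (Census region, minority-composition flag),
    giving 4 x 2 = 8 strata as in the design described in the text.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for fac in facilities:
        by_stratum[(fac["region"], fac["minority_50_plus"])].append(fac)

    sample = []
    for key in sorted(by_stratum):
        members = by_stratum[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame: 400 facilities spread over 4 regions and 2 minority levels.
REGIONS = ["Northeast", "Midwest", "South", "West"]
frame = [
    {"id": i, "region": REGIONS[i % 4], "minority_50_plus": (i // 4) % 5 == 0}
    for i in range(400)
]
chosen = stratified_sample(frame, per_stratum=5)
```

Drawing the same number from every stratum is what yields approximately equal representation even though the strata differ greatly in size in the frame.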

All radiologists at each randomly sampled facility were invited to participate. The procedure followed for recruitment began with a letter to the lead interpreting physician at a sampled facility asking them to distribute our recruitment material to all radiologists who interpret mammograms for their facility. In this way, we sampled permanent faculty as well as temporary faculty. The recruitment material explained the study, requirements, and benefits of participation in the study and asked the radiologists whether they would be willing to participate if randomly sampled. In all, 412 radiologists were contacted, and 292 (71%) expressed willingness to participate in the study, if sampled. The 292 radiologists, grouped by facility, provided our sampling frame for random sampling. Again, we sampled facilities (and hence willing radiologists within facilities) within the strata formed by geographic region and minority composition to arrive at approximately equal numbers of radiologists per stratum.

One hundred ten radiologists were randomly selected to participate in this study. There were no statistically significant differences in any of the characteristics summarized in Tables 1 and 2 between the radiologists who did and did not participate in this study.


Table 1. Comparison of participating and nonparticipating physicians
 

Table 2. Comparison of physician age and experience*
 
Mammograms Selected for the Reading Study

For this research study, we define a "mammogram" as consisting of the four radiographic views that are standard in screening for breast cancer in the United States. Each mammogram (index or comparison examination), therefore, consisted of mediolateral oblique and craniocaudal views of each breast (hence, four views per mammogram).

Index mammograms were obtained from 148 women who were randomly sampled from a large screening program (affiliated with the University of Pennsylvania) covering the period from January 1993 through December 1997. All mammograms selected for this study were reviewed for quality by one of the authors (E. F. Conant), who serves as Director of the Breast Imaging Program at the University of Pennsylvania. No mammogram was rejected because of poor technical quality.

Original film mammograms were used in the reading study. Comparison original film mammograms were provided, as available, to parallel usual clinical practice. Sixty-seven (45%) women had comparison mammographic examinations. Each set of mammograms was from low-dose, film screen mammography performed on dedicated mammography units using single emulsion film. Each set consisted of mediolateral oblique and craniocaudal views of each breast. The index examination of a woman was defined as the one leading to the first biopsy or as the next-to-last mammogram for those women with at least 2 years of follow-up without a biopsy examination. A comparison examination was defined as the screening examination performed immediately before the index examination.

Mammogram sampling was stratified on the disease status of the women screened ("cancer" or "cancer-free," determined by a biopsy examination or a minimum follow-up of 2 years) and age. We used the electronic patient data and biopsy databases maintained by the Breast Imaging Program to stratify women by age at the time of their index mammogram and by disease status. Women were stratified as younger than 50 years, 50–59 years, 60–69 years, and 70 years or older. Once the women were stratified, sampling of women (and hence, mammograms) was done at random within strata. Differences in the availability of mammograms prevented us from meeting our initial goal of equal numbers of women in each age group of women who had cancer and women who were cancer-free.

Although we attempted an equal split, our sampling resulted in a mixture of which 64 (43%) of 148 mammograms were from women with cancer. Ages of women whose mammograms were selected ranged from 40 to 85 years, with a mean of 58 years. Patients with breast cancer tended to be older than cancer-free women (P = .011, χ² test). This situation reflects differences in the availability of original films after the mammograms already stratified by the woman’s age were randomly selected. The examinations from younger patients with breast cancer tended more often to be in clinical use than the examinations from older patients with breast cancer.

Reading Study

All radiologists interpreted the mammograms in a controlled reading environment during two 3-hour periods. All readings were done at a central site, dedicated solely to the study, that permitted the investigators to control ambient light. Eight readers participated at a time.

Mammograms were mounted in random sequence on dedicated mammography alternators (RADX Corp., Houston, TX). The only information presented to the reader was the age of the patient. Before reading, radiologists were instructed that the set of mammograms to be read did not have the mixture of mammograms expected from a typical screening population (two to six cases of breast cancer per 1000 mammograms). Pilot studies done by the investigators have established that this instruction adequately controls for context bias (17) (details available from C. A. Beam). Before the reading session began, a member of the study team led the radiologists through a hands-on orientation session with a set of practice mammograms and provided instruction on using the computer data collection system.

Reading data were captured immediately into a database through laptop computers. A custom computer program operating in real time during the reading session captured the reading data described below and ensured data reliability.

Readers were asked 1) to identify findings, 2) to make a recommendation for further work-up, 3) to report what they believed would be the result of additional work-up, and 4) to give a subjective assessment of the presence of breast cancer for each mammogram. Responses to item 3, which relate to the management of the woman after screening, used the Breast Imaging Reporting and Data System [BI-RADS (18) scale: 1 = normal, return to normal screening; 2 = benign, return to normal screening; 3 = probably benign, 6-month follow-up recommended; 4 = possibly malignant, biopsy recommended; 5 = probably malignant, biopsy strongly recommended] and were used in the receiver operating characteristic (ROC) curve analysis for this study (described below). BI-RADS, a scale for the standardized reporting of mammograms, was developed by the American College of Radiology.

Reader Factors

Two surveys were used to collect data about the readers in our study. One survey collected data about each individual reader, and another collected data about the facility with which the radiologist was affiliated. Among other things, radiologists were asked to report their "Recent Reading Volume," which is the total number of mammograms read in the year before their participation in the study. All survey items were self-reported and not independently verified. Several survey variables were omitted from the regression analysis because of missing data or fewer than 20 observations in any level of a categorical variable. We used radiologist-level factors and facility-level factors in the analysis.

Several levels of the variable "Practice Setting" were combined to ensure adequate sample size for analysis. For analysis, the category "Hospital Radiology Department" was combined with the category "Multispecialty Medical Clinic" (combined n = 55). The category "Comprehensive Breast Diagnostic/Screening Center" was combined with the category "Freestanding Mammography Center" (n = 21).

Statistical Analysis

The expertise of each reader was assessed with two standard measures of screening accuracy based on the ROC curve (19). "Am" is the area under the ROC curve estimated nonparametrically (20). This measure can be interpreted as the ability of the diagnostician to discriminate a mammogram showing breast cancer from one not showing breast cancer when two such mammograms have been randomly selected and presented together. The area under the ROC curve includes high false-positive rates that are not relevant to screening (21,22). "pAz" is the partial area under the binormal ROC curve (23–26) restricted to the interval in which false-positive probability is less than 10%. This measure can be interpreted as the average sensitivity for the diagnostician who reads within a clinically desirable range of false-positive values (23).
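As a rough illustration of the two measures, the sketch below computes a nonparametric area from ordinal ratings using the standard Mann-Whitney pairwise comparison, and a partial area under a binormal curve by numerical integration. The ratings and the binormal parameters `a` and `b` are hypothetical, the binormal form TPR = Φ(a + b·Φ⁻¹(FPR)) is the usual textbook parameterization, and nothing here is claimed to reproduce the estimation procedure used in the study.

```python
from statistics import NormalDist


def nonparametric_auc(cancer_scores, normal_scores):
    """Mann-Whitney estimate of the ROC area.

    Probability that a randomly chosen cancer case is rated higher than a
    randomly chosen cancer-free case, counting ties as one half.
    """
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


def binormal_partial_area(a, b, fpr_max=0.10, steps=1000):
    """Trapezoid-rule area under TPR = Phi(a + b * Phi^-1(FPR)) on [0, fpr_max].

    Assumes b > 0, so that TPR tends to 0 as FPR tends to 0.
    """
    std = NormalDist()

    def tpr(fpr):
        if fpr <= 0.0:
            return 0.0
        return std.cdf(a + b * std.inv_cdf(fpr))

    h = fpr_max / steps
    return sum(0.5 * (tpr(i * h) + tpr((i + 1) * h)) * h for i in range(steps))


# Hypothetical 5-point ratings for five cancer and five cancer-free cases.
auc = nonparametric_auc([5, 4, 4, 3, 2], [1, 2, 2, 3, 1])

# Hypothetical binormal parameters; dividing by fpr_max gives an average
# sensitivity over the restricted false-positive range.
partial = binormal_partial_area(a=1.5, b=0.9)
average_sensitivity = partial / 0.10
```

Dividing the partial area by the width of the false-positive interval is one common way to read it as an average sensitivity; whether the reported pAz values were normalized this way is not stated in this section.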

It is important to point out that sensitivity in our study refers to sensitivity in the context of screening. In screening, the central decision is whether to conduct additional work-up (i.e., the callback decision, which could include a recommendation for another mammogram after a short interval). It is not the goal of screening interpretation to provide a definitive diagnosis or to recommend biopsy without further consideration. Thus, a true-positive result in screening occurs whenever a woman with breast cancer is given a callback recommendation, and this determination is made without reference to correct localization of the cancer by the radiologist. Hence, our measures of skill refer to the skill of the radiologist to detect cancer in the screening mammogram but not to then localize and correctly identify it. Such skills pertain to the diagnostic interpretation of mammograms and are distinct from skill in screening.

After controlling for the possible influence of other factors, the influence of volume on accuracy was assessed with bootstrapped, multiple-regression analysis for ROC curves (27). This analysis also allowed us to investigate the independent association of the other factors with accuracy. The factors we tested are listed in Table 4. The confidence intervals (CIs) that we used to assess associations are 95% bias-corrected and accelerated CIs (28), as implemented by S-Plus 2000 (Mathsoft, Seattle, WA).
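For readers unfamiliar with bias-corrected and accelerated (BCa) intervals, the following is a minimal sketch of the general technique applied to a simple sample mean. It is not the bootstrapped multiple-regression procedure for ROC curves used in the study, nor the S-Plus implementation; the data values, the number of resamples, and the function name `bca_interval` are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap CI for `stat` of a list `data`."""
    rng = random.Random(seed)
    std = NormalDist()
    n = len(data)
    theta_hat = stat(data)

    # Bootstrap distribution of the statistic (resampling with replacement).
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )

    # Bias correction: how far the bootstrap median sits from the estimate.
    below = sum(1 for t in boots if t < theta_hat)
    prop = min(max(below / n_boot, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1.0))
    z0 = std.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted(q):
        z = std.inv_cdf(q)
        return std.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))


# Hypothetical accuracy values for twelve readers (not study data).
accuracies = [0.86, 0.91, 0.88, 0.93, 0.79, 0.90,
              0.95, 0.84, 0.89, 0.92, 0.87, 0.81]
low, high = bca_interval(accuracies)
```

Compared with a plain percentile interval, the BCa adjustment shifts the percentile cut points to account for bias and skewness in the bootstrap distribution, which is why such intervals need not be symmetric about the point estimate.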


Table 4. Regression analysis
 
Sample sizes of mammograms and physicians were determined to provide 90% power to detect an increase of 5% in the mean sensitivity of a radiologist associated with an increased annual reading volume of 480 mammograms, i.e., the minimum required by the Mammography Quality Standards Act during the period of this study. The previous increase was also subject to the condition that the radiologist’s specificity (fixed at 90%) would not also decrease.

Finally, it is important to point out that the study was designed specifically to evaluate the statistical significance of the relationship between reading volume and accuracy, after controlling for the influence of other concomitant variables; that is, the study had a single hypothesis. Because we consider the estimation and statistical significance testing of the other independent variables to be purely exploratory, we have not exercised statistical control for multiple testing and/or estimation. We consider that the purpose of that portion of our study was to raise hypotheses to be tested in subsequent research. Nonetheless, approximate Bonferroni-adjusted (29) CIs, which provide composite 95% coverage, are reported as a guide to interpretation by the reader. All statistical tests were two-sided.
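The Bonferroni idea mentioned above can be sketched as follows: to keep composite coverage of at least 95% across k intervals, each interval is built at level 1 − 0.05/k. The example uses a normal-theory critical value purely for illustration (the study's intervals are bootstrap-based), and k = 10 is a hypothetical count, not the number of factors in Table 4.

```python
from statistics import NormalDist


def bonferroni_level(k, family_alpha=0.05):
    """Per-interval confidence level giving at least 1 - family_alpha jointly."""
    return 1.0 - family_alpha / k


def bonferroni_z(k, family_alpha=0.05):
    """Two-sided normal critical value for one of k Bonferroni-adjusted intervals."""
    return NormalDist().inv_cdf(1.0 - family_alpha / (2.0 * k))


single = bonferroni_z(1)          # the familiar unadjusted value, about 1.96
adjusted = bonferroni_z(10)       # hypothetical k = 10
level = bonferroni_level(10)      # each interval at 99.5%
```

The adjusted intervals are wider than the unadjusted ones, so an association that excludes zero under the unadjusted interval may not do so under the composite one.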


    RESULTS
One hundred ten radiologists interpreted screening mammograms from the same 148 women. The performance characteristics of the 110 participating radiologists are presented in Table 3. Sensitivity refers to the percentage of women with breast cancer that were given a recommendation for further work-up. Specificity refers to the percentage of women without breast cancer that were not recommended for further work-up. These data affirm that the radiologist sample is representative of the U.S. population because the values for this sample are similar to other published values (12) and to those of another independently conducted national survey of radiologist skill in mammography (16). For example, the U.S. national survey conducted by Beam et al. (16) observed that sensitivity ranged from 47% to 100% and that specificity ranged from 36% to 99%. These values are very similar to those reported in Table 3. Sensitivity in the present sample of radiologists ranged from 59% to 100%, and specificity ranged from 35% to 98%.
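The definitions of sensitivity and specificity given above amount to the following computation for a single reader. The callback decisions and cancer statuses in the example are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(called_back, has_cancer):
    """Screening sensitivity and specificity from callback recommendations.

    called_back[i] is True when further work-up was recommended for woman i;
    has_cancer[i] is True when woman i had breast cancer.
    """
    tp = fn = tn = fp = 0
    for recall, cancer in zip(called_back, has_cancer):
        if cancer and recall:
            tp += 1          # cancer, called back
        elif cancer:
            fn += 1          # cancer, not called back
        elif recall:
            fp += 1          # cancer-free, called back
        else:
            tn += 1          # cancer-free, not called back
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reader: four women with cancer, six cancer-free.
status = [True, True, True, True, False, False, False, False, False, False]
recalls = [True, True, True, False, False, False, False, True, False, False]
sens, spec = sensitivity_specificity(recalls, status)
```

Note that a true positive here requires only a callback recommendation for a woman with cancer; it does not require that the reader localized the lesion correctly.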


Table 3. Interpretive performance characteristics of the 110 sampled radiologists
 
Fig. 1 provides a graphical depiction of the range of reader performance with the model used in the computation of the accuracy measure pAz. Radiologist-specific sensitivities at various values of false-positive probability were computed by use of the binormal model (19,27) and then plotted as a typical ROC curve. However, because there are so many values of sensitivity to plot at each value of false-positive probability, the graphical device known as the boxplot was used to reduce congestion. The boxplot gives a concise graphic depiction of the distribution of a sample. The plot shows the sample minimum, maximum, and quartiles. For example, for false-positive probability values greater than 1% in Fig. 1, there is a boxplot of the values of sensitivity estimated in our sample of 110 radiologists. We see that, at 1% false-positive probability, the minimum sensitivity was about 10% and the maximum sensitivity was about 80%. The median sensitivity is estimated to be approximately 45%, with first quartile sensitivity at approximately 35% and third quartile sensitivity at approximately 60%.



Fig. 1. Population receiver operating characteristic (ROC) curve. Boxplots of sensitivity of 110 radiologists at various false-positive (Fp) probabilities. Sensitivity (Se) was estimated for each radiologist using the binormal model. Each boxplot shows minimum, first quartile, median, third quartile, and maximum sensitivity. A smooth line connects medians. Statistical outliers are identified by black dots. This figure previously appeared as Fig. 4 in reference (30) and is published here with permission from the publisher, International Society for Optical Engineering (SPIE).

 
This "population ROC curve" (30) provides a sense of the range of performance across various operating levels of false-positive probability. The widest range—that is, the biggest disparity that is observed among radiologists—occurs near the more clinically relevant levels of false-positive probability (those near 1%). However, the frequency of outliers (that is, physicians whose performance is statistically much lower than their cohort) increases as false-positive probability increases.

A great deal of variation in accuracy among the 110 readers was observed, and any trend (indicated by the superimposed least squares line) was slight, as shown in Figs. 2 and 3, which present the two measures of accuracy as a function of reading volume. A 1% increase in accuracy requires increasing the annual reading volume by about 3000 mammograms for Am and by about 1200 mammograms for pAz, as indicated by the least squares regression line. The diffuse nature of this relationship is confirmed numerically by the fact that the linear relationship with volume accounts for only 2.42% of the variation observed in the accuracy measure Am and only 1.48% of the variation in the measure pAz.
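The volume figures quoted above follow from the unadjusted slopes reported in the captions of Figs. 2 and 3 (0.0003423 for Am and 0.000797 for pAz, each per 100 mammograms). The short sketch below reproduces that arithmetic, assuming that a "1% increase" means an absolute gain of 0.01 on the area scale; the function name is illustrative.

```python
def volume_for_gain(slope_per_100, gain=0.01):
    """Additional annual mammograms needed for `gain` in mean accuracy,
    given a least squares slope expressed per 100 mammograms."""
    return 100.0 * gain / slope_per_100


extra_for_am = volume_for_gain(0.0003423)   # roughly 2,900 mammograms
extra_for_paz = volume_for_gain(0.000797)   # roughly 1,250 mammograms
```

Both results are consistent with the rounded values of about 3000 and about 1200 mammograms given in the text.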



Fig. 2. Scatterplot of accuracy (measured by the area under the receiver operating characteristic [ROC] curve estimated nonparametrically [Am]) versus recent reading volume (100 mammograms per year). Recent reading volume is the total number of all mammograms read in the year before the study. Coefficients of the simple linear regression line are an intercept term (0.8946) estimating the average accuracy in the U.S. population of radiologists when reading volume is zero and a slope coefficient (0.0003423) estimating the increase in mean accuracy with each increase in reading of 100 mammograms. These coefficients differ from those reported in the tables because they do not adjust for the influence of other factors.

 


Fig. 3. Scatterplot of accuracy (measured by the partial area under the binormal receiver operating characteristic [ROC] curve restricted to the interval where false-positive probability is less than 10% [pAz]) versus recent reading volume (100 mammograms per year). Recent reading volume is the total number of all mammograms read in the year prior to the study. Coefficients of the simple linear regression line are an intercept term (0.6323) estimating the average accuracy in the U.S. population of radiologists when reading volume is zero and a slope coefficient (0.000797) estimating the increase in mean accuracy with each increase in reading of 100 mammograms. These coefficients differ from those reported in the tables because they do not adjust for the influence of other factors.

 
Results from the multiple regression analysis are presented in Table 4 for factors related to individual radiologists and factors related to facilities. These regression parameter estimates reflect the association between each factor and accuracy, which is independent of the influence of the other factors considered in this analysis. After controlling for the influence of the other radiologist- and facility-related variables appearing in Table 4, neither the current reading volume (the number of mammograms read the year before the study) nor the numbers of years of reading mammograms, another variable reflective of the quantity of experience, were statistically significantly associated with accuracy. Our model estimated an increase of 0.01% (95% CI = -0.07% to 0.05%) in mean Am for every hundred mammograms read. However, there was a decrease of -0.01% (95% CI = -0.58% to 0.03%) estimated in pAz for each hundred mammograms read. The influence of an additional year reading mammograms was estimated to increase Am by 0.19% (95% CI = -0.05% to 0.44%) and pAz by 0.29% (95% CI = -0.97% to 0.90%).

Several of the other factors, however, were statistically significantly associated with both measures of accuracy. The number of years since residency was statistically significantly and negatively associated with both measures of accuracy. Having a formal rotation in mammography during residency was also negatively associated with both measures of accuracy. Our model estimated a decrease in the average Am of -0.30% (95% CI = -0.60% to -0.09%) and a decrease in the average pAz of -0.76% (95% CI = -1.75% to -0.02%) for each year after residency. In addition, the model estimated that radiologists who had a formal mammography rotation during residency had, on average, a decreased Am of -0.55% (95% CI = -2.75% to -0.00%) and a decreased pAz of -1.44% (95% CI = -10.66% to -2.32%) relative to those without a formal rotation.

Other factors were associated uniquely with each accuracy measure. Being an owner of the practice was statistically significantly associated with increased accuracy (i.e., Am = 0.59% [95% CI = 0.02% to 2.46%]). The presence of a computerized system to monitor and track screening was statistically significantly associated with decreased accuracy (i.e., Am = -0.60% [95% CI = -2.71% to -0.23%]). An increased number of diagnostic breast imaging examinations and image-guided breast interventional procedures performed at the facility was statistically significantly and positively associated with accuracy (i.e., Am = 0.55% [95% CI = 0.11% to 2.40%]). If the facility was in a hospital radiology department or multispecialty medical clinic, there was an associated decrease in accuracy (i.e., Am = -1.39% [95% CI = -3.82% to -0.15%]) compared with the accuracy expected when the facility was classified as a comprehensive breast diagnostic and/or screening center or freestanding mammography center.

Two variables were associated uniquely with pAz. The presence of double reading (the practice of having two radiologists interpret each screening mammogram) at a facility was associated with increased accuracy (i.e., pAz = 1.61% [95% CI = 1.99% to 11.65%]). However, the presence of a formal pathology correlation conference (in which physicians jointly and retrospectively review the tissue pathology associated with mammographic findings that lead to biopsy) was statistically significantly associated with a decrease in mean accuracy (mean pAz = -5.46% [95% CI = -15.18% to -3.21%]).


    DISCUSSION
Although volume might be a determinant of expertise and quality of care in breast surgery, we did not observe that the number of mammograms read in the last year (current reading volume), on its own, was statistically significantly associated with accuracy in the detection of breast cancer from screening mammograms. This is not to say that our study proves conclusively that no such relationship exists. Larger sample sizes might yield statistically significant results. Even so, the magnitude of the association between volume and accuracy might not be clinically significant. Our sample size was selected to provide excellent statistical power (90%) to detect a relationship of the magnitude (5% increase in mean sensitivity [pAz] with no decrease in specificity) that would support current volume requirements by the Mammography Quality Standards Act (at the time this study was planned, those requirements were 480 mammograms per year). The effect estimated by our study was much smaller than that.

Our study found that radiologists trained more recently, on average, interpret screening mammograms statistically significantly more accurately. The effect appears substantial: our models estimate a mean 0.3% reduction in Am and a 0.76% reduction in pAz for every 1 year after residency. We now explore various mechanisms by which this observation could have occurred.

Selection Bias

One way that our finding could have occurred is if a large percentage of radiologists with a larger number of years since residency and very high accuracy elected not to participate in our study. Because we did not, of course, measure the accuracy of those who did not participate, we cannot assess this issue directly from our data. We can, however, assess whether readers with a larger number of years since residency differentially elected to participate. Such differential selection is a necessary condition for the sort of selection bias that could lead to an erroneous finding of a declining relationship. Logistic regression suggests that individuals with a larger number of years since residency were, if anything, more likely to be willing to participate, but this association was not statistically significant (P = .074; odds ratio estimate = 1.023, 95% CI = 0.998 to 1.050). We therefore conclude that there is no evidence from our study to support selection bias as a cause of this finding.
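As a rough consistency check on the odds ratio and confidence interval quoted above, the standard error of the log odds ratio can be backed out from the interval and used to form a Wald z statistic. The sketch below assumes the reported interval is a symmetric Wald interval on the log-odds scale, which the article does not state; the function name `wald_p_from_or_ci` is illustrative only.

```python
import math

def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald P value from an odds ratio and its 95% CI.

    Assumption (not stated in the article): the CI is a symmetric Wald
    interval on the log scale, so its half-width equals z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided standard normal tail probability, 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as reported in the text (rounded to three decimals there).
z, p = wald_p_from_or_ci(1.023, 0.998, 1.050)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Because the inputs are rounded, the recomputed P value is expected to land near, but not exactly on, the reported P = .074; either way it stays above the conventional .05 threshold.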

Confounding

A confounder is an independent variable that is correlated with the dependent variable and with the primary independent variable (31). It is important to identify confounding because failure to do so can lead to erroneous conclusions about the true relationship between the primary independent variable and the dependent variable.

There may be variables not considered by our analysis that confound the apparent relationship between accuracy and the number of years since residency. One possibility is that the number of years since residency is a surrogate for some factor, such as perceptual acuity, that might be related to physician age. Another possibility is that differences in types and quality of training, which naturally evolve in residency programs across time, are the operative factors behind the negative association found between accuracy and years since residency. Our study cannot rule out confounding, and further research is needed to assess the role of confounders in the apparent relationship between accuracy and the number of years since residency.
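The way a confounder can distort, or even reverse, a crude comparison can be illustrated with a small stratified calculation. The sketch below uses entirely synthetic accuracy values (not data from this study) and hypothetical labels: a binary factor is more common among recently trained readers, who also score higher, so the crude difference favors the factor even though the difference within each training-era stratum goes the other way.

```python
from statistics import mean

# Synthetic records: (training-era stratum, has_factor, accuracy).
# All values are invented for illustration; none come from the study.
records = [
    ("recent", 1, 0.88), ("recent", 1, 0.89), ("recent", 1, 0.87),
    ("recent", 1, 0.88), ("recent", 0, 0.90),
    ("earlier", 1, 0.78), ("earlier", 0, 0.80), ("earlier", 0, 0.81),
    ("earlier", 0, 0.79), ("earlier", 0, 0.80),
]

def mean_difference(rows):
    """Mean accuracy with the factor minus mean accuracy without it."""
    with_factor = [acc for _, f, acc in rows if f == 1]
    without_factor = [acc for _, f, acc in rows if f == 0]
    return mean(with_factor) - mean(without_factor)

# Crude comparison ignores the stratum (the potential confounder).
crude = mean_difference(records)

# Stratified comparison holds the stratum fixed.
stratified = {
    s: mean_difference([r for r in records if r[0] == s])
    for s in ("recent", "earlier")
}

print(f"crude difference: {crude:+.3f}")
for stratum, diff in stratified.items():
    print(f"within '{stratum}' stratum: {diff:+.3f}")
```

In this toy setup the crude difference is positive while both within-stratum differences are negative, which is the pattern that adjustment for a confounder is meant to expose.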

Failure of Skill Maintenance or Improvement Mechanisms

Screening mammogram interpretation is a skill that must be maintained. Declining accuracy with an increased number of years since residency could come about because of the failure of radiologists to maintain skill level against an incipient tendency of skill loss often found in human activities. Another possibility is related to the potential absence of effective methods for skill improvement during the course of a professional career coupled with initial disparities in the quality of training. Our study was not able to discern the nature of the relationship between accuracy and number of years since residency. Further research is needed to explore pathways of the relationship that we have established.

Other factors were found to be statistically significant. Some (such as double reading) seem to support the quality of feedback hypothesis. Others (such as having a rotation in mammography) are nonintuitive. As demonstrated in Figs. 4 and 5, nonintuitive factors could have come about in our study because we have adjusted for other variables. In addition, such findings should be interpreted with caution, both because some of the subgroups are small, with a sample size of only three, and because testing so many hypotheses increases the likelihood of false-positive findings. Further research is needed to better understand the true effect of these factors and the influence of other confounding factors.
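The concern about multiple testing can be made concrete with the familywise error rate, the probability of at least one false-positive finding among several tests. The sketch below assumes independent tests each run at the same significance level; the numbers of tests shown are hypothetical and are not the number of hypotheses tested in this study.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests
    independent tests, each at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Hypothetical numbers of tests, chosen only for illustration.
for m in (1, 5, 10, 20):
    print(f"{m:2d} tests -> familywise error rate {familywise_error_rate(m):.3f}")
```

Under these assumptions the chance of at least one spurious "significant" factor grows quickly with the number of factors examined, which is why isolated nonintuitive findings warrant replication.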

Fig. 4. Negative effect of the use of correlation conference at a facility on accuracy. Mean accuracies (the partial area under the binormal receiver operating characteristic [ROC] curve restricted to the interval where false-positive probability is less than 10% [pAz]) of subgroups defined by practice setting and use of correlation conference at the facility are plotted as follows: diamonds (solid line) = hospital radiology department or multispecialty clinic; triangles (light dashed line) = breast center or freestanding center; squares (heavy dashed line) = private radiology practice. The correlation conference appears to confer a benefit when conducted by a specialized breast center and otherwise not. Confidence intervals have not been indicated because of the small sample size (n = 3) of some groups.

Fig. 5. Negative effect on accuracy of having a formal rotation in mammography. Boxplots of accuracy (the area under the receiver operating characteristic [ROC] curve estimated nonparametrically [Am]) are presented for subgroups defined by whether the radiologist had a formal rotation in mammography during residency (0 = no such rotation; 1 = had such a rotation). Each boxplot presents the data minimum (i.e., lowest horizontal bar), the first quartile (i.e., lower edge of the box), median (i.e., bar inside the box), third quartile (i.e., upper edge of the box), and data maximum (i.e., highest bar). Circles = statistically extreme observations. A) Data before the influence of any other variable is considered. This panel provides a conclusion that fits well with intuition: the median accuracy of radiologists who had a rotation is greater than that of those who did not have a rotation. B) Number of years since residency is considered in boxplots of accuracy within each quartile of the number of years since residency (coded in the upper panels as follows: a = 24–33 years; b = 16–23 years; c = 8–15 years; d = 1–7 years). The expected benefit from having had a rotation appears only in the group of radiologists who are 24–33 years from their residency. In the other three quartiles, the effect falls in the opposite direction, with median accuracy lower in the group having had a rotation. This result could reflect the influence of a confounding variable associated with recent changes in education.

 
Our study has several limitations. Interpretation accuracy was measured in a reading study. Our survey of radiologist characteristics relied on self-reported data that were not independently verified. The resulting measurement error in the independent variables could attenuate the effect size estimates. To explore the possible impact of error in the self-reporting of volume, we replicated our analyses with volume recoded to express increments of 500 (i.e., 0 = 0–500, 1 = 501–1000, etc.) (data not shown). We obtained similar findings from this analysis, and so we believe that the impact of self-reporting error was minimal.
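The coarsening of self-reported volume into increments of 500 can be sketched as a simple binning function. This is a minimal illustration that assumes volumes are non-negative whole-number counts and that the "etc." in the coding continues the same pattern (1001–1500 coded as 2, and so on); the function name `recode_volume` is hypothetical and is not taken from the study's analysis code.

```python
def recode_volume(volume, width=500):
    """Map an annual reading volume to its bin index:
    0 for 0-500, 1 for 501-1000, 2 for 1001-1500, and so on."""
    if volume < 0:
        raise ValueError("volume must be non-negative")
    # Subtracting 1 makes the upper edge (e.g., 500) fall in the lower bin;
    # max() keeps a volume of 0 in bin 0.
    return max(0, (volume - 1) // width)

# Hypothetical self-reported volumes, for illustration only.
for v in (0, 480, 500, 501, 1000, 1001, 2600):
    print(v, "->", recode_volume(v))
```

Binning of this kind makes the analysis insensitive to reporting errors that stay within a 500-mammogram band, at the cost of discarding finer variation in volume.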

We conclude that the phenomenon of expertise in mammography reflects a complex multifactorial process that needs to be better understood. We believe that scientific research and health policy recommendations aimed at improving the quality of interpretation that consider only radiologist volume will likely be misleading and ineffectual.


    NOTES
Supported by Public Health Service grant CA74110 from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services.


    REFERENCES

1 Hughes RG, Hunt SS, Luft HS. Effects of surgeon volume and hospital volume on quality of care in hospitals. Med Care 1987;25:489–503.

2 McArdle CS, Hole D. Impact of variability among surgeons on postoperative morbidity and mortality and ultimate survival. BMJ 1991;302:1501–5.

3 Romano PS, Mark DH. Patient and hospital characteristics related to in-hospital mortality after lung cancer resection. Chest 1992;101:1332–7.

4 Begg CB, Cramer LD, Hoskins WJ, Brennan MF. Impact of hospital volume on operative mortality for major cancer surgery. JAMA 1998;280:1747–51.

5 Begg CB, Riedel E, Bach P, Kattan MW, Schrag D, Warren JL, et al. Variations in morbidity after radical prostatectomy. New Engl J Med 2002;346:1138–44.

6 Birkmeyer JD, Siewers AE, Finlayson EV, Stukel TA, Lucas FL, Batista I, et al. Hospital volume and surgical mortality in the United States. New Engl J Med 2002;346:1128–37.

7 Sainsbury R, Haward B, Rider L, Johnston C, Round C. Influence of clinician workload and patterns of treatment on survival from breast cancer. Lancet 1995;345:1265–70.

8 Gillis CR, Hole DJ. Survival outcomes of care by specialist surgeons in breast cancer: a study of 3786 patients in the west of Scotland. BMJ 1996;312:145–53.

9 Ma M, Bell J, Campbell S, Basnett I, Pollack A, Taylor I. Breast cancer management: is volume related to quality? Br J Cancer 1997;75:1652–9.

10 Nodine CF, Kundel HL, Lauver SC, Toto LC. Nature of expertise in searching mammograms for breast masses. Acad Radiol 1996;3:1000–6.

11 Elmore JG, Wells CK, Howard DH. Does diagnostic accuracy in mammography depend on radiologist’s experience? J Womens Health 1998;7:443–9.

12 Nodine CF, Kundel HL, Mello-Thoms C, Weinstein SP, Orel SG, Sullivan DC, et al. How experience and training influence mammography expertise. Acad Radiol 1999;6:575–85.

13 Kan L, Olivotto IA, Sickles EA, Coldman AJ. Standardized abnormal interpretation and cancer detection ratios to assess reading volume and reader performance in a breast screening program. Radiology 2000;215:563–7.

14 Esserman L, Cowley H, Eberle C, Kirkpatrick A, Chang S, Berbaum K, et al. Improving the accuracy of mammography: volume and outcome relationships. J Natl Cancer Inst 2002;94:369–75.

15 McKee MD, Cropp DM, Hyland A, Watroba N, McKinley B, Edge SB. Provider case volume and outcome in the evaluation and treatment of patients with mammogram-detected breast carcinoma. Cancer 2002;95:704–12.

16 Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 1996;156:209–13.

17 Egglin TK, Feinstein AR. Context bias. A problem in diagnostic radiology. JAMA 1996;276:1752–5.

18 American College of Radiology (ACR). Illustrated breast imaging reporting and data system (BI-RADS™). 3rd ed. Reston (VA): American College of Radiology; 1998.

19 Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986;21:720–33.

20 Hanley JA, McNeil BJ. The meaning and use of the area under an ROC curve. Radiology 1982;143:29–35.

21 Halpern EJ, Albert M, Krieger AM, Metz CE, Maidment AD. Comparison of receiver operating characteristic curves on the basis of optimal operating points. Acad Radiol 1996;3:245–53.

22 Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Stat Med 1989;8:1277–90.

23 Wieand S, Gail MH, James KL, James BR. A family of nonparametric statistics for comparing diagnostic tests with paired or unpaired data. Biometrika 1989;76:585–92.

24 McClish DK. Analyzing a portion of the ROC curve. Med Decis Making 1989;9:190–5.

25 Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996;201:745–50.

26 Hanley JA. The use of the binormal model for parametric ROC analysis of quantitative diagnostic tests. Stat Med 1996;15:1575–85.

27 Beam CA. A two-stage ROC regression model when sampling a population of diagnosticians. In: Chakraborty DP, Krupinski EA, editors. Medical imaging 2002: image perception, observer performance, and technology assessment. Proc SPIE 2002;4684:236–47.

28 Efron B, Tibshirani RJ. An introduction to the bootstrap. New York (NY): Chapman & Hall; 1993. p. 178–88.

29 Snedecor GW, Cochran WG. Statistical methods. 7th ed. Ames (IA): The Iowa State University Press; 1980. p. 166–7.

30 Beam CA. Reader strategies: variability and error-methodology, findings and health policy implications from a study of the US population of mammographers. In: Chakraborty DP, Krupinski EA, editors. Medical imaging 2002: image perception, observer performance, and technology assessment. Proc SPIE 2002;4686:157–68.

31 Hosmer DW, Lemeshow S. Applied logistic regression. New York (NY): John Wiley & Sons, Inc.; 1989. p. 63.

Manuscript received June 13, 2002; revised November 26, 2002; accepted January 3, 2003.





