The Canadian National Breast Screening Study: A Perspective on Criticisms
- From the Central Office, National Breast Screening Study at the University of Toronto, Ontario. Requests for Reprints: Cornelia J. Baines, MD, 12 Queen's Park Crescent West, 3rd Floor, Toronto, Ontario, Canada M5S 1A8. Acknowledgments: The author thanks Dr. Howard Seiden, who identified the origin of the claims that cancer detection was delayed 2 to 5 years, and Dr. Andrew D. Baines, who designed Table 7 and who, along with Professor Gail McKeown-Eyssen, read and improved many drafts. Grant Support: By the Canadian Cancer Society, Health and Welfare Canada, Heritage Fund Alberta, Manitoba Health Services Commission, Medical Research Council of Canada, Le Ministère des Affaires Sociales du Québec, National Cancer Institute of Canada, Nova Scotia Department of Health, and the Ontario Ministry of Health. The author received partial salary support from the National Cancer Institute of Canada.
Abstract
Recently published 7-year results from the Canadian National Breast Screening Study (NBSS) generated much controversy and criticism. In women aged 40 to 49 years at entry, no reduction in breast cancer mortality was observed when screened women were compared with virtually unscreened women. In women aged 50 to 59 years, breast cancer mortality was similar when annual screening with mammography and physical examination was compared with annual screening with physical examination alone. Although NBSS results in 40- to 49-year-old women are similar to those from previously published screening studies, critics have attacked the study's design, randomization, execution, mammography, follow-up procedures, contamination of controls, and analysis. The absence of benefit observed in mammographically screened women who were 50 to 59 years old has been used to support criticism of mammography. Important facts have been ignored. The NBSS controls, aged 50 to 59 years, unlike in other studies, received thorough annual physical examinations. Cancer detection rates in both age groups were higher in the mammography than the comparison groups. Screen and interval cancer detection rates, sensitivity and specificity estimates, and prevalence to incidence ratios at first screen met or exceeded standards established by other screening studies. Claims that randomization was flawed, in particular, that more symptomatic women were assigned to mammography, are not supported by the distribution of descriptive variables collected before randomization was done. As for the “contamination” of 26% of controls aged 40 to 49 years, who reported receiving mammography, it is improbable that single or occasional diagnostic mammograms in one quarter of the control group could obliterate the benefit of four or five annual mammograms in almost 100% of the mammography group. Much remains unknown about the efficacy of breast screening.
It is the responsibility of the scientist to balance passion with dispassion.
John C. Polanyi
“The Responsibility of the Scientist”
Department of Physiology Lecture Series:
Frontiers in Physiology and Pharmacology,
University of Toronto, 22 M
Because no previous study evaluating the benefit of screening for breast cancer had specifically addressed efficacy in women aged 40 to 49 years, the results of the Canadian National Breast Screening Study (NBSS) were impatiently awaited. When they were finally published in November 1992 [1, 2], the results showed no reduction in breast cancer mortality after 7 years in screened women aged 40 to 49 years. Further, mammography did not achieve an incremental mortality benefit over and above clinical examination in women 50 to 59 years old, although it did achieve higher rates of cancer detection.
What one wants to believe is easy to believe. Results from the NBSS, however, were difficult for many to believe. Given the passionately held belief that early detection of breast cancer achieved by mammography screening benefits women aged 40 to 49 years, it was not surprising that the NBSS became the focus of much attention. Some of the criticisms that have been generated are addressed in this article in the hope that a better understanding of the issues involved will be reached.
Methods
Planned in the late 1970s, the NBSS began screening women in January 1980. Its design [3] addressed the unanswered questions prevailing at the time. The New York Health Insurance Plan breast screening trial had shown that combined annual screening with mammography and physical examination reduced breast cancer mortality in women 50 years of age and over when compared with no screening [4]. Although the separate contributions of mammography and physical examination of the breasts to mortality reduction were not known, it was believed that clinical examination of the breasts made a major contribution. Also not known in 1980 was whether combined annual screening reduced breast cancer mortality in women aged 40 to 49 years when compared with no screening intervention.
To determine the benefit of combined screening in women aged 40 to 49 years, the NBSS recruited 50 430 women, who were individually randomly assigned so that 25 214 received annual mammography and physical examination and 25 216 received a single physical examination, thereafter returning to usual community care with annual follow-up by mailed questionnaire. To determine the separate contribution of mammography to breast cancer mortality reduction, 39 405 women aged 50 to 59 years were recruited, of whom 19 711 were randomly assigned to receive annual mammography and physical examination of the breasts and 19 694 to receive an annual physical examination alone.
Entry criteria included meeting the age criteria, not being pregnant, not having been diagnosed with breast cancer, not having had a mammogram in the 12 months before entry, and signing informed consent forms. Women were instructed in breast self-examination at entry and those eligible for re-screening received repeated instruction and evaluation. All submitted annual questionnaires that tracked breast procedures generated outside the NBSS.
Screening Sites
Each of the 15 centers across Canada had an adequate population base from which to draw participants and had recognized expertise in breast cancer. Six centers were in provincial cancer institutions (all in the forefront of cancer research and treatment) and eight in university teaching hospitals. Recruitment extended over 6 years, from 1980 to 1985.
Randomization
Women who met the entry criteria and completed two questionnaires (yielding identifying and demographic data including risk factors for breast cancer) then signed informed consent forms in the examining room. The screen-examiner asked if the participant had breast symptoms (lump, pain, discharge) and recorded the responses. A physical examination of the breasts and instruction in breast self-examination followed, after which the examiner decided if the clinical findings required the participant's referral to the NBSS surgical review clinic (usually held within 1 week). This decision was documented on the examiner's form, and the woman was informed.
The examiner then left the participant and approached the center coordinator or her deputy, who carried out the randomization procedure. Randomization lists, contained in four separate books that each included one quinquennium (40 to 44 years, 45 to 49 years, and so on), were provided to all centers. On learning the participant's age, the coordinator chose the appropriate book and entered the date and name on the first available line, thus assigning to the participant her identification number and allocation. The coordinator entered the number and allocation on all chart forms, and the examiner told the participant if she was to have mammography. From the examiner's perspective, it was not important to obtain a mammography allocation if a breast lump had been found because the participant would already be referred for surgical review. From the coordinator's perspective, skipping a line to achieve a desired allocation was not feasible because she could not predict when the next appropriately aged woman would arrive to fill the skipped slot. The coordinators were well educated, well trained, and responsible, and recognized the importance of the randomization procedure. At each month's end, the original randomization sheets were submitted to the central coordinating office where each sheet was examined for suspicious entries, inappropriate dates, and lack of congruence with participant records.
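The allocation procedure described above can be sketched in a few lines of Python. This is an illustration only, not the study's own system: the four age bands are inferred from "40 to 44 years, 45 to 49 years, and so on," and the way the printed lists were generated (here, balanced allocations in shuffled order), the class and function names, and the audit rule are all assumptions made for the example.

```python
import random

# Assumed age bands: the text names 40-44 and 45-49 "and so on" for four books.
AGE_BANDS = [(40, 44), (45, 49), (50, 54), (55, 59)]


def make_book(n_lines, seed):
    """Hypothetical pre-printed list: balanced allocations in shuffled order."""
    allocations = ["mammography", "comparison"] * (n_lines // 2)
    random.Random(seed).shuffle(allocations)
    return [{"allocation": a, "name": None, "date": None} for a in allocations]


class RandomizationBooks:
    def __init__(self, n_lines=100, seed=0):
        self.books = {
            band: make_book(n_lines, seed + i) for i, band in enumerate(AGE_BANDS)
        }

    def assign(self, name, age, date):
        """Enter the participant on the first available line of her age book."""
        band = next((b for b in AGE_BANDS if b[0] <= age <= b[1]), None)
        if band is None:
            raise ValueError("age outside the entry criteria")
        for line_no, line in enumerate(self.books[band], start=1):
            if line["name"] is None:
                line["name"], line["date"] = name, date
                return band, line_no, line["allocation"]
        raise RuntimeError("randomization book is full")

    def audit(self, band):
        """Return line numbers that look suspicious: skipped lines or dates out of order."""
        suspicious, seen_gap, last_date = [], False, ""
        for line_no, line in enumerate(self.books[band], start=1):
            if line["name"] is None:
                seen_gap = True
                continue
            if seen_gap or line["date"] < last_date:
                suspicious.append(line_no)
            last_date = max(last_date, line["date"])
        return suspicious


books = RandomizationBooks(n_lines=10, seed=1)
first = books.assign("Participant A", 42, "1980-01-05")
second = books.assign("Participant B", 44, "1980-01-06")
other = books.assign("Participant C", 52, "1980-01-06")
```

Dates are written as ISO strings so that they sort chronologically. In this sketch, a name entered below an empty line, or a date earlier than one above it, is the kind of entry the monthly central review would have been looking for.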
Mammography
Mammography was done on site and interpreted by study radiologists. Details of the equipment used, the audit procedures, and sensitivity and specificity estimates have been previously published [5-7].
Clinical Examination of the Breasts
Breast examination and breast self-examination instruction and evaluation were done by locally trained nurse-examiners in all centers outside the province of Quebec and by physicians in Quebec. The duration of the clinical examination ranged from 5 to 15 minutes, depending on the size of the breasts and on the amount of verbal interaction required. Descriptions of the role of the nurse-examiner, sensitivity and specificity estimates for breast examination, and breast self-examination behavior have all been previously published [8-10].
Surgical Review
Study surgeons were appointed at each center to examine women with abnormalities detected by either or both screening interventions, and they decided if diagnostic follow-up was required. If so, they forwarded their recommendations to the woman's physician, who then determined if and where the procedure would be done. Most recommended procedures were done [1, 2]. In women aged 40 to 49 years, diagnostic mammography was done as a consequence of the screening examination in 0.8% and 1.5% of the mammography and comparison groups, respectively.
Pathology Review
Reference pathologists appointed for each center reviewed slides from all surgical procedures done on participants from their center, whether or not the procedure resulted from NBSS recommendations.
Follow-up Procedures
During the 3 or 4 years each participant was in the study (participants entering in 1984 and 1985 were eligible for only a 3-year and four-screen schedule), the centers' routine follow-up procedures ensured high compliance with screening, and questionnaire information was obtained for most of the noncompliant participants [1, 2]. These procedures also achieved almost complete ascertainment of cases of breast cancer and deaths, shown by subsequent record linkage. After participants' screening schedules ended, active follow-up continued only for participants with known breast cancer. All such patients have always been and continue to be followed annually through contact with their physicians, whether their breast cancers were detected as a result of screening, during the screening schedule but not as a result of screening, or through record linkage during or after screening. Thus, the vital status of all women identified as having breast cancer is ascertained within, at most, 1 year of death.
Passive follow-up to identify new cases of breast cancer through linkage with cancer registries across Canada ascertained breast cancer diagnoses occurring after completion of screening schedules and in the very small proportion of women (1% of women from 40 to 49 years) who dropped out of the study before completing their schedule. In a few cases, breast cancer was first identified through the National Mortality Database as a cause of, or contributing to, death. In such cases, surgical, pathology, and other clinical records were reviewed retrospectively. Figure 1 shows the follow-up procedures.
Results
Seven years after entry, breast screening had no effect on breast cancer deaths when women aged 40 to 49 years receiving annual screening with mammography and physical examination (38 deaths) were compared with women receiving only a single physical examination of the breasts at entry into the study followed by usual community care (28 deaths). The numeric difference is not statistically significant. Further, no effect on breast cancer mortality was observed in women aged 50 to 59 years at entry who received annual mammography and physical examination of the breasts (38 deaths) compared with women who received only annual physical examination of the breasts (39 deaths). In both age groups, mammography was associated with higher cancer detection rates [1, 2].
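As a rough check on these comparisons, the death counts above can be combined with the group sizes given in the Methods section. The sketch below computes a cumulative-incidence risk ratio with a Wald interval on the log scale. That method is an assumption chosen for simplicity and is not necessarily the analysis used in the published reports [1, 2], which may have been based on person-years; the function name is hypothetical.

```python
import math


def risk_ratio_ci(deaths_a, n_a, deaths_b, n_b, z=1.96):
    """Risk ratio (group a vs group b) with an approximate 95% Wald interval."""
    rr = (deaths_a / n_a) / (deaths_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / deaths_a - 1 / n_a + 1 / deaths_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Deaths are taken from the Results; group sizes from the Methods.
rr_40s = risk_ratio_ci(38, 25214, 28, 25216)  # mammography vs usual care
rr_50s = risk_ratio_ci(38, 19711, 39, 19694)  # mammography vs physical exam alone

for label, (rr, lower, upper) in (("40-49", rr_40s), ("50-59", rr_50s)):
    print(f"{label}: RR {rr:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumptions both intervals include 1, which agrees with the statement that the numeric differences are not statistically significant.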
Controversy
When the results were published, controversy arose [11-13], some of which was defused at a recent International Workshop on Breast Cancer Screening where it was observed that NBSS results were consistent with findings from other screening studies [14]. Nevertheless, NBSS results were unwelcome in a milieu where the lay public believes that “early detection” of breast cancer does, rather than may, lead to cure, where the media have focused on the risk for breast cancer in the young [15], and where many health professionals erroneously believe that prolonged survival after early detection proves it is beneficial [16]. Selected criticisms directed at the NBSS (Table 1) are discussed below; some have been published, and others were comments made at meetings.
Design and Execution
Although much criticized [17], the design of the NBSS was dependent on knowledge available in the late 1970s and was influenced by the ethical constraints associated with randomized controlled trials, the then prevalent fear of radiation, the requirement for individually signed informed consent forms, the inconvenience to women who were asked to make a long-term commitment to a scientific study that offered interventions freely available to them outside the study, and medical professionals' self-perceived competence and their concerns about their autonomy. The NBSS was a pragmatic trial evaluating what would happen when women across Canada were offered screening in their own communities. The protocol (including entry and exclusion criteria, power calculations, and timing of randomization) developed in consultation with international experts [3] was approved by the Medical Research Council of Canada, the National Cancer Institute of Canada, the Canadian Cancer Society, Health and Welfare Canada, and institutions across Canada associated with the conduct or funding of the study. Nevertheless, the NBSS did not achieve its planned power [1]. Those who advocate screening of women aged 40 to 49 years are right when they say no study thus far has had adequate power for this age group. But it is also true that if the sample size must be hundreds of thousands to achieve adequate power, the effect sought must be small.
A policy advisory group, with international representation including oncologists, surgeons, radiologists, screening experts, and statisticians, monitored the execution of the NBSS. The execution of the study has been documented [5-10, 18-20] to an extent unmatched by other screening studies.
Randomization
Although demographic variables (place of birth, marital status, parity, educational level) were equally distributed across the two allocations [1, 2], critics [17] have challenged the effectiveness of randomization procedures in the NBSS, claiming that more symptomatic women aged 40 to 49 years were randomly assigned to the group receiving mammography than to the comparison group. Table 2 shows that this claim is unfounded. Crucial prerandomization variables are distributed equally across allocations, namely, the frequency of self-reported symptoms, a positive family history for breast cancer, and the referral rates to surgical review based on abnormalities found during physical examination.
Speculation has arisen that abnormal physical findings at the initial visit would induce the examiners or the center coordinators to preferentially assign such women to the mammography group. Table 2 also rules out this concern. In further response to claims that randomization was flawed, the original randomization sheets were re-examined to look for changes in script or pens used, crossing out of names, erasures, or problems with date sequences, with special attention given to the records of those who had died of breast cancer. No suspicious entries were found.
Excess Advanced Breast Cancer in the Mammography Group
The excess number of cases of advanced breast cancer observed in the mammography arm at screen 1 also raised doubts about the probity of randomization procedures, but the phenomenon is not unique to the NBSS. Excess cases of advanced cancer in screened groups aged 40 to 49 years were reported previously by three Swedish trials [21-23]. Among NBSS screen-1 cancers, there were 19 invasive breast cancers with four or more positive nodes in the mammography group and only 5 such cases in the comparison group [1]. Identical proportions of the two groups were referred to review clinics on the basis of abnormalities found on physical examination of the breasts. At the review clinics, study surgeons recommended diagnostic interventions on the basis of mammographic and physical findings in the mammography group and, for the most part, on the basis of physical findings only in the comparison group. Study surgeons consistently recommended more diagnostic interventions for the former than the latter. The availability of mammograms must have enhanced the likelihood of diagnosing not only early cancers but also node-positive cancers, not all of which were large.
Table 3 shows that advanced screen-1 cancers in women aged 40 to 49 years allocated to mammography were distributed among 10 centers with 77.2% of the NBSS population. No clustering in one or two centers occurred to support the speculation that randomization was subverted. By year 5, cumulative numbers of cases with four or more positive nodes were 39 and 22 in the mammography and comparison groups, respectively, a difference which, when expressed as a proportion of cases detected, 14.6% compared with 10.9%, becomes less dramatic [1]. Nevertheless, the imbalance triggered a review of all invasive breast cancers diagnosed in the first 5 years. When the mammography and control groups were compared, the mean numbers of nodes dissected were 11 and 10, the proportions of cases in whom no nodes were dissected were 5% and 10%, and the proportions of cases in whom fewer than four nodes were dissected were 10% and 14%, respectively. These comparisons indicate a potential for under-ascertainment of nodal involvement in the group not receiving mammography.
Future linkage with the National Mortality Database may reveal that in one or two provinces, some cases of advanced breast cancer had not yet been entered in the cancer registry database at the time NBSS linkages occurred. If true, the discrepancy in cases of advanced breast cancer, retrospectively ascertained, may be further reduced when 10-year results are available.
Excess Number of Deaths from Breast Cancer in the Mammography Group
The three Swedish studies [21-23] also documented an excess number of deaths from breast cancer in mammographically screened women aged 40 to 49 years. More recently, a meta-analysis of five Swedish studies has revised previously reported breast cancer mortality results [24]. Even with the meta-analysis, at 12-year follow-up, there was only a 13% reduction in breast cancer mortality (not statistically significant) in women 40 to 49 years old. Some NBSS findings that have provoked criticism, namely, excess cases of advanced cancer and an excess number of deaths from breast cancer in the mammography group, have also been observed in other studies.
Comparative Data on Sensitivity and Detection Rates
The most important defense of the much criticized NBSS mammography must rest with comparative sensitivity estimates and cancer detection rates. The 1993 Report of the International Workshop [14] compared the sensitivity of screening test performance for women aged 40 to 49 years in the NBSS, Stockholm, and Two-County trials. Sensitivity estimates (based on the detection method) were 81%, 53%, and 62%, respectively, whereas the incidence method yielded estimates of 58%, 39%, and 45% to 62%, respectively. As with sensitivity estimates, NBSS mammographic detection rates at the first screen and interval cancer detection rates compare favorably with the Swedish Two-County program (Table 4 and Table 5). Further, in the NBSS, 60% of invasive cancers detected at screen 1 and 69% of those detected at screens 2 through 5 were node negative in the mammography group aged 40 to 49 years [1].
Delayed Cancer Detection
The inaccurate assertion that detection was delayed by 2 to 5 years in almost 20% of screen-detected cancers first appeared in an article in Radiologic Clinics of North America, in which an eminent U.S. mammographer sought to prove that NBSS mortality results (still unpublished at that time) should be disregarded because of delayed cancer detection caused by poor mammography [25]. Citing an NBSS publication [6], the author wrote: “More disturbing is that of the 575 screen-detected cancers, 100 were found by the reference radiologist on the mammograms 2 to 5 years prior to being detected. Twenty-eight cancers could have been found 2 years earlier, 33 cancers 3 years earlier, 27 cancers 4 years earlier and 12 cancers were shown on mammograms 5 years before clinical detection”. In contrast to this misinformation, the published data reveal that 28 screen-2 cancers might have been detected 1 year earlier, 33 screen-3 cancers might have been found 1 year earlier, 27 screen-4 cancers might have been found 1 year earlier, and 12 screen-5 cancers might have been found 1 year earlier. Yet, the mistaken belief that detection was delayed up to 5 years continues to be reinforced [26], although it has been documented that delays of 2 years occurred in less than 5% of interval cancers [6] and although no one could be in the study for more than 4 years.
That false negatives such as these always occur with mammography is generally acknowledged. In an editorial commenting on a review of mammograms from a diagnostic radiology service where most women were symptomatic, a U.S. mammographer wrote: “An important result of this review confirms another well known phenomenon, namely the failure of even expert observers to perceive all abnormalities. The radiologists reviewing the mammograms detected abnormalities apparently missed at the primary interpretation. One or both of the reviewers suggested a biopsy for 34 (54%) of 63 women (breast cancer cases) whose mammograms were originally read as negative” [27]. Analogous data from the NBSS would be that for 94 women with interval cancers, whose mammograms were originally read as negative, 36 (38%) were called positive by the NBSS reference radiologist [6]. Clearly, NBSS mammography stands up well in comparison. Parallel data on delayed detection are available from no other trial.
Interpretation of an External Review
A 1990 commentary on the “Canadian Screening Program” has been repeatedly used as evidence that NBSS mammography was unsatisfactory [28]. Criticizing an external technical review of NBSS mammography, the commentator, himself a coauthor [18], claimed that “almost 50% of mammograms during the first two years of screening were judged unsatisfactory,” and “that because of poor mammography, the results of this trial will always be suspect”. A letter documenting factual inaccuracies in the commentary is rarely cited [29]; the important issues are described below.
The technical review was based on a random sample of 835 mammographic examinations selected from more than 100 000 mammograms done between 1980 and 1987; the sample was not weighted by center recruitment. The purpose was to determine whether the technical quality of mammography improved over time. Although the external review was performed blindly in that the reviewers did not know the center of origin of any film, the woman's age, or the calendar year of the film, two of the three invited experts were aware that there were more deaths from breast cancer in screened than nonscreened women aged 40 to 49 years, knowledge which may have influenced their ratings. The evening before the review, the two reviewers said their participation was conditional on being allowed to rate all mammograms by 1988 standards: The 1988 standards required a mediolateral oblique view. This was unfortunate because between 1980 and 1984 the NBSS protocol [3] required two-view mammography, including straight mediolateral and craniocaudal positioning, a decision determined in consultation with U.S. and Canadian expert radiologists before the initiation of the NBSS in 1980. Ironically, the director of the NBSS urged at that time that the mediolateral oblique view, already being used in the Swedish trials, be used. The radiologic consultants insisted on the straight mediolateral view because it conformed to contemporary North American practice. In 1985, when the screen-1 examinations were completed, the Policy Advisory Group formally approved a change in positioning to mediolateral oblique, although at least one center had implemented it in 1983.
The planned scoring scheme for the technical assessment was 0 to 3 for poor, fair, satisfactory, and good, respectively. Four variables were to be scored: craniocaudal position; mediolateral position; contrast and density; and image quality. Table 6 shows that improvement did occur over time. Scores for the mediolateral view were much higher for mammograms done in 1986 and 1987 than for those done between 1980 and 1984. Scores in 1985 were intermediate because there were delays in implementing the oblique view in some centers. Craniocaudal views were rated as satisfactory or better in 92% to 98% of mammograms for all calendar years 1980 through 1987 (Table 6). A fifth variable imposed by the two U.S. radiologists, namely a global rating, correlated so closely with the mediolateral scores that it was a proxy for the oblique view [18].
What proportion of mammograms actually was satisfactory? To claim that in the first 2 years of the study, almost 50% of the mammograms were unacceptable is misleading [28]. Table 7 shows that only 4.9% of all NBSS mammograms performed between 1980 and 1988 were done in the study's first 2 years (1980 and 1981). The so-called unacceptable mammograms done in those 2 years make up only 2.1% of all NBSS mammography.
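The effect of the denominator can be made explicit with the two figures quoted in this paragraph. The short sketch below is illustrative only: it uses the rounded percentages from the text, not the underlying counts in Table 7, and the function and variable names are hypothetical.

```python
def share_of_all_mammograms(rate_within_period, period_share_of_total):
    """Convert a within-period rate into a share of all study mammograms."""
    return rate_within_period * period_share_of_total


first_two_years_share = 0.049     # 4.9% of all NBSS mammograms were done in 1980 and 1981
unacceptable_share_total = 0.021  # "unacceptable" films from those years, as a share of all

# Within-period rate implied by the two rounded figures above.
implied_rate_first_two_years = unacceptable_share_total / first_two_years_share

# Share of all mammograms if the critics' "almost 50%" is taken at face value.
critics_claim_as_share_of_total = share_of_all_mammograms(0.50, first_two_years_share)

print(f"Implied rate within 1980-1981: {implied_rate_first_two_years:.1%}")
print(f"'Almost 50%' as a share of all films: {critics_claim_as_share_of_total:.2%}")
```

Even if the "almost 50%" figure is taken at face value, it describes a period that contributed less than one twentieth of the study's mammograms.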
A now familiar criticism is that even by year 4 more than 50% of mammograms were unacceptable. Table 7 clearly refutes this point. The proportion rated poor or fair in any 1 year varies from 10% to 44% when the denominator is the number of mammograms done that year. In marked contrast, the proportion varies from 0.5% to 6.1% using the more appropriate denominator, the total number of mammograms done. Table 7 also shows how these proportions are further reduced (to 0.4% to 4.4%) if scores for the mediolateral view are removed for reasons already described. Even then, the proportions of unsatisfactory mammograms are inflated because the reviewers' scoring for image quality was influenced by their disapproval of the straight mediolateral view. The persuasiveness of the reviewers-turned-critics is weakened when one considers their poor intraobserver and interobserver agreement combined with the small sample size on which the criticism is based [18]. Nor can technical quality be linked to the previously discussed excess of advanced cancers at screen 1. When centers are categorized into tertiles by technical quality, no association with the distribution of advanced cancers is found [1]. Comparative data on technical quality are not available from any other screening trial.
Competence of Radiologists and Technologists
All study radiologists were specialists having met the requirements of the Royal College in Canada, and some were U.S. board certified. All had pre-NBSS experience in diagnosing breast cancer. In 1980, none could have participated in any organized screening program or attended continuing medical education programs in screening. To enhance the development of mammographic screening skills, all centers were routinely audited, with every study radiologist receiving frequent memoranda conveying case-specific comments on the technical quality and interpretation of the films [5, 6]. Annual meetings were held at which study radiologists were shown mammograms from all centers, and problems were discussed. Invited experts attended some meetings and provided examples of how better mammography could be achieved. Study radiologists received regular updates on NBSS performance indices and cancer detection so they could assess their center's performance compared with others. The registered technologists received feedback from the reference radiologist as well as from the reference physicist who conducted routine site visits to all centers and who regularly monitored the technical aspects of mammography [30].
It has been said of the mammography units that anything that was available was used. In fact, over 40% of the study population were screened in centers that had new mammography units when they opened, and a further 18% were screened with new units purchased after centers had opened. Other units had all been subject to inspection and evaluation by the reference physicist whose recommendations were implemented to the extent possible.
Lack of Mortality Reduction in Women Aged 50 to 59 Years at Entry
The virtually identical numbers of deaths from breast cancer in women who received annual mammography and physical examination compared with women who received annual physical examination alone prove neither that screening with mammography in this age group is ineffective nor that NBSS mammography was unsatisfactory. Other trials have compared screening with no screening and have shown that screening reduces breast cancer mortality. The NBSS compared screening using two modalities with screening using one modality. It is not surprising that if a mortality difference is to be observed, it will be smaller and slower to appear in the NBSS than in other screening trials. With the much higher cancer detection rates achieved in the group receiving mammography [2], NBSS results at 10 or 15 years may show fewer deaths from breast cancer in these women.
A passionate commitment to mammography has led critics to distort published data on NBSS false negatives and on the technical review, to disregard NBSS cancer detection rates, and to ignore the congruity between the NBSS results and those from other screening trials.
With respect to the latter point, Tabar and colleagues [23] have written that “earlier detection should result in a first screen prevalence of at least three times the expected incidence rate in the absence of screening”. In the same article, they report a first-screen prevalence to incidence ratio of 1.99 in the Swedish Two-County program in women aged 40 to 49 years. In the NBSS [1], the first-screen prevalence to incidence ratio for women aged 40 to 49 years in the mammography group for detection achieved by mammography and clinical examination was 3.25, and for mammography detection, 2.00. Numbers should be more persuasive than opinion.
Contamination of the Control Group
“Contamination” is said to have occurred because 26% of the control group aged 40 to 49 years reported receiving a mammogram. It was always expected that the control group would receive normal community care, meaning that symptomatic women would receive diagnostic mammography when clinically required. Indeed, it was the purpose of the study to determine whether intensive screening could improve outcomes compared with normal care. In fact, during their NBSS schedules, 14.5% (3651) of the control group had one examination, 7.8% (1968) had two examinations, and 4% (1036) had three or more. Single mammographic examinations are probably diagnostic. It is improbable that a single, or even a few, mammographic examinations over a 3- to 4-year period in 26% of the control group could obliterate any benefit achieved by annual mammography in almost 100% of the screened women.
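The percentages quoted above can be reproduced from the counts in the text, using as denominator the 25 216 women aged 40 to 49 years assigned to the comparison group (from the Methods section). The sketch below is arithmetic only; the variable names are hypothetical.

```python
control_group_size = 25216  # comparison group, women aged 40 to 49 years (Methods)

# Women in the control group who reported mammograms during their NBSS schedule.
mammograms_in_controls = {"one": 3651, "two": 1968, "three or more": 1036}

# Each category as a percentage of the whole control group.
shares = {
    label: 100 * count / control_group_size
    for label, count in mammograms_in_controls.items()
}
any_mammogram_pct = sum(shares.values())

# Among control women with any mammogram, the percentage who had exactly one.
exposed_total = sum(mammograms_in_controls.values())
single_exam_pct_of_exposed = 100 * mammograms_in_controls["one"] / exposed_total

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}% of the control group")
print(f"any mammogram: {any_mammogram_pct:.1f}%")
print(f"single examination among those exposed: {single_exam_pct_of_exposed:.1f}%")
```

On these counts, more than half of the control women who reported any mammography had only a single examination.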
Analysis
A spokesperson for the American College of Radiology suggested dismissing from the study women who, having met the entry criteria, had abnormalities at screen 1. The NBSS reference physicist suggested at a press conference that all screen-1 cancers should be omitted from the analysis. Were such suggestions followed, the selective removal of women at higher risk [31] and of cases of mammographically detected breast cancer would have rendered future comparisons between screened and control groups permanently biased. The documented success of NBSS randomization makes it reasonable to assume internal validity; that is, it is valid to compare the two groups.
Concern has also been expressed that NBSS results were published prematurely. However, most screening studies have published results 7 years after the entry of the last person into the study. The only exception is the New York Health Insurance Plan study, which published its first results at 5 years [4]. Published claims that NBSS follow-up was only 2 years reveal how ill-informed some critics are [32], whereas others refer without justification to the “supposed seven years of follow-up” [17].
Follow-up Procedures and Ascertainment
Follow-up procedures and ascertainment in the NBSS are described in the Methods section and summarized in Figure 1. Ascertainment of breast cancer cases and deaths has been said to be incomplete at publication. For 80% of the study population at the time of the analysis, 19 to 44 months had elapsed between center closure and record linkage with the relevant provincial cancer registries. Most screening schedules for any one center would have been completed before center closure. Deaths arising from cancer cases diagnosed more than 19 to 44 months after screening is completed are clearly less relevant to study mortality results than deaths from cancers diagnosed during screening or shortly after schedule completion. Only 20% of participants came from centers with a shorter interval between closure and cancer registry linkage (8 or 9 months), of whom half (10%) came from centers selectively recruiting older women (50 years and over), so the effect of any post-study under-ascertainment of breast cancer in women 40 to 49 years at entry is minimal.
Across Canada, linkages with cancer registries occurred in 1988 (Ontario and Quebec), 1989 (Alberta and British Columbia), and 1990 (Nova Scotia and Manitoba). All the cases of breast cancer so identified have been followed annually despite verbal claims to the contrary. That leaves a concern about new cases of cancer diagnosed after 1988, 1989, or 1990 in the relevant provinces; how many such patients could have died? Because 3-year survival rates are high in both the screened and the comparison groups (over 90%), it is unlikely that these later cases of breast cancer would alter the mortality results reported in 1992. They will be identified at future linkages.
Other Factors
Prestigious journals such as Science have fueled controversy about screening by promulgating the unfounded but widely held belief that a long survival time after diagnosis is proof of the efficacy of early detection. A recent article [33] citing a report published in The Lancet [34] urged readers to disregard NBSS results in women aged 40 to 49 years. The article argued that because a 95% 5-year survival was observed in a Boston case series of women aged 40 to 49 years with in situ and invasive breast cancer diagnosed by mammography alone, the benefits of screening were “proven”. In fact, NBSS results show a 95% 7-year survival for similar women [1]. Lead-time bias, which is unavoidable in case series such as that reported in The Lancet, was not mentioned, nor was the fact that survival time after diagnosis is not a synonym for a reduction in cause-specific mortality.
The uncritical acceptance of erroneous information has been widespread. One NBSS radiologist said he knew that some centers did not have dedicated mammography units and that others used Xeromammography. A U.S. radiologist asked study investigators, “When did the NBSS start using compression?” It always used breast compression, always used dedicated mammography units, and never used Xeromammography [7]. Myths can preempt evidence [35, 36].
Conclusions
It is imperative that research be directed at determining why benefit from screening younger women has proved so difficult to show in so many studies. It is revealing that two radiologists have dismissed as the “scientific fringe” [17] hypotheses that assume there may be a biological basis for breast screening's apparent lack of benefit in these women. In fact, there are many biological differences between premenopausal and postmenopausal women and in the natural history of breast cancer in these two groups. The breast screening controversy is an excellent example of socioscientific controversy. Such controversy is to be expected when established medical practice is challenged by results from randomized controlled trials. The failure of intracranial arterial bypass to reduce the risk for ischemic stroke when tested in a randomized multicenter trial [37] distressed surgeons who believed the procedure was useful [38]. In the case of breast screening, not only are radiologists distressed but also women who have been programmed to overestimate their risk for breast cancer [39] and who therefore need to be reassured.
To resolve socioscientific controversy requires objective curiosity that equitably scrutinizes available evidence and goes on to seek answers to unanswered questions. This requires trials, not case series. Passion must proceed with dispassion. Failure to do so will indisputably lead to more harm than good. Wishful thinking aside, compelling evidence has yet to be found that breast screening in women under 50 years of age is beneficial in the first 12 years after screening is initiated.
- Copyright ©2004 by the American College of Physicians