Ethnic and Sex Bias in Primary Care Screening Tests for Alcohol Use Disorders
- Jeffrey R. Steinbauer, MD;
- Scott B. Cantor, PhD;
- Charles E. Holzer III, PhD; and
- Robert J. Volk, PhD
- From The University of Texas Medical Branch at Galveston, Galveston, Texas; and the University of Texas M.D. Anderson Cancer Center, Houston, Texas. For current author addresses, see end of text.
Acknowledgments: The authors thank Kristi O'Dell, PhD, Carol Carlson, and Kristy Smith for assistance in managing this project and Lynn Alperin for editorial expertise. They also thank Cherry Lowman, PhD, and Bridget Grant, PhD, of the National Institute on Alcohol Abuse and Alcoholism, for guidance and support. The SAAST is copyrighted by the Mayo Foundation and is used with permission.
Grant Support: In part by grants from the National Institute on Alcohol Abuse and Alcoholism (AA09496) and the Bureau of Health Professions, Health Resources and Services Administration (D32-PE16033).
Requests for Reprints: Robert J. Volk, PhD, Department of Family Medicine and Community Medicine, Baylor College of Medicine, 5510 Greenbriar, Houston, TX 77005.
Current Author Addresses: Dr. Steinbauer: Blackstock Family Health Center, 4614 IH-35, Austin, TX 78751.
Abstract
Background: The use of self-report screening tests for alcohol use disorders in the primary care setting has been advocated.
Objective: To test for ethnic and sex bias in three self-report screening tests for alcohol use disorders in a primary care population.
Design: Cross-sectional study with patients randomly selected from appointment lists.
Setting: University-based family practice clinic.
Patients: Probability sample of 1333 adult family practice patients stratified by sex and ethnicity.
Measurements: Patients completed 1) a diagnostic interview to determine the presence of a current alcohol use disorder and 2) three screening tests: the CAGE questionnaire, the Self-Administered Alcoholism Screening Test (SAAST), and the Alcohol Use Disorders Identification Test (AUDIT).
Results: The areas under the receiver-operating characteristic (ROC) curves for the CAGE questionnaire and the SAAST ranged from 0.61 to 0.88 and were particularly poor for African-American men and Mexican-American women. For the AUDIT, the area under the ROC curves was greater than 0.90 for each patient subgroup. The sensitivity of the CAGE questionnaire and the SAAST at standard cut-points was lowest for Mexican-American women (0.21 and 0.13, respectively). Positive likelihood ratios for the AUDIT were similar to or higher than those for the other screening tests, whereas negative likelihood ratios were lowest for the AUDIT (<0.33), indicating the superiority of this test in ruling out a disorder.
Conclusions: A marked inconsistency in the accuracy of common self-report screening tests for alcohol use disorders was found when these tests were used in a single clinical site with male and female family practice patients of different ethnic backgrounds. The AUDIT does not seem to be affected by ethnic and sex bias.
Alcohol use is the third leading cause of preventable death in the United States [1], and alcohol-related morbidity is substantial [2]. For many persons with alcohol problems, a primary care provider is the first contact with the health care system [3]. Unfortunately, the problem often goes unrecognized until it has had significant consequences for physical health [4].
Many professional organizations recommend questioning patients about alcohol use [5-7]. The routine use of biochemical markers as the primary method for screening for alcohol problems in asymptomatic patients is discouraged by the U.S. Preventive Services Task Force because the accuracy of such tests is poor compared with that of self-report measures [6]. Many self-report screening tests have been developed to help identify patients with alcohol use disorders. Nevertheless, concern is growing over the lack of validation of these tests in patients who are female, elderly, or nonwhite [8]. Concerns about potential ethnic and sex bias in screening accuracy are particularly important because patterns of alcohol use [9, 10], the prevalence of alcohol use disorders [11, 12], and the consequences of alcohol consumption [2, 13] vary in men and women from different ethnic backgrounds in the United States.
We tested for bias in the accuracy of three common self-report screening tests across sex and ethnic subgroups of primary care patients. The CAGE questionnaire was selected for evaluation because it is one of the most widely used screening tests for alcoholism. It was developed originally to identify the “hidden alcoholic” in hospital settings [14] and has also been evaluated in primary care settings [15, 16]. We also selected the Self-Administered Alcoholism Screening Test (SAAST), a self-administered version of the Michigan Alcoholism Screening Test, for evaluation (Appendix Figure 1). The SAAST was developed to screen for alcoholism in general medical patients and is available in a 9-item version with response options in a “yes/no” format. The final instrument selected for evaluation was the Alcohol Use Disorders Identification Test (AUDIT), developed by the World Health Organization [17] (Appendix Figure 2). The 10-item AUDIT was developed to detect persons with early alcohol use problems who do not necessarily meet the diagnostic criteria for alcohol dependence. In our study, the criterion variable was a current alcohol use disorder, including alcohol abuse and alcohol dependence, as defined by the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) [18]. Recommended methodologic standards for evaluating diagnostic tests guided this analysis [19].
Methods
Patients and Procedures
Participants were adult primary care patients presenting to the Family Practice Center at the University of Texas Medical Branch, Galveston, Texas. This family medicine clinic, which is a residency-training site, serves an ethnically diverse community and has an annual patient-visit volume in excess of 30 000; the patients are a mix of privately insured, managed care, Medicaid, Medicare, and uninsured patients. Faculty and resident practices are located at the Center, which has approximately 12 faculty physicians and 20 resident providers.
The sampling strategy was designed to ensure adequate representation of minority and female patients. Adult family medicine patients were randomly selected from the Family Practice Center appointment lists. For each clinic session, a patient was selected at random by using a table of random numbers from among those patients who had appointment times within the first 60 minutes of the session. Thereafter, patients were selected according to appointment time at fixed intervals (for example, 45 minutes after the previously selected patient) to allow for a manageable flow of patients through the interview process. Patients were contacted about participating in the study by telephone on the day before their scheduled appointments. Patients who could not be reached by telephone (30%) were approached directly in the clinic's waiting area on the day of their appointment. If a patient refused to participate in the study, the next patient on the appointment schedule was approached. Sampling continued until at least 100 men and 250 women in each ethnic group had participated. The sampling strategy is described in more detail elsewhere [20].
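The selection rule described above (a random start within the first 60 minutes, then fixed intervals by appointment time) can be sketched as follows. This is a minimal illustration, not the study's procedure: the function name `select_patients`, the representation of appointments as minutes from the start of a session, and the rule "next appointment at or after the previous selection plus the interval" are all assumptions, and the study used a table of random numbers rather than a software generator.

```python
import random

def select_patients(appointment_minutes, interval=45, first_window=60, rng=None):
    """Pick a random first appointment in the opening window, then step forward.

    appointment_minutes: appointment times as minutes from the session start
    (a hypothetical representation chosen for this sketch).
    """
    rng = rng or random.Random()
    times = sorted(appointment_minutes)
    # Candidates for the random start: appointments within the first window.
    window = [t for t in times if t < first_window]
    if not window:
        return []
    selected = [rng.choice(window)]
    # Afterwards, take the next appointment at least `interval` minutes
    # after the previously selected one.
    for t in times:
        if t >= selected[-1] + interval:
            selected.append(t)
    return selected

session = [0, 15, 30, 45, 60, 75, 90, 120, 150, 180]  # made-up schedule
picked = select_patients(session, rng=random.Random(0))
print(picked)
```

Refusals and replacement by the next patient on the schedule are not modeled here.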
Data were collected between October 1993 and December 1994. While waiting to see their physicians, patients completed self-report questionnaires that included questions about sociodemographic indicators and the SAAST. After their office visits, patients participated in an interview that was administered by project interviewers and included the CAGE questionnaire, the AUDIT, and a diagnostic schedule used to determine the presence of an alcohol use disorder. Interviewers were not given the results of the diagnostic interview, which was scored by computer algorithm after the questionnaire and interview had been completed. All study materials were translated into Spanish, and Spanish-speaking interviewers were used with Mexican-American patients (30 patients selected Spanish administration). Patients were reimbursed $10 for their time. Written informed consent was obtained from each patient, and the project was approved by our institutional review board.
Instruments
CAGE
The acronym CAGE represents four brief questions: Have you ever felt you should Cut down on your drinking? Have people Annoyed you by criticizing your drinking? Have you ever felt bad or Guilty about your drinking? Have you ever had a drink first thing in the morning (Eye-opener) to steady your nerves or to get rid of a hangover?
The CAGE was developed as a device to screen for alcoholism in hospital settings, where high rates of alcohol abuse are often seen [14]. It is also widely used in clinical settings and community-based studies and is considered an indirect measure of alcoholism because it addresses the consequences of drinking (with the exception of the eye-opener question) rather than alcohol consumption per se [21]. The CAGE can be used during the clinical interview (self-administered) or as part of a broader assessment of alcohol use (as was done in this study). A “yes” answer to two or more questions is generally considered a positive result [21], although an approach that uses likelihood ratios has also been proposed [15]. The time frame for the CAGE is lifetime.
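The conventional scoring rule described above (a count of "yes" answers, positive at two or more) can be sketched as follows. The item keys and function names are hypothetical labels chosen for this illustration; they are not part of the instrument.

```python
# Hypothetical keys for the four CAGE items.
CAGE_ITEMS = ("cut_down", "annoyed", "guilty", "eye_opener")

def cage_score(answers):
    """Count "yes" answers; items missing from `answers` are treated as "no"."""
    return sum(1 for item in CAGE_ITEMS if answers.get(item, False))

def cage_positive(answers, cut_point=2):
    """Apply the standard cut-point of two or more "yes" answers."""
    return cage_score(answers) >= cut_point

example = {"cut_down": True, "annoyed": False, "guilty": True, "eye_opener": False}
print(cage_score(example), cage_positive(example))
```

Treating unanswered items as "no" is a simplification made for this sketch; the source does not say how missing answers were handled.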
Self-Administered Alcoholism Screening Test
The SAAST [22-24] is a modified, self-administered version of the Michigan Alcoholism Screening Test. In our study, we used the 9-item version of the SAAST (completed by patients before the diagnostic interview) because its reduced length is more amenable to primary care settings [25]. The Michigan Alcoholism Screening Test is a structured, 25-item questionnaire that has been used to detect alcoholism in many groups, including persons suspected of driving while under the influence of alcohol [26]. The 9-item version of the SAAST was developed for use in medical settings and has shown consistency in U.S. and Mexican samples [27]. Three items are similar to the “annoyed,” “eye-opener,” and “cut down” questions from the CAGE; the rest address the consequences of drinking and indicators of dependence. The instrument is scored by summing responses to the questions (the annoyed and cut-down questions each receive a weight of 2, and all others receive a weight of 1), and a score of 3 or more is considered a positive result [25]. The time frame for the SAAST is lifetime.
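The weighted scoring described above can be sketched as follows, assuming each item is recorded as a yes/no answer. Only the "annoyed," "cut down," and "eye-opener" items are named in the text; the other keys (`item_4`, and so on) are placeholders, not the actual SAAST wording.

```python
# Items that receive a weight of 2; every other item receives a weight of 1.
DOUBLE_WEIGHT = {"annoyed", "cut_down"}

def saast_score(answers):
    """Sum the weights of all items answered "yes"."""
    return sum(2 if item in DOUBLE_WEIGHT else 1
               for item, yes in answers.items() if yes)

def saast_positive(answers, cut_point=3):
    """A score of 3 or more is considered a positive result."""
    return saast_score(answers) >= cut_point

# Placeholder keys stand in for the six items not named in the text.
example = {"annoyed": True, "cut_down": False, "eye_opener": True,
           "item_4": False, "item_5": False, "item_6": False,
           "item_7": False, "item_8": False, "item_9": False}
print(saast_score(example), saast_positive(example))
```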
Alcohol Use Disorders Identification Test
The AUDIT is a 10-item, self-report screening test that identifies patients at risk for alcohol use disorders by using procedures appropriate for the variety of health care facilities in developed and developing countries [17, 28, 29]. The AUDIT was developed by the World Health Organization (WHO) for the express purpose of avoiding ethnic and cultural bias. An extensive, multinational instrument development study of primary health care patients was coordinated by WHO (the WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption) to eliminate such bias [17]. The AUDIT has three important advantages over other screening tests: It 1) identifies “at-risk” alcohol users who do not meet criteria for alcohol dependence, 2) includes both consumption-based indicators of alcohol problems and indicators of harmful use and dependence, and 3) uses both current (defined as “within the past month”) and lifetime time frames. Response options range from 0 to 4, and a positive result is a score of 8 or more [28] (alternative cut-points and approaches using likelihood ratios have been suggested [20, 30, 31]). The instrument can be self-administered or given orally (as was done in our study).
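A minimal sketch of AUDIT scoring, assuming the ten item responses are supplied as integers from 0 to 4. The function names are hypothetical. The default cut-point of 8 is the published standard cited above; the analysis reported in this paper used a cut-point of 5.

```python
def audit_score(responses):
    """Sum ten item responses, each scored from 0 to 4."""
    if len(responses) != 10:
        raise ValueError("the AUDIT has exactly 10 items")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    return sum(responses)

def audit_positive(responses, cut_point=8):
    """A total at or above the cut-point is a positive result."""
    return audit_score(responses) >= cut_point

example = [2, 1, 1, 0, 0, 0, 1, 0, 0, 0]  # made-up responses
print(audit_score(example))
print(audit_positive(example))               # standard cut-point of 8
print(audit_positive(example, cut_point=5))  # cut-point used in this study
```

The same response pattern can screen negative at one cut-point and positive at another, which is why the choice of cut-point matters when operating characteristics are compared.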
Alcohol Use Disorders Diagnostic Schedule
The patient interview included the Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS), a structured diagnostic schedule developed for use in the National Longitudinal Alcohol Epidemiologic Survey, which was started in 1992 by the National Institute on Alcohol Abuse and Alcoholism [32]. The AUDADIS has shown reliability in clinical and general population studies, applicability for cross-cultural research, and concordance with other diagnostic instruments [33-36]. It was designed to be administered by trained lay interviewers, as was done in our study. We used the AUDADIS Alcohol Experiences module to determine the presence of alcohol abuse or dependence according to the DSM-IV criteria [18]. Alcohol dependence, as defined by DSM-IV, is a maladaptive pattern of alcohol use leading to clinically significant impairment or distress as manifested by three or more of the following criteria: increased tolerance, withdrawal, impaired control, neglect of activities, increased time spent drinking, and drinking despite problems. Alcohol abuse is also a maladaptive pattern of alcohol use with which patients do not meet the criteria for alcohol dependence but have problems in one or more of the following areas: failure to fulfill major role obligations, drinking in physically hazardous situations, alcohol-related legal problems, and continued use of alcohol despite social or interpersonal problems.
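The decision rule summarized above can be sketched as a simple classifier. This is an illustration only: it is not the AUDADIS scoring algorithm, the criterion labels are hypothetical shorthand for the criteria listed in the text, and time-frame requirements are ignored.

```python
# Shorthand labels for the criteria as listed in the text above.
DEPENDENCE_CRITERIA = {
    "tolerance", "withdrawal", "impaired_control",
    "neglect_of_activities", "time_spent_drinking", "drinking_despite_problems",
}
ABUSE_AREAS = {
    "role_obligations", "hazardous_use",
    "legal_problems", "social_interpersonal_problems",
}

def classify_alcohol_use(endorsed):
    """Return "dependence", "abuse", or "no disorder" for a set of endorsed labels."""
    endorsed = set(endorsed)
    unknown = endorsed - DEPENDENCE_CRITERIA - ABUSE_AREAS
    if unknown:
        raise ValueError(f"unrecognized criteria: {sorted(unknown)}")
    # Dependence requires three or more dependence criteria.
    if len(endorsed & DEPENDENCE_CRITERIA) >= 3:
        return "dependence"
    # Abuse requires at least one abuse area without meeting dependence.
    if endorsed & ABUSE_AREAS:
        return "abuse"
    return "no disorder"

print(classify_alcohol_use({"tolerance", "withdrawal", "impaired_control"}))
print(classify_alcohol_use({"tolerance", "hazardous_use"}))
```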
Statistical Analysis
The prevalence of current and lifetime DSM-IV alcohol abuse or dependence disorders was calculated for each patient subgroup.
Receiver-operating characteristic (ROC) curves were used to evaluate the accuracy of the screening tests in detecting a current alcohol use disorder. The analyses were repeated, with a lifetime alcohol use disorder serving as the criterion variable, to examine the effect of the time frame on each screening test's performance. In ROC curve analysis, the sensitivity of a test (its true-positive rate) is plotted against its false-positive rate (1 − specificity) across the range of cut-points for determining a positive result [37]. The resulting curve can be summarized by its area (the area under the curve), which is a measure of the test's discriminatory power. The area under the curve can be interpreted as the probability, given one normal patient and one abnormal patient drawn at random from the population, that the abnormal patient will score higher on the test. An area under the curve of 1.0 indicates perfect discrimination, whereas an area under the curve of 0.50 indicates discrimination no better than chance.
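The two descriptions of the area under the curve given above (a trapezoidal area under the empirical ROC curve, and the probability that a randomly chosen patient with the disorder outscores one without it) can be checked against each other on a small example. The scores are invented for illustration, the function names are hypothetical, and ties are counted as one half in the pairwise version, a common convention that is assumed here.

```python
def roc_points(scores_pos, scores_neg):
    """Empirical ROC points, calling a result positive when score >= cut-point."""
    cuts = sorted(set(scores_pos) | set(scores_neg))
    points = []
    # One extra cut above the maximum yields the (0, 0) corner.
    for c in cuts + [cuts[-1] + 1]:
        tpr = sum(s >= c for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= c for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return sorted(points)

def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the sorted ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def auc_pairwise(scores_pos, scores_neg):
    """Share of (disorder, no disorder) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

pos = [3, 5, 8, 9]     # made-up scores, patients with a disorder
neg = [0, 1, 3, 4, 6]  # made-up scores, patients without a disorder
print(auc_trapezoid(roc_points(pos, neg)), auc_pairwise(pos, neg))
```

The two numbers agree, which is the sense in which the trapezoidal estimate is nonparametric: it depends only on how the scores of the two groups are ranked.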
For this analysis, we used ROC Analyzer for Windows software [38] and the trapezoidal method (a nonparametric measure appropriate for ordinal tests) to estimate the area under the curve and the associated SEs for each screening test. For each screening test, we used z-tests to compare area under the curve estimates across patient subgroups [37]. Within each patient subgroup, z-tests with a correction for pairing were used to test for significant differences in the area under the curve estimates across screening instruments [39]. We conducted supplementary analyses by using the same strategy to test for differences in the operating characteristics of the screening tests according to the educational level and economic status of patients.
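The comparisons described above can be sketched as follows. For independent subgroups, the difference between two area estimates is divided by the square root of the sum of their squared SEs. For two tests given to the same patients, a commonly used correction subtracts a covariance term that depends on the correlation `r` between the two area estimates. Whether the software used in the study applied exactly this formula is an assumption, and all numbers below are illustrative rather than study results.

```python
import math

def z_independent(auc1, se1, auc2, se2):
    """z statistic for areas estimated in two independent groups."""
    return (auc1 - auc2) / math.sqrt(se1**2 + se2**2)

def z_paired(auc1, se1, auc2, se2, r):
    """z statistic for two tests on the same patients.

    r is the correlation between the two area estimates (assumed known here).
    """
    return (auc1 - auc2) / math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)

def two_sided_p(z):
    """Two-sided P value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

z = z_independent(0.90, 0.03, 0.80, 0.04)        # illustrative values
print(z, two_sided_p(z))
print(z_paired(0.90, 0.03, 0.80, 0.04, r=0.5))   # pairing sharpens the test
```

With a positive correlation the denominator shrinks, so the same difference in areas produces a larger z statistic in the paired comparison.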
We examined the clinical importance of patient subgroup bias by estimating the sensitivity, specificity, and positive and negative likelihood ratios (with 95% CIs) of each screening test, using recommended cut-points for a positive result. For the CAGE questionnaire and the SAAST, we used cut-points of 2 and 3, respectively. We selected the cut-point for the AUDIT by using an approach described by Sox and colleagues [40]. This approach determines the optimal cut-point by considering both the prior probability of disease and the importance of false-negative and false-positive results. (For simplification, the cost of a false-negative result and the benefit of a false-positive result were assumed to be equivalent.) In this study, positive likelihood ratios of 3 to 6 for women and 6 to 12 for men were deemed optimal, suggesting that a positive AUDIT result was a score of 5 or more. We also calculated the positive predictive value (the probability that a patient with a positive result has a disorder) and the negative predictive value (the probability that a patient with a negative result does not have a disorder). Prevalence estimates for each subgroup were used to calculate predictive value. The TwoByTwo Analyzer, version 1.0 [41], was used for these analyses.
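The operating characteristics named above follow directly from a two-by-two table of screening result against diagnosis. The sketch below uses hypothetical counts and hypothetical function names. The confidence interval uses a log-scale approximation that is one common choice; the method implemented in the software used for the study is not stated in the text.

```python
import math

def operating_characteristics(tp, fn, fp, tn):
    """Sensitivity, specificity, and likelihood ratios from 2x2 counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (fp / (fp + tn)),  # sensitivity / false-positive rate
        "lr_neg": (fn / (tp + fn)) / spec,  # false-negative rate / specificity
    }

def lr_confidence_interval(lr, a, n1, b, n2, z=1.96):
    """Approximate 95% CI for a likelihood ratio (a/n1) / (b/n2) on the log scale.

    Fails when a or b is zero; such tables need a different method.
    """
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return lr * math.exp(-z * se_log), lr * math.exp(z * se_log)

tp, fn, fp, tn = 18, 2, 20, 160  # hypothetical counts, not study data
oc = operating_characteristics(tp, fn, fp, tn)
ci_pos = lr_confidence_interval(oc["lr_pos"], tp, tp + fn, fp, fp + tn)
ci_neg = lr_confidence_interval(oc["lr_neg"], fn, tp + fn, tn, fp + tn)
print(oc, ci_pos, ci_neg)
```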
Role of Study Sponsor
Our funding sources had no role in gathering, analyzing, or interpreting the data or in the decision to submit this paper for publication.
Results
Of the 1445 patients invited to participate in the study, 82 (5.7%) declined to participate, 21 were ineligible because they were not members of the ethnic groups being targeted, and 9 withdrew during the interview. The final sample comprised 1333 patients.
Study patients ranged in age from 18 to 86 years (mean ± SD, 43.2 ± 15.7 years). Level of education varied by ethnic group; 45.2% of the African-American patients, 57.1% of the Mexican-American patients, and 31.1% of the white patients had no more than a high school degree. Annual family income followed a similar pattern: 68.5% of the African-American patients, 59.3% of the Mexican-American patients, and 36.6% of the white patients reported annual incomes less than $20 000.
Prevalence of Alcohol Use Disorders
Prevalence estimates for current (past year) and lifetime alcohol abuse or dependence disorders according to DSM-IV criteria are shown in Table 1. Prevalence was higher among men and among Mexican-American patients.
Screening Accuracy by Patient Subgroup
Table 2 shows, for each patient subgroup, the area under the ROC curve estimates for the screening tests. These estimates showed considerable variation across patient subgroups for the CAGE questionnaire and the SAAST. The CAGE questionnaire performed better for African-American women than for African-American men (P = 0.01); the same pattern was seen for the SAAST, which had greater accuracy for white men than for white women (P = 0.03). The area under the curve estimates for the CAGE questionnaire and the SAAST for African-American men and Mexican-American women were particularly low.
In contrast, the area under the curve estimates for the AUDIT were significantly greater than those for the CAGE questionnaire and the SAAST, and they were fairly consistent across patient subgroups. For each patient subgroup, paired z-tests showed that the AUDIT was more accurate than the CAGE questionnaire or the SAAST (P < 0.05 to P < 0.001). The area under the curve estimates for the CAGE questionnaire and the SAAST did not differ significantly within each patient subgroup.
When the analyses were repeated with a lifetime alcohol use disorder serving as the criterion, the pattern of the findings within patient subgroups and across screening tests was similar to that reported for a current alcohol use disorder, although the area under the curve estimates were somewhat lower overall. Supplementary analyses showed no significant differences in area under the curve estimates for the CAGE questionnaire, the SAAST, or the AUDIT across education level or income.
Clinical Importance of Subgroup Bias
To examine the clinical importance of the observed differences in the overall accuracy of the screening tests across patient subgroups, we expressed the results of each screening test as positive or negative, using cut-points of 2 for the CAGE questionnaire [21], 3 for the SAAST [25], and 5 for the AUDIT. Tables 3 and 4 give operating characteristics (sensitivity and specificity) for each screening test, along with positive and negative likelihood ratios and positive and negative predictive values. The sensitivity of the CAGE questionnaire varied from a low of 0.21 for Mexican-American women to a high of 0.69 for white men. The sensitivity of the SAAST was also low for Mexican-American women (0.13) and highest for white men (0.69). The sensitivities of the AUDIT consistently exceeded those of the other two screening tests. Specificity of the AUDIT was consistently higher for women than for men. Specificity was lowest for African-American men completing the SAAST.
Tables 3 and 4 also show the positive and negative likelihood ratios for each screening test. Positive likelihood ratios (the ratio of sensitivity to false-positive rate) were lowest for African-American men completing the CAGE questionnaire and the SAAST and for Mexican-American women completing the SAAST. The positive likelihood ratios tended to be higher for the AUDIT than for the CAGE questionnaire and the SAAST, except in African-American women and Mexican-American men. Negative likelihood ratios (the ratio of false-negative rate to specificity) were smallest for the AUDIT, suggesting that this instrument is the best screening test for ruling out a disorder.
Finally, Tables 3 and 4 show the predictive value of the screening tests for each patient subgroup by using the prevalence estimates of a current alcohol use disorder from Table 1. (Because predictive value depends largely on prevalence, these comparisons are best made within each patient subgroup.) For white men, African-American women, and Mexican-American men, the positive predictive value was similar for each screening test. Yet for white and Mexican-American men, the negative predictive value was highest for the AUDIT (the probability of an alcohol use disorder after a negative result was 2% to 3% for these patients). For white women, African-American men, and Mexican-American women, the positive predictive value and the negative predictive value were higher with the AUDIT than with the CAGE questionnaire and the SAAST.
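The dependence of predictive value on prevalence noted above follows from Bayes' theorem. The sketch below computes both predictive values from sensitivity, specificity, and prevalence; the function name and the input values are illustrative assumptions, not figures from the study tables.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test applied at two hypothetical prevalences.
ppv_low, npv_low = predictive_values(0.9, 0.8, 0.10)
ppv_high, npv_high = predictive_values(0.9, 0.8, 0.30)
print(ppv_low, npv_low)
print(ppv_high, npv_high)
```

With the test held fixed, a higher prevalence raises the positive predictive value and lowers the negative predictive value, which is why comparisons across subgroups with different prevalences can mislead.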
Discussion
In adult family medicine patients, considerable inconsistency was seen in the accuracy of common self-report screening tests for alcohol use disorders when the tests were used with male and female primary care patients from different ethnic backgrounds. Overall, the CAGE questionnaire and the SAAST were inconsistent in their accuracy across patient subgroups; the AUDIT was more accurate and did not seem to have sex-related or ethnic bias. For African-American men, white women, and Mexican-American patients, the CAGE questionnaire and the SAAST showed poor discrimination in identifying patients with an alcohol use disorder. In contrast, each screening test had good discrimination for African-American women. The clinical importance of these findings can be seen in the estimates of positive predictive value, where values for the AUDIT were similar to or higher than those for the other screening tests within patient subgroups. Similarly, the negative predictive value was highest for the AUDIT, again suggesting that this instrument is more effective than the other screening tests for ruling out a disorder.
Similar subgroup differences in the accuracy of the CAGE questionnaire and the Brief Michigan Alcoholism Screening Test (an abbreviated, 10-item version of the Michigan Alcoholism Screening Test [42]) have been seen among patients in emergency departments [43]. A study of ambulatory care patients (primarily African Americans) showed no sex-related differences in the accuracy of the CAGE questionnaire [15]. This study reported a prevalence of alcohol use disorders that was two- to threefold higher than that seen in our study. In a lower-prevalence population, such as ours, that has greater heterogeneity with respect to alcohol use problems, the CAGE questionnaire does not seem to discriminate as well. We have reported elsewhere an item analysis of the CAGE questionnaire in our patient sample [44].
Differences in screening accuracy may be due in part to the samples used as norms to develop and evaluate the screening tests. Many studies of screening tests for alcohol use problems have used highly disparate groups (for example, alcoholic inpatients), including those in which baseline prevalence is higher and disease has progressed further than would be typical in patients from primary care settings [17]. Alcohol use problems in primary care settings are more varied: Many patients in these settings consume alcohol at levels that place them at risk for diminished physical health but have not yet experienced such problems [20]. This distinction is important because screening accuracy is easier to establish among highly disparate groups than among those representing the complete spectrum of alcohol problems. Still other studies have used ethnically homogeneous samples [26].
Consequence-based measures of alcohol use problems may be culturally biased and may differ for men and women [45]. Subjective indicators, such as guilt about alcohol use, may suffer from the same biases. For example, patterns of alcohol use among Mexican-American men are characterized by infrequent but high-quantity consumption, which is considered normative [46]. Sex-related differences across ethnic groups for the CAGE questionnaire and the SAAST were not consistent in this study. Among African Americans, relations between ethnic identity and norms for drinking, religiosity, and rates of alcohol consumption have been seen [13, 47]. Among African-American women compared with white women, prohibitions against alcohol use, high lifetime rates of alcohol abstention, and a higher incidence of dependence-related problems and social consequences of drinking may explain the high accuracy of each screening test (that is, fewer African-American female drinkers do not meet the criteria for an alcohol disorder) [48]. The findings from this study suggest that each screening test discriminates well between African-American women with a current alcohol use disorder and those without such a disorder.
An alternative explanation for the differences found in this study is that economic factors-not ethnicity-are related to screening accuracy. However, several observations suggest that this is not so. First, the prevalence of alcohol use disorders was not related to educational attainment or reported annual income. Second, the pattern of the area under the curve estimates was not consistent with this alternative: The screening tests were more accurate for African-American women (the lowest income group) than for African-American men. Third, educational attainment and reported annual income were not associated with screening scores. Finally, in supplementary analyses where models were estimated separately for different educational and income levels, the association between the screening test results and a current disorder remained consistent.
Some authors have argued that the CAGE questionnaire has diminished in value as concern about lower levels of consumption and the identification of patients at risk for alcohol-related problems has increased [49]. At the same time, this questionnaire offers the advantages of brevity, ease of recall, and ease of scoring. These advantages are important because more lengthy self-report tools may not be amenable to use in many clinical settings. The SAAST has the advantage of being self-administered, and a collateral version for family members is also available [50]. The CAGE questionnaire and the SAAST do not address the question of how recently symptoms occurred; thus, patients considered to be in remission may screen positive. Additional inquiry about whether symptoms are current is required [51]. In contrast, the AUDIT includes both lifetime and current time frames, includes indicators of alcohol consumption as well as alcohol-related problems, and is easily administered in the clinical setting [52]. Of note, the AUDIT was developed to identify persons at risk for alcohol use problems. A positive screening test result on the AUDIT should be considered in light of where the potential problem lies (for example, frequent binge drinking) and the possible medical benefits, for some patients, of drinking alcohol.
This study applied several accepted methodologic standards for studies evaluating diagnostic tests [19, 53]. The spectrum composition (standard 1) of the population studied was addressed by including unselected patients and by oversampling women (because the prevalence of alcohol use disorders is lower among women than among men). The primary aim of the study was to consider patient subgroup biases in the performance of these screening tests (analysis of indices of accuracy in pertinent subgroups, standard 2). Workup bias (standard 3) was avoided by administering the diagnostic interview to all patients in the study, regardless of screening test results. Computer scoring algorithms were used to score the screening tests and to determine whether patients met diagnostic criteria (avoidance of review bias, standard 4). We reported SEs for the area under the curve estimates and gave 95% CIs for likelihood ratios (reporting indexes of precision of the results for test accuracy, standard 5). Presentation of indeterminate test results (standard 6) was not a problem with these screening tests because cut-points to determine positivity have been identified. We did not design a parallel validation study (standard 7, reproducibility of the test). The prevalence of alcohol use disorders may vary across clinical populations but, given the spectrum of alcohol use problems in this sample, it seems less likely that screening accuracy would vary significantly across settings.
Our study was limited to a single clinical site, included only family medicine patients, and did not address screening performance in all ethnic groups. The marked variation in screening performance across ethnic groups suggests that further studies should explore intraethnic variation [54]. Finally, the order in which instruments were completed, which was not randomized, and having the patients complete other measures of alcohol use (such as measures of quantity and frequency of consumption) may have introduced a systematic bias [55].
This cross-sectional study shows marked variation in the accuracy of common self-report screening tests for alcohol abuse when used in an ethnically heterogeneous primary care setting. These biases seem to have clinical relevance. In primary care settings, including those serving multiethnic communities, the use of unbiased screening tests, such as the AUDIT, is recommended.
Dr. Cantor: Section of General Internal Medicine, Department of Medical Specialties, University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Boulevard, Box 40, Houston, TX 77030.
Dr. Holzer: Department of Psychiatry and Behavioral Sciences, The University of Texas Medical Branch at Galveston, 301 University Boulevard, Galveston, TX 77555-0429.
Dr. Volk: Department of Family and Community Medicine, Baylor College of Medicine, 5510 Greenbriar, Houston, TX 77005.
- Copyright ©2004 by the American College of Physicians