Policy and Program Analysis Using Administrative Databases
- Wayne A. Ray, PhD
- From Vanderbilt University School of Medicine, Nashville, Tennessee. Note: This article is one of a series of articles comprising an Annals of Internal Medicine supplement entitled “Measuring Quality, Outcomes, and Cost of Care Using Large Databases: The Sixth Regenstrief Conference.” Grant Support: In part by grant HS07768 from the Agency for Health Care Policy and Research. Requests for Reprints: Wayne A. Ray, PhD, Department of Preventive Medicine, Vanderbilt University School of Medicine, A-1124 MCN, Nashville, TN 37232.
Abstract
Administrative policies and programs play an important and growing role as determinants of the use of medical care. Although some policies and programs may be harmful or ineffectual, randomized, controlled trials or prospective evaluations are rarely done for political or logistic reasons. Most evaluations are retrospective and often use administrative databases. Major problems with such evaluations include poor data quality, lack of concurrent controls, inability to ascertain important study outcomes, and incomplete data on case mix. This article uses published evaluations to illustrate these problems and suggests strategies that can minimize their impact. Such strategies include thorough assessment of data quality, interrupted time-series or policy gradient analysis, restriction of studies to those clinical outcomes that reliably result in medical care, and use of data on medical encounters as surrogates for determining case mix. However, even when these strategies are used, adequate evaluation of the effects of many policies and programs may continue to be impossible. Prospective evaluations need to be used more frequently to ensure that changes are held to the same standard used for other therapeutic interventions.
Administrative policies and programs play an important and growing role as determinants of the use of medical care. This trend has been driven by efforts to control costs and to standardize the quality of health care. As a result, the policies and programs adopted by managed care organizations, third-party payers, and local and federal governments increasingly influence medical care decisions. However, although such policies are designed to improve the efficiency or quality of medical care, they are often implemented without careful evaluation. The tacit assumption is frequently that policies or regulations can only have beneficial results; however, some policies may do more harm than good. An example is the three-prescription limit for Medicaid outpatients in New Hampshire, which was linked to an increase in nursing home admissions [1]. Other policies may not be harmful but may be ineffective and therefore waste scarce resources. For example, federal regulations led many state Medicaid programs to implement complex and expensive retrospective drug utilization review programs without evidence that this approach improved medication use [2, 3].
It would be reasonable to hold policies to the same standard as physicians: primum non nocere (first, do no harm). Evidence of efficacy, safety, and cost-effectiveness should be required. The gold standard of evidence for efficacy and safety is the randomized, controlled trial. However, such trials are rarely used to evaluate changes in policy or regulation. One fundamental reason is political. Conducting a randomized, controlled trial is an admission of uncertainty, as reflected in the term clinical equipoise, which describes the ethical context within which randomized, controlled trials are justifiable. Emphasizing that a new policy might not work or might even be harmful is incongruous with building support for change. Indeed, policy changes are sometimes implemented quickly so that established interests do not have time to mount effective opposition. Furthermore, good randomized, controlled trials are time consuming and expensive. For similar reasons, prospective concurrent evaluations of policy changes are rarely done.
Retrospective (also called historical) studies have thus been the primary tool used to evaluate policy and program changes [4]. Retrospective studies require the availability of data to identify the population potentially affected by the change, define an appropriate comparison group, measure important baseline variables, and ascertain study outcomes. In this context, administrative databases are a potentially useful source of data for retrospective studies of policy and programs.
Administrative databases usually originate from systems that provide or finance medical care and contain computerized records of encounters between patients and health care providers [5-9]. To be useful for policy or program analysis, such databases must be accurate and complete, serve a well-defined population and contain information on all persons in the study population, include records (generally hospital admissions) for identifying disease occurrence or other study outcomes, and include other data needed to analyze a specific policy or regulation.
In the United States, the Medicare database is among the most widely used administrative databases for policy and program analysis [10]. Other administrative databases arise from health maintenance organizations, state Medicaid programs, or universal health insurance systems. Databases from state Medicaid programs have been used to evaluate regulations that affect Medicaid enrollees [1, 11-13].
Although administrative databases are potentially useful for analyzing policy and programs, they cannot be used indiscriminately. This article considers four major problems that can arise when administrative databases are used in retrospective studies: poor data quality, lack of concurrent controls, inability to ascertain essential study outcomes, and incomplete data on case mix.
Poor Data Quality
Use of an administrative database for research requires accurate, complete data elements. A dramatic example of how elementary problems with data quality can lead to misleading conclusions is provided by a study [14] of geographic variation in the rate of coronary artery bypass graft surgery. For some time, policy analysts had been intrigued by small-area variations in the rates of this and other surgical procedures. The underlying premise was that geographic variation might reflect excessive use of procedures and that curtailing such use could lead to substantial cost savings. Almost all of the research in this area has depended on data from Medicare and other administrative databases.
A guest editorial in The New England Journal of Medicine in 1988 [15] noted that two adjacent areas in California had more than a threefold variation in the rate of coronary artery bypass graft surgery. The high-use region was served by a community hospital and the low-use region by an academic tertiary care medical center (presumably the gold standard). The editorial mentioned that $4 500 000 could be saved each year if rates in the high-use area were reduced to equal those in the low-use area. Subsequent responses [16] to the editorial revealed major problems with the quality of study data, which had been extracted from a statewide database on hospital discharges. The billing office of the hospital in the high-use region recorded the hospital zip code in the patient's address field, regardless of the patient's actual residence. Because the hospital had a contract with a large health maintenance organization, medical staff performed coronary artery bypass graft surgery on a substantial number of out-of-area residents. Correct classification of the patients' residences eliminated the difference in rates between the two regions.
This example also illustrates the need to interpret outliers cautiously. Outliers are of particular interest because they may represent practice patterns that need correction (outliers in the direction of lower quality or efficiency) or that are models for emulation (higher quality or greater efficiency). However, experience has taught many researchers that data problems should be considered as the first explanation for an outlier. Researchers who use administrative databases to analyze policies and regulations should understand the incentives that the organizations generating the data have for recording each variable accurately. If any uncertainty exists, careful validation studies of the variables that are important to the study should be demanded.
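The zip code problem described above suggests one simple screening check. The sketch below is a minimal illustration under stated assumptions, not a method from the cited studies: it assumes discharge records reduced to hypothetical `(hospital_id, patient_zip)` pairs plus a lookup of each hospital's own ZIP code, and it flags any hospital where an unusually large share of patients appear to live at the hospital's own ZIP code. The function name and the 50% threshold are hypothetical placeholders.

```python
from collections import defaultdict


def flag_address_defaults(records, hospital_zip, threshold=0.5):
    """Flag hospitals whose patients suspiciously often 'live' at the hospital's ZIP code.

    records: iterable of (hospital_id, patient_zip) pairs (hypothetical layout).
    hospital_zip: dict mapping hospital_id -> the hospital's own ZIP code.
    threshold: share above which a hospital is flagged (arbitrary placeholder).
    Returns {hospital_id: share} for flagged hospitals only.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for hospital_id, patient_zip in records:
        totals[hospital_id] += 1
        if patient_zip == hospital_zip.get(hospital_id):
            matches[hospital_id] += 1
    shares = {h: matches[h] / totals[h] for h in totals}
    return {h: s for h, s in shares.items() if s > threshold}


# Toy data: hospital "A" records its own ZIP code for 3 of its 4 patients.
records = [("A", "11111"), ("A", "11111"), ("A", "11111"), ("A", "33333"),
           ("B", "22222"), ("B", "44444"), ("B", "55555"), ("B", "66666")]
print(flag_address_defaults(records, {"A": "11111", "B": "22222"}))  # {'A': 0.75}
```

A flagged hospital is not proof of a coding error; it is only a reason to inspect the source records before interpreting that hospital's rates as an outlier.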
Lack of Concurrent Controls
In an observational study, optimum assessment of the effects of a policy change requires that a concurrent control group not affected by the change be included in the study. However, because changes to policies and programs frequently encompass entire populations, control groups may be difficult to find. For example, policies governing the Medicare prospective payment system for U.S. hospitals were implemented throughout the United States, and this precluded using a control group from the Medicare population.
One solution might be to use a control group drawn from outside the geographic boundaries of the affected population or from a period before the change took effect. For example, evaluations of the prospective payment system could have used Canadian residents, or the Medicare population before implementation of prospective payment, as a control group. However, this strategy has numerous problems, including differences between the populations being compared, different methods of data collection, and secular trends. Two analyses that are useful for policy changes and do not rely on external control groups are interrupted time-series analysis and policy gradient analysis.
Interrupted Time-Series Analysis
A common design used for evaluating policy change compares study outcomes during a specified period preceding the change with outcomes during a similar period after the change. This design therefore uses a nonconcurrent control group (that is, it uses the same population for the period before and after policy implementation). However, a major deficiency in using a nonconcurrent control group is the difficulty in separating the effects of the policy change from those of secular trends or other concomitant changes [17]. An example would be a study of the effects of a formulary change. Comparison of the year before the formulary was implemented with the year after would confound secular trends in drug use with the effects of change in the formulary.
Interrupted time-series analysis addresses this limitation by testing for abrupt changes in study outcomes after policy implementation and by using regression analysis to control for secular trends [17, 18]. The rationale for such analysis can be illustrated by an analogy to anaphylaxis after drug use. Anaphylaxis that occurs shortly after administration of a drug is much more likely to be caused by the drug than is anaphylaxis that occurs 6 months later. The primary assumptions of this analysis (that policy implementation produced rapid changes in outcomes and that no other concomitant changes occurred) must be carefully assessed for each analysis of a specific policy or program.
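A common way to implement interrupted time-series analysis is segmented regression, in which an abrupt change appears as a shift in level at the policy date and a gradual change as a shift in slope. The sketch below is a minimal illustration assuming one outcome value per month and plain ordinary least squares; it is not the model used in the studies cited here, and real analyses typically also need to address autocorrelation and seasonality. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def segmented_regression(y, t0):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t by ordinary least squares.

    y: outcome per period (e.g., monthly expenditure per enrollee), t = 0, 1, ...
    t0: index of the first period under the new policy; post_t = 1 when t >= t0.
    b2 estimates the abrupt level change, b3 the change in slope after the policy.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear(xtx, xty)
    return {"intercept": b0, "pre_trend": b1, "level_change": b2, "trend_change": b3}


def relative_level_change(fit, t0):
    """Level change at t0 as a fraction of the value the pre-policy trend projects for t0."""
    counterfactual = fit["intercept"] + fit["pre_trend"] * t0
    return fit["level_change"] / counterfactual


# Noise-free toy series: 12 pre-policy months followed by 24 post-policy months.
y = [10 + 0.1 * t + ((-6 + 0.05 * (t - 12)) if t >= 12 else 0.0) for t in range(36)]
fit = segmented_regression(y, t0=12)
print({name: round(value, 3) for name, value in fit.items()})
print(round(relative_level_change(fit, t0=12), 3))
```

Controlling for the pre-policy trend is what separates this design from a simple before-and-after comparison: a secular trend is absorbed by `pre_trend`, so only the departure from it at the policy date is attributed to the policy.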
An example of interrupted time-series analysis is the recent evaluation [13] of a Medicaid program of mandatory advance approval for expensive medications. In October 1989, the Tennessee Medicaid program implemented a policy that stipulated prior authorization for nongeneric nonsteroidal anti-inflammatory drugs (NSAIDs). This group of drugs was targeted because it accounted for 11% of pharmaceutical expenditures and because of a 12-fold variation in the cost of individual agents [19] without evidence that the more expensive NSAIDs had superior efficacy or lower overall toxicity [20].
The interrupted time-series analysis tracked monthly Medicaid expenditures for NSAIDs during the year preceding and the 2 years after implementation of the new policy (Figure 1). NSAID expenditures per enrollee decreased abruptly after prior authorization was implemented, by an estimated 65% (95% CI, 60% to 71%). Despite some evidence of a gradual increase in costs after the policy change (Figure 1), the analysis estimated that total NSAID expenditures were reduced by $12.8 million during the 2 years after implementation of prior authorization. There was no evidence that use of other drugs or physician visits increased after the policy change.
Policy Gradient Analysis
The primary limitation of interrupted time-series analysis is the assumption that the effects of the policy or program change occur immediately or shortly after implementation. This assumption is reasonable when studying how abrupt changes in reimbursement policy affect service (as in the example of prior authorization). However, many policy changes have more gradual effects. If a new policy is announced well in advance, patients and providers may make anticipatory behavioral changes before the date of implementation [12]. Adverse clinical effects may not become apparent until well after the policy has changed.
Policy gradient analysis classifies a population into groups according to the extent to which they are affected by the policy change. Study outcomes should change most in the group most affected by a new policy and correspondingly less in groups that are less affected.
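The expected gradient can be checked by relating, across groups, the change in exposure to the policy to the change in the study outcome. The sketch below is one simple way to do this, assuming one summary number per group for each quantity; it computes a least-squares slope and a Pearson correlation, and it is not presented as the method used in the studies cited. The function name and the example numbers are made up for illustration.

```python
from math import sqrt


def gradient_check(exposure_change, outcome_change):
    """Across policy-gradient groups, relate change in exposure to change in outcome.

    exposure_change: per-group change in the share exposed to the policy (percentage points).
    outcome_change: per-group change in the study outcome over the same period.
    Returns (slope, r): least-squares slope of outcome change on exposure change and
    the Pearson correlation; r is None when the outcome did not vary across groups.
    """
    n = len(exposure_change)
    mx = sum(exposure_change) / n
    my = sum(outcome_change) / n
    sxx = sum((x - mx) ** 2 for x in exposure_change)
    syy = sum((y - my) ** 2 for y in outcome_change)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exposure_change, outcome_change))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy) if syy > 0 else None
    return slope, r


# Made-up example: the outcome falls in step with exposure.
print(gradient_check([0, 20, 40, 60], [0, -2, -4, -6]))
# Made-up example: exposure rises but the outcome is flat in every group.
print(gradient_check([0, 20, 40, 60], [0, 0, 0, 0]))
```

With only a handful of groups, the slope and correlation describe the pattern rather than test it formally; a flat outcome despite a steep exposure gradient is itself informative.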
Policy gradient analysis has been used to evaluate the effect of new Medicaid regulations for pregnant women. These regulations represented a major policy initiative that was designed to improve birth outcomes in the United States by offering women in high-risk groups better access to prenatal care [21-23]. In Tennessee, the new policy led to seven major changes in Medicaid between 1984 and 1990 [24], including coverage for married women, an increase in the income cutoff to 150% of the poverty level, presumptive eligibility to facilitate enrollment early in pregnancy, and reimbursement for case management services in prenatal care. The rapid sequential implementation of multiple changes, each of which would probably take effect gradually, precluded an interrupted time-series analysis.
One policy gradient analysis studied 610 056 singleton births to married black women and white women in Tennessee from 1983 to 1991 [24]. The evaluation focused on married women because they were most affected by the new policy. These women were classified into eight groups on the basis of three factors that could predict the extent to which they were affected by changes in the Medicaid policy: education, age, and mean per capita neighborhood income. Women in group 1 (<12 years of education, <25 years of age, and mean per capita neighborhood income <$12 500) were most affected by the changes. In group 1, Medicaid enrollment during the first trimester of pregnancy increased from 7.1% of births in 1983 to 67.9% by 1991, with a corresponding decrease in inadequate prenatal care (Kessner index [25]) from 12.8% in 1983 to 6.4% in 1991 (Figure 2). For women in the other groups, the improvement in prenatal care correlated with the increase in first-trimester Medicaid enrollment (Figure 3). The married women least affected by the Medicaid expansions had 12 years of education or more, were 25 years of age or older, and had a mean per capita neighborhood income of more than $12 500; these women had essentially no change in first-trimester Medicaid enrollment or adequacy of prenatal care (Figure 3). From this analysis, researchers concluded that the changes in Medicaid policy did increase utilization of prenatal care. Unfortunately, there was no evidence of a concomitant reduction in preterm birth or low birthweight [24]: rates of both outcomes were essentially constant in all groups during the entire study period.
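Three dichotomized factors give 2 × 2 × 2 = 8 combinations, which is one way the eight groups could arise. The sketch below assigns a birth to a group using the cut points given in the text. Only groups 1 (all three factors) and 8 (none) are anchored to the text; the numbering of groups 2 to 7 is a hypothetical binary ordering, not necessarily the one used in [24], and the handling of an income of exactly $12 500 is an assumption.

```python
def gradient_group(education_years, age_years, neighborhood_income):
    """Assign a birth to one of eight groups from three dichotomized factors.

    Cut points follow the text: <12 years of education, <25 years of age, and
    mean per capita neighborhood income <$12,500 mark a mother as more likely
    affected. Group 1 has all three factors and group 8 has none; the numbering
    of groups 2-7 is a hypothetical binary ordering.
    """
    low_education = education_years < 12
    young = age_years < 25
    low_income = neighborhood_income < 12_500  # exactly 12,500 counts as "not low" (assumption)
    # Each absent factor adds a distinct power of two, so the 8 combinations map to 1..8.
    index = (not low_education) * 4 + (not young) * 2 + (not low_income) * 1
    return index + 1


print(gradient_group(10, 20, 9_000))   # most affected profile
print(gradient_group(14, 30, 20_000))  # least affected profile
```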
This analysis was more sensitive to the effects of the Medicaid expansions than an analysis that tracked outcome rates for all women who gave birth in Tennessee, because the changes did not affect most of the population. In an analysis of all women in Tennessee, more than 50% of the women studied could not have benefited from the policy changes. This would be analogous to testing the efficacy of a new antihypertensive drug in a trial in which fewer than one half of persons being “treated” for hypertension were actually offered the medication.
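The loss of sensitivity can be made concrete with simple arithmetic. If only a fraction of the population can be affected and the policy has no effect on everyone else, the population-wide change is that fraction times the effect among the affected. The sketch below illustrates this with made-up numbers; the sample-size factor is a rough approximation that assumes the outcome's variance is about the same in both analyses, so that the required sample size scales with the inverse square of the effect. The function names are hypothetical.

```python
def diluted_effect(effect_in_affected, affected_fraction, effect_in_unaffected=0.0):
    """Population-wide change when only a fraction of the population is affected."""
    return (affected_fraction * effect_in_affected
            + (1.0 - affected_fraction) * effect_in_unaffected)


def sample_size_inflation(affected_fraction):
    """Approximate factor by which the sample must grow to detect the diluted effect.

    Assumes roughly equal outcome variance, so required sample size scales with
    1 / effect**2, and the effect shrinks by the factor affected_fraction.
    """
    return 1.0 / affected_fraction ** 2


# Made-up numbers: a 6-point drop among affected women, 40% of the population affected.
print(diluted_effect(-6.0, 0.4))       # about -2.4 percentage points population-wide
print(sample_size_inflation(0.4))      # about 6.25 times the sample size
```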
Despite a “dose-response” relation between increased Medicaid enrollment and increased prenatal care, birth outcomes did not improve. The obvious conclusion, that increased prenatal care does not improve birth outcomes, must be interpreted cautiously. Women who benefited from the changes differed considerably from those who did not and may have been subjected to different secular trends in perinatal outcomes. Without a control group, definitive evaluation of gradually implemented policies is difficult.
Ecological Analyses in Policy Evaluations
In the preceding analyses, the study groups included persons who were potentially rather than actually affected by a new policy. Such analyses are widely criticized because they are subject to ecological fallacy. In epidemiologic studies, ecological fallacy denotes the inability to link exposure with disease for individual members of a specific population. For example, in international comparisons of dietary fat intake and breast cancer, women in countries where high-fat diets are normal have higher breast cancer-related mortality rates. However, the comparisons do not show whether women who died of breast cancer actually had high-fat diets. Epidemiologists would prefer to study the diets of individual women and the extent to which they predict breast cancer-related mortality rates.
An ecological analysis of a policy change may attribute to the policy changes in outcomes that actually occurred among persons the policy did not affect. However, it is often crucial to study everyone who might potentially be affected by a policy, as in evaluations of new vaccination or prenatal care programs. Analyses that compare users and nonusers of a new program or service are susceptible to selection bias: persons who receive vaccinations or early prenatal care may have behavioral or other characteristics that place them at lower risk than nonusers. The preferred alternative is to study all persons who are eligible to receive vaccinations or early prenatal care, regardless of their actual behavior. The same reasoning underlies the status of intention-to-treat analysis as the gold standard in randomized, controlled trials, even though such analysis is ecological in this sense (some members of a given treatment group will not receive the assigned treatment).
Interrupted time-series and policy gradient analyses contain safeguards against ecological fallacy. In an interrupted time-series analysis, secular trends that influence members of the study population who are unaffected by the policy change rarely coincide with the precise timing of the change. In a policy gradient analysis, the most affected study group needs to include a high proportion of persons who are potentially affected by the policy change, thereby reducing the influence of persons who are not affected. This is analogous to a well-designed clinical trial that requires each treatment group to have a high rate of compliance with the assigned treatment to avoid bias toward the null.
Limited Detection of Outcomes
An essential component of policy or program analysis is the capacity to ascertain outcomes (also called end points or dependent variables) accurately and completely. Outcomes may represent changes in health status or utilization of medical care. In studies that use administrative databases, outcomes are often defined from records of medical care encounters, such as hospitalizations for peptic ulcer disease or the number of prescriptions for NSAIDs. Administrative databases can be used to detect outcomes that reliably result in diagnosis and treatment; the more severe the condition, the more complete detection is likely to be.
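Defining an outcome from encounter records usually means selecting hospitalizations whose diagnosis codes fall in a specified set during a follow-up window. The sketch below assumes a hypothetical record layout of `(person_id, admit_date, primary_dx)` and uses ICD-9-CM categories 531 to 534 (gastric, duodenal, peptic, and gastrojejunal ulcer) purely as an illustration; a real study would need its own validated code list. By construction, it can find only ulcers that led to hospitalization.

```python
from datetime import date

# Illustrative ICD-9-CM 3-digit categories for gastric, duodenal, peptic, and
# gastrojejunal ulcer (531-534); verify against the coding manual before use.
ULCER_PREFIXES = ("531", "532", "533", "534")


def ascertain_outcome(admissions, dx_prefixes, start, end):
    """Return IDs of persons with at least one qualifying hospitalization in [start, end].

    admissions: iterable of (person_id, admit_date, primary_dx) tuples (hypothetical layout).
    dx_prefixes: tuple of diagnosis-code prefixes that define the outcome.
    """
    cases = set()
    for person_id, admit_date, primary_dx in admissions:
        code = primary_dx.replace(".", "")  # accept both "531.0" and "5310"
        if start <= admit_date <= end and code.startswith(dx_prefixes):
            cases.add(person_id)
    return cases


admissions = [
    (1, date(1990, 3, 1), "531.40"),  # gastric ulcer, inside the window
    (2, date(1990, 5, 1), "410.71"),  # not an ulcer code
    (3, date(1992, 1, 1), "532.0"),   # ulcer code, outside the window
    (4, date(1990, 7, 1), "5339"),    # ulcer code written without a decimal point
]
print(ascertain_outcome(admissions, ULCER_PREFIXES, date(1990, 1, 1), date(1990, 12, 31)))
```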
However, medical care encounters may be inadequate surrogates for defining such outcomes as asymptomatic conditions, subclinical disease, cognitive status, functional status, and other aspects of quality of life. If monitoring such outcomes is necessary to analyze a policy or program, then an administrative database by itself is unlikely to be a suitable source for study data.
In the study of the Medicaid policy requiring prior approval for NSAID prescriptions, prescription costs were reduced considerably, but utilization of other medical care services did not increase. Can we definitively conclude that the policy change had no adverse effects? One limitation of the analysis was that the Medicaid files used for study data had no reliable information on changes in patients' pain or function unless such changes led to changes in utilization of medical care services. Thus, some patients whose musculoskeletal pain was well controlled with an expensive NSAID may have had poor pain control after switching to a less expensive NSAID. However, this scenario is unlikely because there is no evidence of systematic differences in efficacy among individual NSAIDs [20]. On the other hand, administrative data may not be suitable for evaluating a program that encouraged switching from an expensive to an inexpensive antihypertensive drug. Such switching could affect both adequacy of blood pressure control and quality of life (for example, switching from an angiotensin-converting enzyme inhibitor to methyldopa [26]), and these changes probably could not be reliably detected from administrative data.
Medication use may serve as a surrogate for some outcomes that are otherwise difficult to detect from administrative data. This practice is justified only if the accuracy and completeness of the surrogate indicator are supported by strong existing evidence or specific validation studies. In the example of switching antihypertensive medications, a new antidepressant prescription could serve as a potential marker of poorer quality of life among patients switched from an angiotensin-converting enzyme inhibitor to methyldopa. However, depression is often underdiagnosed in clinical practice, and the likelihood that depressive symptoms were detected may itself have been correlated with the likelihood of switching medications.
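A validation study of a surrogate indicator typically compares it, person by person, with a reference standard in a sample. The sketch below reads "completeness" as sensitivity and "accuracy" as positive predictive value; that mapping, the function name, and the use of chart review or interview as the reference are assumptions made for illustration. Underdiagnosis of depression would show up as low sensitivity of an antidepressant-based marker.

```python
def validate_surrogate(surrogate, reference):
    """Compare a claims-based surrogate with a reference standard (e.g., chart review).

    surrogate, reference: dicts person_id -> bool for the same validation sample.
    Returns (sensitivity, ppv): sensitivity reflects completeness (share of true
    cases the surrogate finds); positive predictive value reflects accuracy (share
    of surrogate-positive persons who are true cases).
    """
    tp = sum(1 for p in reference if surrogate[p] and reference[p])
    fp = sum(1 for p in reference if surrogate[p] and not reference[p])
    fn = sum(1 for p in reference if not surrogate[p] and reference[p])
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Toy validation sample: 4 true cases by reference standard; the surrogate finds 2 of them
# and also flags 1 person who is not a case.
reference = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False}
surrogate = {1: True, 2: True, 3: False, 4: False, 5: True, 6: False}
print(validate_surrogate(surrogate, reference))
```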
Incomplete Data on Case Mix
In many policy or program analyses, adjusting for differences in the case mix of study groups is important for two reasons: between-group differences thought to signify differences in quality of care may instead reflect differences in case mix, and differences in response to policy changes may likewise represent differences in case mix.
The first reason can be illustrated by the long-standing goal of developing a practical method of systematically monitoring quality of care in hospitals. One approach has been to use administrative data to monitor short-term mortality rates, on the premise that, after adjustment for demographic factors, relatively high mortality rates reflect poorer care. This premise prompted the Health Care Financing Administration in 1986 to begin publishing individual hospital mortality rates for Medicare patients [27]. However, this approach penalizes facilities that care for sicker patients. An important policy concern has been the extent to which administrative data can discern differences in case mix between hospitals and thereby produce fair comparisons. To date, evidence suggests that Medicare and other hospital discharge databases define case mix inadequately [27]. For example, Pollack and colleagues [28] studied variation in mortality rates among nine pediatric intensive care units, where rates ranged from 3% to 17% (P < 0.001). However, after adjustment for severity of illness on admission, as determined by review of patient charts, no significant between-hospital difference in mortality rate remained.
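One common form of such adjustment is indirect standardization: each patient contributes the mortality rate expected for his or her severity stratum in a reference population, and the hospital's observed deaths are compared with the sum of those expectations. The sketch below assumes hypothetical severity strata and reference rates; it is not the method of [28], only an illustration of how crude rates that differ tenfold can both correspond to an observed-to-expected ratio near 1.

```python
def observed_expected(patients, reference_rate):
    """Indirectly standardized mortality for one hospital.

    patients: list of (severity_stratum, died) pairs for the hospital's admissions.
    reference_rate: dict severity_stratum -> mortality rate in a reference population.
    Returns (observed deaths, expected deaths, observed/expected ratio).
    """
    observed = sum(1 for _, died in patients if died)
    expected = sum(reference_rate[stratum] for stratum, _ in patients)
    return observed, expected, observed / expected


# Hypothetical reference rates by severity on admission.
reference_rate = {"low": 0.02, "high": 0.20}
# Hospital A admits only low-severity patients; hospital B only high-severity patients.
hospital_a = [("low", i < 2) for i in range(100)]    # crude mortality 2%
hospital_b = [("high", i < 20) for i in range(100)]  # crude mortality 20%
print(observed_expected(hospital_a, reference_rate))
print(observed_expected(hospital_b, reference_rate))
```

The adjustment is only as good as the severity measure: if the strata come from discharge data that miss important differences in case mix, the expected counts, and therefore the ratios, inherit that error.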
For some time, researchers have been intrigued with one proposed solution to the problem of incomplete data on case mix: using the history of medical care encounters as a surrogate for case mix [5, 8]. Pharmacoepidemiologic studies of elderly persons [5, 29, 30] that use Medicaid databases have found that hospitalization, nursing home residence, and use of medications in the past year are important predictive factors for various diseases. Whether these factors adequately capture case mix depends on the specific population of elderly persons and the outcomes being studied.
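Encounter-history covariates of the kind described above can be built by looking back over a fixed window before each person's index date. The sketch below assumes hypothetical record layouts and a 365-day look-back; it produces a prior-hospitalization flag, a prior-nursing-home flag, and a count of distinct drugs dispensed, which could then enter a regression model as case-mix covariates.

```python
from datetime import date, timedelta


def encounter_history(person_id, index_date, hospitalizations, nursing_home_claims,
                      prescriptions, lookback_days=365):
    """Summarize a person's prior-year encounters as case-mix covariates.

    hospitalizations, nursing_home_claims: iterables of (person_id, service_date).
    prescriptions: iterable of (person_id, fill_date, drug_name).
    Only encounters in [index_date - lookback_days, index_date) are counted.
    """
    start = index_date - timedelta(days=lookback_days)

    def in_window(d):
        return start <= d < index_date

    return {
        "prior_hospitalization": any(
            pid == person_id and in_window(d) for pid, d in hospitalizations),
        "prior_nursing_home": any(
            pid == person_id and in_window(d) for pid, d in nursing_home_claims),
        "distinct_drugs": len({drug for pid, d, drug in prescriptions
                               if pid == person_id and in_window(d)}),
    }


hospitalizations = [(1, date(1995, 1, 10)), (2, date(1993, 1, 1))]
nursing_home_claims = [(1, date(1993, 5, 5))]  # outside the look-back window
prescriptions = [(1, date(1995, 2, 1), "digoxin"), (1, date(1995, 3, 1), "digoxin"),
                 (1, date(1995, 4, 1), "furosemide"), (1, date(1995, 7, 1), "warfarin")]
print(encounter_history(1, date(1995, 6, 1), hospitalizations, nursing_home_claims, prescriptions))
```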
Unfortunately, Medicare and similar databases have limited ability to capture case mix because they contain only indirect information on nursing home residence and no information on medication use. For example, Roos and colleagues [31] used an administrative database of hospital and physician claims to compare transurethral resection with open resection of the prostate as treatment for benign prostatic hyperplasia. They found a paradoxical 20% increase in the 5-year mortality rate associated with transurethral resection, which persisted after adjustment for case mix on the basis of previous physician visits and hospitalizations. A subsequent study by Concato and colleagues [32], which had access to more detailed case-mix information from clinical records, suggested that the small elevation in mortality among patients who had transurethral resection was the result of more comorbid conditions in this group.
Differences in case mix might also explain differences in responses to policy or program changes. For example, new federal regulations were designed to ensure more appropriate use of antipsychotic drugs among elderly nursing home residents [12]. An evaluation that used a Medicaid database found a 27% decrease in nursing home use of antipsychotic drugs after the policy was implemented [12]. However, responses varied substantially among nursing homes: homes in the most responsive quartile had decreases of 46% or greater, whereas those in the least responsive quartile had either no change or an increase in antipsychotic drug use. Does this finding suggest that the latter facilities did not follow policy guidelines and should be penalized?
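Facility-level response can be summarized by each home's percentage change in use and its rank among homes. The sketch below assumes a hypothetical use rate per home before and after the policy and assigns quartiles by rank, with quartile 1 as the most responsive. It makes no adjustment for dementia severity, which is exactly the information the Medicaid data lacked, so its quartiles cannot by themselves separate noncompliance from differences in case mix.

```python
def response_quartiles(pre_use, post_use):
    """Percent change in antipsychotic use per nursing home and its response quartile.

    pre_use, post_use: dicts home_id -> use rate before and after the policy
    (the rate definition is an assumption, e.g., days of use per 100 resident-days).
    Quartile 1 is the most responsive (largest decrease); homes with no
    pre-policy use are skipped because a percent change is undefined for them.
    """
    changes = {h: 100.0 * (post_use[h] - pre_use[h]) / pre_use[h]
               for h in pre_use if pre_use[h] > 0}
    ranked = sorted(changes, key=changes.get)  # most negative change first
    n = len(ranked)
    return {h: (changes[h], i * 4 // n + 1) for i, h in enumerate(ranked)}


pre = {h: 10.0 for h in "abcdefgh"}
post = {"a": 5.0, "b": 6.0, "c": 8.0, "d": 9.0, "e": 10.0, "f": 10.0, "g": 11.0, "h": 12.0}
for home, (pct, quartile) in sorted(response_quartiles(pre, post).items()):
    print(home, pct, quartile)
```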
One factor that may explain these variations in response is differences in the severity of dementia among nursing home residents. Dementia is the primary reason antipsychotic drugs are used in this population. Residents with more severe symptoms receive more antipsychotic drugs [33], and efforts to withdraw antipsychotic drugs are less likely to succeed in more severely impaired residents [34]. Unfortunately, severity of symptoms could not be determined from the Medicaid data. Without this information, the analysis could not distinguish a facility's failure to comply with the new policy from a resident population with more severe symptoms of dementia.
Conclusion
Policies and programs that govern provision of medical care can be expected to increasingly influence medical practice. This trend is exemplified by the growth of disease management programs that specify systematic procedures for diagnosing and treating common conditions. Proponents of such programs often focus on their potential to improve quality and decrease costs. Unfortunately, the opposite effect is possible, even in well-designed programs. Rigorous evaluation is therefore an essential component of policy and program development.
To date, carefully planned evaluations of policy and program changes have apparently been the exception rather than the rule, and many evaluations have been retrospective. If used properly, administrative databases are a valuable resource for retrospective evaluations. However, the usefulness of these databases can be limited by data quality, the absence of a control group, the lack of data elements that correspond to important outcomes, and problems in accounting for differences in case mix. Including more data elements in administrative databases is an obvious approach to improving their suitability for research. However, unless the program that collects these additional data elements has its own reason to record them accurately, their quality may not be assured.
Ultimately, greater emphasis needs to be placed on concurrent, planned evaluation of policy and program changes through either randomized, controlled trials or prospective cohort analyses. Some disease management programs, such as those for management of hyperlipidemia [35], annual home nursing visits [36], and intensive management of heart failure [37], have been evaluated concurrently. More of these prospective concurrent evaluations are needed to ensure that policy and program changes do more good than harm.
- Copyright ©2004 by the American College of Physicians