Predicting Who Dies Depends on How Severity Is Measured: Implications for Evaluating Patient Outcomes
- Lisa I. Iezzoni, MD, MSc;
- Arlene S. Ash, PhD;
- Michael Shwartz, PhD;
- Jennifer Daley, MD;
- John S. Hughes, MD; and
- Yevgenia D. Mackiernan, BA
- From Harvard Medical School, Beth Israel Hospital, Boston University Medical Center, and Boston University, Boston, Massachusetts; the Brockton/West Roxbury Veterans Affairs Medical Center, West Roxbury, Massachusetts; and the West Haven Veterans Affairs Medical Center, West Haven, Connecticut.
Grant Support: This research was supported by grant RO1 HS06742-03 from the Agency for Health Care Policy and Research. Dr. Daley is Senior Research Associate, Career Development Program of the Department of Veterans Affairs Health Services Research and Development Service.
Requests for Reprints: Lisa I. Iezzoni, MD, Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Hospital, Room LY-326, 330 Brookline Avenue, Boston, MA 02215.
Current Author Addresses: Dr. Iezzoni and Ms. Mackiernan: Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Hospital, Room LY-326, 330 Brookline Avenue, Boston, MA 02215.
Abstract
Objective: To determine whether assessments of illness severity, defined as risk for in-hospital death, varied across four severity measures.
Design: Retrospective cohort study.
Setting: 100 hospitals using the MedisGroups severity measure.
Patients: 11 880 adults managed medically for acute myocardial infarction; 1574 in-hospital deaths (13.2%).
Measurements: For each patient, probability of death was predicted four times, each time by using patient age and sex and one of four common severity measures: 1) admission MedisGroups probability-of-death scores; 2) scores based on values for 17 physiologic variables at time of admission; 3) Disease Staging's probability-of-mortality model; and 4) All Patient Refined Diagnosis Related Groups (APR-DRGs). Patients were ranked according to probability of death as predicted by each severity measure, and rankings were compared across measures. The presence or absence of each of six clinical findings considered to indicate poor prognosis in patients with myocardial infarction (congestive heart failure, pulmonary edema, coma, low systolic blood pressure, low left ventricular ejection fraction, and high blood urea nitrogen level) was determined for patients ranked differently by different severity measures.
Results: MedisGroups and the physiology score gave 94.7% of patients similar rankings. Disease Staging, MedisGroups, and the physiology score gave only 78% of patients similar rankings. MedisGroups and APR-DRGs gave 80% of patients similar rankings. Patients whose illnesses were more severe according to MedisGroups and the physiology score were more likely to have the six clinical findings than were patients whose illnesses were more severe according to Disease Staging and APR-DRGs.
Conclusions: Some pairs of severity measures assigned very different severity levels to more than 20% of patients. Evaluations of patient outcomes need to be sensitive to the severity measures used for risk adjustment.
Hospital and physician performance is increasingly scrutinized by organizations ranging from state governments to managed care payers to local business coalitions [1-7]. Hospitals and medical practices also monitor their own results to identify areas in which they can produce improvement and savings. Performance profiles of health care providers often compare patient outcomes, such as death rates; comparing such outcomes across hospitals or physicians generally requires adjustment for patient risk. Risk adjustment recognizes that the underlying nature of some patients' diseases makes those patients more likely than others to have poor outcomes [8, 9].
More than a dozen risk-adjustment tools, often called severity measures, have been created specifically to address health care administration and policy concerns [1-7, 10-12]. Unlike clinical measures of risk, which can incorporate such factors as disease-specific clinical findings, complexity of comorbid illness, and functional status [13], severity measures rate patients on the basis of limited data—either computerized hospital discharge abstracts [14, 15] or information gathered from medical records by using abstraction protocols independent of specific diseases [16-18]. These methods generally focus on predicting hospital resource consumption or in-hospital death. They are frequently proprietary, and their complete logic is often unavailable for scrutiny.
Severity measures are now marketed widely to hospitals, payers, business leaders, and governments. Some states (Pennsylvania, Iowa, Colorado, and Florida, for example), regions (such as Cleveland and Orlando), and payers produce comparative performance reports of health care providers by using particular severity measures [1-5]. Important decisions are increasingly made on the basis of severity-adjusted patient outcomes. For example, since 1986, Pennsylvania has required hospitals to produce severity information using MedisGroups. Payers have used MedisGroups-based reports to select health care providers for managed care networks [5]. Pennsylvania's “consumer guide” [19], which compares hospital death rates and average charges for coronary artery bypass graft surgery, was quoted by President Clinton in his 22 September 1993 health care reform address to the United States Congress [20]:
We have evidence that more efficient delivery of health care doesn't decrease quality.… Pennsylvania discovered that patients who were charged $21,000 for [coronary bypass] surgery received as good or better care [based on MedisGroups severity-adjusted death rates] as patients who were charged $84,000 for the same procedure in the same state. High prices simply don't always equal good quality.
Despite the potential effects of severity measures, relatively little independent information is available about them [21]. Because they are used to evaluate hospitals and physicians, physicians must assess them, especially with respect to their clinical credibility. In this article, we focus on predicting in-hospital death using four severity measures, and we ask three major questions: 1) How well do severity measures predict in-hospital death? 2) Do different severity measures predict different likelihoods of death for the same patients? and 3) If so, what are the clinical characteristics of patients for whom very different likelihoods of death are predicted by different severity measures?
Methods
Severity Measures
We considered four severity measures (Table 1): the admission MedisGroups score [18]; a physiology score patterned after the acute physiology score of the Acute Physiology and Chronic Health Evaluation, third version (APACHE III) [22, 23]; Disease Staging's scale predicting probability of in-hospital death [24-26]; and All Patient Refined Diagnosis Related Groups (APR-DRGs) [27]. These systems are among the most prominent approaches used to adjust outcomes data for severity so that they can be used for state or regional comparisons across hospitals [1-5] and for hospital activities such as internal monitoring, negotiation of managed care contracts, and physician profiling.
Each measure defines severity in ways that reflect that measure's goals, assigning either numerical severity scores or values on a continuous scale (Table 1). Disease Staging and APR-DRGs use data from standard hospital-discharge abstracts [14, 15], including patient age, patient sex, and diagnoses and procedures coded using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). A discharge abstract contains codes for all diagnoses treated during a particular hospitalization, regardless of when the diagnoses were made. MedisGroups and the physiology score use clinical data abstracted from medical records for only the first 2 days of a hospitalization.
Although the APR-DRGs measure was not initially developed to predict mortality, it is used for such analyses. For example, Iowa once required larger hospitals to produce MedisGroups data for severity-adjusted performance reports, but it switched in 1994 to using APR-DRGs, a less expensive, discharge-abstract-based measure. This change was partially motivated by the perceived high cost of MedisGroups medical record reviews. Other states, such as Florida, also use APR-DRGs to evaluate health care provider performance.
Database
To assign severity scores to patients, computerized algorithms were applied to data extracted from the 1992 MedisGroups Comparative Database. Briefly, this database contains the clinical information collected using the MedisGroups data-gathering protocol and submitted to MedisGroups' vendor, MediQual Systems, Inc. The 1992 MedisGroups Comparative Database contains information on all discharges made in 1991 from 108 acute care hospitals, which were chosen by MediQual Systems because of good data quality and in order to encompass a range of hospital characteristics.
To ensure adequate sample sizes for hospital-level analyses in another study [28], we eliminated eight low-volume institutions (83 patients total). The American Hospital Association annual survey provided information on hospital characteristics.
Admission MedisGroups scores were provided by MediQual Systems; scores for other measures had to be assigned. The MedisGroups database contains standard discharge-abstract information, including ICD-9-CM codes for as many as 20 diagnoses and 50 procedures, listed by hospital. It also includes values for key clinical findings from the admission period (generally the first 2 days of hospitalization), abstracted from medical records during MedisGroups reviews [16-18]. We used these clinical findings to create physiology scores patterned after the APACHE III acute physiology score, summing weights specified by APACHE III for each finding (for example, a pulse of 145 beats/min had a weight of 13 points) [22]. We could not replicate exact APACHE III acute physiology scores because complete values for the required 17 physiologic variables were unavailable: MedisGroups truncates data collection in broadly defined normal ranges [29]. Previous research [29] showed that a similarly derived physiology score did well compared with the exact acute physiology scores of the second APACHE version.
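The scoring step described above (summing one weight per physiologic variable) can be sketched as follows. This is a minimal illustration, not the study's program: the band table is hypothetical except for the one weight quoted in the text (a pulse of 145 beats/min scores 13 points), and the rule that a missing value counts as normal (0 points) is an assumption about how truncated MedisGroups data might be handled.

```python
# Minimal sketch of an APACHE III-style physiology score: one weight per
# physiologic variable, looked up from banded ranges, then summed.
# ASSUMPTION: every band below is hypothetical EXCEPT that a pulse of
# 145 beats/min maps to 13 points (the one weight quoted in the text).
# ASSUMPTION: a missing value (e.g., not recorded because it fell within a
# broadly defined normal range) is scored as normal, i.e., 0 points.
from typing import Dict, List, Optional, Tuple

# Each band is (lower bound inclusive, upper bound exclusive, weight).
Band = Tuple[float, float, int]

HYPOTHETICAL_WEIGHTS: Dict[str, List[Band]] = {
    "pulse": [
        (0, 50, 5),      # hypothetical
        (50, 100, 0),    # hypothetical normal band
        (100, 140, 5),   # hypothetical
        (140, 155, 13),  # contains 145 beats/min -> 13 points (from the text)
        (155, 400, 17),  # hypothetical
    ],
}


def variable_weight(bands: List[Band], value: Optional[float]) -> int:
    """Return the weight of the band containing value; 0 if value is missing."""
    if value is None:
        return 0
    for low, high, weight in bands:
        if low <= value < high:
            return weight
    raise ValueError(f"value {value} falls outside every band")


def physiology_score(findings: Dict[str, Optional[float]],
                     weights: Dict[str, List[Band]]) -> int:
    """Sum the weights of all variables listed in the weight table."""
    return sum(variable_weight(bands, findings.get(name))
               for name, bands in weights.items())


print(physiology_score({"pulse": 145}, HYPOTHETICAL_WEIGHTS))  # 13
```

A real implementation would need the published APACHE III tables for all 17 variables; the structure (banded lookup, then a sum) is the only part this sketch is meant to show.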
Vendors scored the data for the two discharge-abstract-based severity measures (Table 1). On the basis of their specifications, vendors were sent computer files containing the required discharge-abstract data extracted from the MedisGroups database. We merged the scored data into a single analytic file with 100% success.
Study Sample and Outcome Measure
Many internal hospital monitoring programs and external evaluations, such as Pennsylvania's MedisGroups initiative [19], sample patients by diagnosis-related groups. To parallel this approach, we selected all patients in the database who had been hospitalized for medical treatment of a new acute myocardial infarction, as defined by diagnosis-related groups. We chose acute myocardial infarction because it is a common condition, is treated at most hospitals, and has a relatively high mortality rate. We included patients in diagnosis-related groups 121 (circulatory disorders with acute myocardial infarction and cardiovascular complication, discharged alive), 122 (circulatory disorders with acute myocardial infarction without cardiovascular complication, discharged alive), and 123 (circulatory disorders with acute myocardial infarction, expired). Patients had either a principal or secondary 5-digit ICD-9-CM discharge diagnosis code beginning with “410” and ending with “1” (initial treatment).
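The inclusion rule above combines a diagnosis-related group filter with an ICD-9-CM code pattern (410.x1, the initial episode of care). A minimal sketch, assuming a hypothetical record layout in which each discharge carries an integer `drg` and a list of diagnosis codes as strings:

```python
# Minimal sketch of the cohort definition: DRG 121, 122, or 123 plus a
# principal or secondary ICD-9-CM code of the form 410.x1 (initial episode
# of care for acute myocardial infarction).
# ASSUMPTION: each discharge is a dict with a "drg" integer and a
# "diagnoses" list of ICD-9-CM code strings (hypothetical layout).
from typing import Dict, List

AMI_DRGS = {121, 122, 123}


def is_initial_ami_code(code: str) -> bool:
    """True for 5-digit codes that start with '410' and end with '1'."""
    digits = code.replace(".", "")
    return len(digits) == 5 and digits.startswith("410") and digits.endswith("1")


def in_cohort(discharge: Dict) -> bool:
    """Apply both the DRG filter and the 410.x1 diagnosis-code filter."""
    return (discharge["drg"] in AMI_DRGS
            and any(is_initial_ami_code(c) for c in discharge["diagnoses"]))


discharges: List[Dict] = [
    {"drg": 121, "diagnoses": ["410.71", "428.0"]},  # included
    {"drg": 122, "diagnoses": ["410.72"]},           # subsequent episode: excluded
    {"drg": 127, "diagnoses": ["428.0", "410.11"]},  # other DRG: excluded
]
print([in_cohort(d) for d in discharges])  # [True, False, False]
```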
Our outcome measure was in-hospital death. The MedisGroups data did not contain information on deaths after discharge.
Analysis
Each severity measure was used to calculate a predicted probability of death for each patient from a multivariable logistic regression model that included the severity score and dummy variables representing a cross-classification of patients by sex and eight age categories (18-44, 45-54, 55-64, 65-69, 70-74, 75-79, 80-84, and ≥85 years of age). Severity scores were entered as either continuous or categorical variables (Table 1). For Disease Staging and MedisGroups, we used the logit of the probability as the independent variable in the logistic regression. All analyses were done using the Statistical Analysis System, release 6.08 (SAS Institute, Cary, North Carolina).
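The covariates described above can be assembled as in the sketch below: one indicator per cell of the sex by age-band cross-classification (2 × 8 = 16 cells, one held out as reference) plus the logit of the measure's predicted probability. The choice of reference cell and the clipping of probabilities away from 0 and 1 are assumptions for illustration; for a categorical measure such as APR-DRGs, the severity term would instead be a set of dummy variables. The sketch only builds the design row; the model itself would be fitted with any standard logistic regression routine.

```python
# Minimal sketch of one row of the logistic-regression design matrix:
# intercept, logit of the severity measure's predicted probability, and
# 15 age-sex cell dummies (16 cells, women aged 18-44 as reference).
# ASSUMPTIONS: reference-cell choice and eps clipping are illustrative only.
import math
from typing import List

AGE_LOWER_BOUNDS = [18, 45, 55, 65, 70, 75, 80, 85]  # bands 18-44, ..., >=85
SEXES = ["F", "M"]


def age_band(age: int) -> int:
    """Index 0..7 of the age band containing age (adults only)."""
    if age < 18:
        raise ValueError("cohort is limited to adults")
    band = 0
    for i, lower in enumerate(AGE_LOWER_BOUNDS):
        if age >= lower:
            band = i
    return band


def cell_dummies(age: int, sex: str) -> List[int]:
    """15 indicators for cells 1..15; cell 0 (women aged 18-44) is reference."""
    cell = SEXES.index(sex) * len(AGE_LOWER_BOUNDS) + age_band(age)
    return [1 if cell == k else 0 for k in range(1, 16)]


def logit(p: float, eps: float = 1e-6) -> float:
    """Log odds of p, clipped so that p = 0 or 1 stays finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def covariate_row(age: int, sex: str, prob_death: float) -> List[float]:
    return [1.0, logit(prob_death)] + [float(d) for d in cell_dummies(age, sex)]


row = covariate_row(age=72, sex="M", prob_death=0.2)
print(len(row))  # 17: intercept, severity logit, 15 age-sex dummies
```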
Severity Measure Performance
We used c and R2 statistics as overall assessments of each severity measure's ability to predict individual patient death. The c statistic assesses this ability as follows: When a person who has died and a person who has lived are each chosen at random, c equals the probability that the severity measure predicts a higher likelihood of death for the one who has died [30]. Higher c values indicate better specificity and sensitivity [31, 32]. A c value of 0.5 indicates that the model does no better than random chance; a c value of 1.0 shows perfect performance. The R2 statistic is commonly interpreted as the percentage of variation in outcomes explained by the model. It is typically lower for models of dichotomous outcomes (such as death) than for models with continuous outcomes (such as length of stay). The R2 statistic adds independent information to that contributed by the c statistic for assessing how well predictions match actual outcomes [33].
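Both statistics can be computed directly from observed outcomes and predicted probabilities. The c statistic below follows the pairwise definition given above, with ties counted as one half (a common convention). The R2 shown is 1 minus the ratio of the residual to the total sum of squares on the 0/1 outcome; this is an assumption about the exact formula, which the text attributes to reference [33].

```python
# Minimal sketch of the two summary statistics.
# c: over all (death, survivor) pairs, the share in which the patient who
#    died had the higher predicted probability; ties count one half.
# R2: ASSUMED to be 1 - SSE/SST computed on the 0/1 outcome.
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    deaths = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for p_died in deaths:
        for p_lived in survivors:
            if p_died > p_lived:
                concordant += 1.0
            elif p_died == p_lived:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def r_squared(y: Sequence[int], p: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


y = [1, 0, 0, 1, 0]
p = [0.8, 0.3, 0.1, 0.3, 0.4]
print(c_statistic(y, p))  # 0.75
```

The double loop is quadratic in sample size; for 11 880 patients it still runs, but a rank-based computation would be preferable in practice.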
Sometimes assessments of model performance are overly optimistic when the same data are used to both develop and evaluate models. To guard against this, we calculated cross-validated performance measures (c and R2) as follows [34]: 1) we randomly split the data in half; 2) we estimated coefficients for each model on the first half of the data and calculated “validated” performance measures by applying these coefficients to the second half; and 3) we repeated this process, developing the model on the second half of the data and validating it on the first half. Cross-validated c and R2 statistics represent the average of the two validated statistics calculated on the two halves of the data.
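The three-step split-half procedure can be sketched generically. Here `fit` and `metric` are hypothetical caller-supplied functions standing in for the logistic model and for c or R2; the toy base-rate model and Brier score in the usage lines exist only to make the sketch runnable.

```python
# Minimal sketch of split-half cross-validation.
# `fit(rows, y)` must return a predict(rows) -> probabilities callable and
# `metric(y, p)` must return a statistic such as c or R2 (both hypothetical).
import random
from typing import Callable, List, Sequence


def split_half_validated(rows: Sequence, y: Sequence[int],
                         fit: Callable, metric: Callable,
                         seed: int = 0) -> float:
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)              # 1) random split in half
    half_a, half_b = idx[: len(idx) // 2], idx[len(idx) // 2:]
    scores: List[float] = []
    for train, test in ((half_a, half_b), (half_b, half_a)):
        # 2) develop on one half, validate on the other (then swap: step 3)
        predict = fit([rows[i] for i in train], [y[i] for i in train])
        preds = predict([rows[i] for i in test])
        scores.append(metric([y[i] for i in test], preds))
    return sum(scores) / 2.0                      # average of the two values


# Toy usage: a "model" that predicts the training half's death rate,
# scored with the Brier score (mean squared error of the probabilities).
def fit_base_rate(rows, y):
    rate = sum(y) / len(y)
    return lambda new_rows: [rate] * len(new_rows)


def brier(y, p):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


print(split_half_validated(list(range(100)), [0, 1] * 50, fit_base_rate, brier))
```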
For each severity measure, we ranked patients according to their predicted probability of death on the basis of the multivariable model. We then divided patients into 10 groups of equal size (deciles 1 to 10) according to predicted likelihood of dying, and we compared actual and predicted death rates within each decile. These figures show how widely each model spread patients between very low and very high predicted risk and how closely predicted death rates matched actual rates (model calibration). We also computed a Hosmer-Lemeshow chi-square statistic [35], which measures differences between actual and predicted numbers of deaths within the 10 deciles; goodness of fit is tested by comparing this statistic to a chi-square distribution with 8 degrees of freedom. Given our large sample size, even small differences between observed and expected numbers of deaths were statistically significant.
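The decile comparison and the Hosmer-Lemeshow statistic can be sketched as below. Patients are sorted by predicted risk and cut into 10 equal-size groups; within each group, observed deaths are compared with expected deaths (the sum of predicted probabilities). The simple slicing of sorted patients, and the assumption that every group's mean prediction lies strictly between 0 and 1, are simplifications. The p-value uses the closed form of the chi-square upper tail for an even number of degrees of freedom (here 10 − 2 = 8).

```python
# Minimal sketch of decile calibration and the Hosmer-Lemeshow statistic.
# ASSUMPTIONS: at least `groups` patients; ties split by sorted slicing;
# each group's mean predicted probability strictly between 0 and 1.
import math
from typing import List, Sequence, Tuple


def deciles(p: Sequence[float], groups: int = 10) -> List[List[int]]:
    """Patient indices in equal-size groups of increasing predicted risk."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    return [order[g * n // groups:(g + 1) * n // groups] for g in range(groups)]


def calibration_table(y, p, groups: int = 10) -> List[Tuple[float, float]]:
    """(actual death rate, predicted death rate) for each decile."""
    return [(sum(y[i] for i in m) / len(m), sum(p[i] for i in m) / len(m))
            for m in deciles(p, groups)]


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    groups: int = 10) -> Tuple[float, float]:
    stat = 0.0
    for members in deciles(p, groups):
        n_g = len(members)
        observed = sum(y[i] for i in members)
        expected = sum(p[i] for i in members)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

As the text notes, with thousands of patients per analysis even small observed-minus-expected gaps can push this statistic past the 5% critical value (about 15.5 for 8 degrees of freedom).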
Ranking Patients by Predicted Probability of Death
We created 10 × 10 tables, arraying patients within deciles computed by one severity measure against patients within deciles computed by a second severity measure. The four severity measures thus yielded six 10 × 10 tables, one for each of the six pairwise comparisons. The average probabilities of death for patients within each decile indicated that a difference of three or more deciles constituted an important difference in the predicted likelihood of dying. For each 10 × 10 table, we counted the fractions of patients who had 1) “similar” predicted likelihoods of dying (probabilities of death calculated by severity measures A and B were within two deciles of each other); and 2) “different” predicted likelihoods of dying (probabilities of death calculated by severity measures A and B were three or more deciles apart).
We separated patients with “different” predicted likelihoods of dying into two groups: those for whom the probabilities calculated by severity measure A were much higher than the probabilities calculated by severity measure B; and those for whom the probabilities calculated by severity measure A were much lower than the probabilities calculated by severity measure B. Conceptually, the former group represents patients viewed as “more sick” by measure A than by measure B, and the latter group represents patients viewed as “less sick” by measure A than by measure B.
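The pairwise comparison above can be sketched as follows: each patient receives a decile (1 = lowest risk) under measures A and B, the pairs are tallied into a 10 × 10 table, and each cell is classified as “similar” (deciles differ by two or fewer), “A sicker” (A's decile three or more higher), or “B sicker.” Splitting ties by sorted order is an assumption of this sketch.

```python
# Minimal sketch of the 10 x 10 decile comparison between two measures.
# ASSUMPTION: tied predictions are split into deciles by sorted order.
from collections import Counter
from typing import Dict, List, Sequence


def decile_ranks(p: Sequence[float], groups: int = 10) -> List[int]:
    """Decile (1 = lowest predicted risk) for each patient."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0] * len(p)
    for position, i in enumerate(order):
        ranks[i] = position * groups // len(p) + 1
    return ranks


def compare_measures(p_a: Sequence[float],
                     p_b: Sequence[float]) -> Dict[str, float]:
    # The 10 x 10 table, stored as counts keyed by (decile A, decile B).
    table = Counter(zip(decile_ranks(p_a), decile_ranks(p_b)))
    tally: Counter = Counter()
    for (a, b), count in table.items():
        if abs(a - b) <= 2:
            tally["similar"] += count
        elif a > b:
            tally["A sicker"] += count   # A predicts much higher risk
        else:
            tally["B sicker"] += count   # B predicts much higher risk
    n = len(p_a)
    return {k: tally[k] / n for k in ("similar", "A sicker", "B sicker")}


# Two measures that rank 100 patients in exactly opposite order.
print(compare_measures(list(range(100)), list(range(99, -1, -1))))
# {'similar': 0.2, 'A sicker': 0.4, 'B sicker': 0.4}
```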
Testing Clinical Validity
After finding that different severity measures resulted in very different rankings of the predicted likelihood of dying for the same patients, our next question was, Which severity measure correlated better with clinical findings thought to represent severe illness in patients with acute myocardial infarction? As a preliminary examination of this question, we reviewed the literature on predicting imminent death from myocardial infarction [36-43]. We selected six important clinical findings identified in the first 2 days of hospitalization: congestive heart failure, pulmonary edema, coma, low systolic blood pressure (≤ 60 mm Hg), low left ventricular ejection fraction (≤ 0.35), and elevated blood urea nitrogen level (≥ 31 mg/dL).
We examined each clinical finding individually for its relation to in-hospital death by creating 2 × 2 tables (finding present/absent by dead/alive) and calculating chi-square statistics. We also computed two logistic regression models: Both had dummy variables for each clinical finding, and one also included age and sex categories. We report odds ratios with 95% CIs for death for each clinical finding. Means are given ±SD.
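For a single finding, the 2 × 2 analysis can be sketched as below: a Pearson chi-square with 1 degree of freedom (no continuity correction, an assumption), its p-value, and an unadjusted odds ratio with a Woolf (log-scale Wald) 95% CI. The odds ratios reported in the Results come from multivariable logistic regression, so this is only the unadjusted analogue, and the counts in the usage line are hypothetical.

```python
# Minimal sketch for one clinical finding by in-hospital death.
# ASSUMPTION: Pearson chi-square without continuity correction.
import math
from typing import Tuple


def two_by_two(a: int, b: int, c: int,
               d: int) -> Tuple[float, float, float, Tuple[float, float]]:
    """a = finding & died, b = finding & lived,
    c = no finding & died, d = no finding & lived (all counts > 0)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail, chi-square 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.959964                               # 97.5th normal percentile
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return chi2, p_value, odds_ratio, (lower, upper)


# Hypothetical counts, for illustration only.
chi2, p_value, odds_ratio, ci = two_by_two(20, 10, 10, 20)
print(round(chi2, 3), odds_ratio)  # 6.667 4.0
```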
We counted the percentage of patients with each clinical finding among persons with different predicted likelihoods of dying for each pair of severity measures.
Analyses of ICD-9-CM Complication Codes
The predictive ability of discharge-abstract-based measures could relate to the fact that these measures consider all discharge diagnosis codes, regardless of whether the conditions coded for were present at admission or developed subsequently. To examine whether discharge-abstract-based measures relied heavily on conditions developing after admission, we used ICD-9-CM codes to define serious conditions, including cardiac arrest, respiratory arrest, respiratory failure, and coma. We then repeated the analyses done for the six clinical findings (see above), using these ICD-9-CM-defined conditions.
Results
The final data set contained 11 880 patients and 1574 in-hospital deaths (13.2%). Patients ranged from 19 to 103 years of age (mean, 68.3 ± 13.3 years); 58.1% of patients were men. For Disease Staging and APR-DRGs, ample numbers of diagnosis codes were usually present for rating severity (mean, 5.6 ± 3.0 diagnosis codes per patient). Only 4.2% of patients had 1 discharge diagnosis code; 43.4% had more than 5 diagnoses listed; and 10.2% had 10 or more diagnoses listed.
Fifty-five of the 100 hospitals were in Pennsylvania, and 16 were in the southern United States. The 100 hospitals were generally larger, more likely to offer cardiac intensive care, more likely to be urban, less likely to be public, and more involved in teaching than other general acute care institutions nationwide [28].
Statistical Performance
The four severity measures varied in their statistical performance as measured by their c and R2 values (Table 2). Cross-validated performance was identical to or only 0.01 points lower than the performance of models developed using the entire data set. The two clinical data-based measures (MedisGroups and the physiology score) had almost identical c and R2 values. The two discharge-abstract-based measures (Disease Staging and APR-DRGs) had similar c statistics, which were slightly higher than those of the clinical data-based measures.
Table 3 shows the actual and predicted death rates for patients within each of the 10 deciles. All severity measures arrayed patients along wide ranges of predicted probabilities of death. Disease Staging had the broadest range: Patients in the lowest decile had a predicted death rate of 0.4% (actual death rate, 0.3%), and those in the highest decile had a predicted death rate of 59.7% (actual death rate, 58.4%). As measured by the Hosmer-Lemeshow chi-square statistic, MedisGroups had the best calibration: The actual and predicted death rates within each of the 10 deciles were not significantly different (P = 0.317). However, given the large sample size, it was not surprising that the Hosmer-Lemeshow values for the other three measures were statistically significant, even when differences between the actual and predicted death rates within deciles appeared small.
Comparison of Patients with Different Predicted Likelihoods of Death
Table 4 shows the percentage of patients with similar and different predicted probabilities of death for pairs of severity measures and the percentage of patients who died within each group. The rankings of MedisGroups and the physiology score were generally similar; these measures assigned similar likelihoods of dying to 94.7% of patients. In contrast, the two discharge-abstract-based measures frequently disagreed with the clinical data-based measures about patient severity, ranking many patients differently according to likelihood of dying. For example, MedisGroups and Disease Staging ranked 22.3% of patients very differently.
Patients viewed as sicker by the discharge-abstract-based measures than by the clinical data-based measures had a higher death rate than did those viewed as less sick by the former measures than by the latter. For example, of the 11.0% of patients viewed as sicker by Disease Staging than by MedisGroups, 16.5% died; in contrast, of the 11.3% of patients viewed as sicker by MedisGroups than by Disease Staging, only 10.8% died.
Clinical Validity Analysis
Each of the six clinical findings had a strong individual relation (P < 0.0001) with in-hospital death (Table 5), which supports the validity of these findings as indicators of risk for death from acute myocardial infarction. The logistic regression that included age and sex plus the six clinical findings yielded a c statistic of 0.81 and an R2 value of 0.21. When controlling for other clinical findings (but not age and sex), the odds ratios for predicting in-hospital death were as follows: congestive heart failure, 1.66 (95% CI, 1.43 to 1.93); pulmonary edema, 1.22 (CI, 1.00 to 1.49); coma, 11.75 (CI, 9.52 to 14.51); low systolic blood pressure, 6.38 (CI, 5.16 to 7.91); low ejection fraction, 1.29 (CI, 1.04 to 1.60); and elevated blood urea nitrogen level, 3.58 (CI, 3.15 to 4.07).
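As a consistency check on the odds ratios just reported: if each 95% CI is a Wald interval, it is symmetric on the log scale (log OR ± 1.96 × SE), so each interval implies a standard error for the log odds ratio. The sketch assumes Wald intervals (the text does not state the method); the values are copied from the paragraph above, and small asymmetries reflect rounding to two decimals.

```python
# Worked check: log-scale symmetry of the reported CIs and implied SEs.
# ASSUMPTION: the reported intervals are Wald intervals on the log scale.
import math

reported = {  # odds ratio, lower 95% limit, upper 95% limit (from the text)
    "congestive heart failure": (1.66, 1.43, 1.93),
    "pulmonary edema": (1.22, 1.00, 1.49),
    "coma": (11.75, 9.52, 14.51),
    "low systolic blood pressure": (6.38, 5.16, 7.91),
    "low ejection fraction": (1.29, 1.04, 1.60),
    "elevated blood urea nitrogen": (3.58, 3.15, 4.07),
}


def implied_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a Wald interval's width."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for finding, (odds_ratio, lower, upper) in reported.items():
    gap_low = math.log(odds_ratio) - math.log(lower)
    gap_high = math.log(upper) - math.log(odds_ratio)
    print(f"{finding}: SE(log OR) ~ {implied_se(lower, upper):.3f}, "
          f"log-scale gaps {gap_low:.3f} / {gap_high:.3f}")
```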
Table 6 shows the percentage of patients with each clinical finding among patients for whom pairs of severity measures produced different likelihoods of dying. For example, 11.2% of patients viewed as sicker by MedisGroups than by Disease Staging had low systolic blood pressure at admission; in contrast, only 0.8% of patients seen as sicker by Disease Staging than by MedisGroups had low systolic blood pressure. Patients viewed as sicker by MedisGroups were more likely to have each of the six findings than were patients seen as sicker by Disease Staging.
In general, patients viewed as sicker by a clinical data-based measure than by a discharge-abstract-based measure were more likely to have each clinical finding than were patients for whom the opposite was true.
Analyses of ICD-9-CM Complication Codes
Most complications defined by ICD-9-CM codes occurred too rarely to allow rigorous statistical analysis. In one exception, 6.0% of patients had cardiac arrest codes; 60.4% of these patients died, compared with 10.2% of patients who did not have cardiac arrest codes (P < 0.001). Among patients viewed as sicker by Disease Staging than by MedisGroups, 16.2% had cardiac arrest codes; only 0.4% of patients seen as sicker by MedisGroups than by Disease Staging had cardiac arrest codes. The comparison of Disease Staging and the physiology score produced similar findings. These results, albeit preliminary, support the idea that codes such as that for cardiac arrest play an important role in discharge-abstract-based ratings of patient severity.
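The cardiac arrest figures above are internally consistent with the cohort's overall mortality: the overall death rate should equal the prevalence-weighted average of the death rates with and without cardiac arrest codes. A short arithmetic check using only numbers from the text:

```python
# Arithmetic check: overall rate = weighted average of stratum rates.
share_arrest = 0.060                  # 6.0% of patients had cardiac arrest codes
rate_with, rate_without = 0.604, 0.102
overall = share_arrest * rate_with + (1 - share_arrest) * rate_without
print(f"{overall:.3f}")  # 0.132, matching 13.2% (1574 of 11 880 patients)
```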
Discussion
Detailed evaluation of severity measures appears to be a narrow methodologic pursuit, far removed from daily medical practice. Nevertheless, severity-adjusted death rates are widely used as putative quality indicators in health care provider “report cards” [1-7]. Because severity measures could significantly affect their practices, physicians should assist—and possibly take the lead—in evaluating the validity of these measures. Examining the clinical credibility of severity measures demands extensive physician input. Physicians should ensure that the methods used to evaluate clinical performance are open to external scrutiny.
Our findings suggest, however, that interpreting such evaluations is complicated. The “take-home” messages of these evaluations may not be a definitive “This measure is good and that is bad.” Discharge-abstract-based severity measures (Disease Staging and APR-DRGs) were slightly better able to predict death (measured by c and R2 values) than the clinical data-based measures (MedisGroups and the physiology score). In contrast, MedisGroups and the physiology score had better clinical credibility (relations with six clinical indicators of risk for death from acute myocardial infarction) than Disease Staging and APR-DRGs. Thus, the discharge-abstract-based measures had better predictive validity, and the clinical data-based measures had better clinical validity.
Differences between severity measures that occur on the basis of their data sources have important health policy implications, suggesting a trade-off between data costs and clinical credibility. Because discharge-abstract data are routinely produced by hospitals, they are generally available, computer accessible, and inexpensive. These advantages have led some provider evaluation initiatives around the country (for example, in California, Connecticut, Florida, New Jersey, and Ohio) to use discharge abstracts. As stated earlier, Iowa switched from MedisGroups to APR-DRGs largely because of data costs. Because discharge abstracts often result from billing, however, some investigators have questioned whether financial motivations compromise data accuracy [44-46]. In addition, the clinical information contained in discharge abstracts is limited [47]. Nonetheless, our finding that the discharge-abstract-based measures were somewhat better able to predict death supports the choice of these measures.
However, the slightly better predictive ability of discharge-abstract-based measures may result from their inclusion of all diagnoses: These measures review all discharge diagnosis codes, including codes for cardiac arrest, respiratory arrest, ventricular fibrillation, and cardiogenic shock, regardless of when these events occurred. Consideration of the codes may also explain why patients viewed as sicker by the discharge-abstract-based measures than by the clinical data-based measures had higher death rates (Table 4).
Groups around the United States are drawing inferences about health care provider quality on the basis of death rates adjusted for severity using discharge-abstract data [1-4]. However, this raises obvious concerns: If quality is to be judged by using severity-adjusted death rates, adjustment should consider only preexisting conditions, not those that develop after admission [48, 49]. Otherwise, events occurring late in the hospital stay (possibly as a result of poor care) may mask the detection of deaths due to poor quality. Thus, we focused our clinical analysis on findings from the first 2 days of hospitalization. If severity measures are used to judge quality, it may be reasonable to trade some predictive ability for greater clinical credibility.
Nevertheless, another trade-off remains: Abstracting clinical information from medical records is costlier than relying on existing discharge-abstract data. Given cost concerns, one notable finding is the similarity of the MedisGroups and physiology score results. Our physiology score was patterned after the acute physiology score of APACHE III, using only information from the clinical literature [22]. We included physiology scores, not to examine APACHE specifically, but because of growing interest in creating “minimum clinical data sets” containing small numbers of clinical variables. Although APACHE weights are one way to use such variables, other ways exist. The physiology score requires 17 clinical variables, whereas MedisGroups' data abstraction protocol examines more than 200 potential findings, regardless of patients' diagnoses.
Our study has important limitations. We looked at just one condition. The database contained information only from hospitals using MedisGroups; independent information about data reliability was unavailable. The clinical findings were specifically gathered for MedisGroups scoring, possibly giving MedisGroups an advantage in statistical performance and the clinical validity analysis. The MedisGroups algorithm for rating the severity of ischemic heart disease explicitly considers congestive heart failure, coma, low ejection fraction, low systolic blood pressure, and high blood urea nitrogen levels, among many other variables [13]. All measures are periodically revised; newer versions may provide different results.
The MedisGroups database contains information on only in-hospital deaths. Knowing mortality rates after discharge permits holding the “window of observation” constant (for example, at 30 days after admission). This is important when comparing patient mortality across health care providers with different discharge practices [50]. However, comparing death rates across hospitals was not our goal. We have no reason to expect that our overall finding—that different severity measures ranked many patients differently according to probability of death—would be different if we had looked at 30-day mortality.
Finally, our work is not a comprehensive comparative evaluation of severity systems; a complete study would require attention to additional issues. Commenting on the evaluation of quality measurement methods, Donabedian [51] suggested that “the concept of validity is itself made up of many parts [and] covers two large domains. The first has to do with the accuracy of the data and the precision of the measures that are constructed with these data. The second has to do with the justifiability of the inferences that are drawn from the data and the measurements.” Using this conceptual framework, a major remaining challenge is to examine whether judgments made on the basis of severity-adjusted death rates are justified. Does this information really offer insight into health care provider quality?
Our results suggest that mortality analyses require sensitivity to the severity adjustment measure used. Because different measures often rank the same patients at different severity levels, different hospitals or physicians may be viewed as having particularly good or bad risk-adjusted patient death rates, depending on the severity adjustment measure used. Our findings also raise concern about the use of severity scores (or predictions of imminent death) in making decisions about care for individual patients, because perceptions of the illness severity of individual patients may depend on the specific severity measure used.
Given the potential effects of severity measures on patients and health care providers, a formal process to evaluate them seems justified. Reason suggests that before a method is used to judge health care provider performance, it should be proven to measure quality. In the current health policy environment, however, rules of evidence and proof appear to be reversed. Because they are often the only measures available, severity-adjusted mortality rates will be used as indicators of health care provider quality until someone proves that they are not appropriate for this purpose. A definitive study is unlikely to be done anytime soon: Such research is expensive, and it poses the daunting challenge of defining “gold standard” quality measures. Nevertheless, both the public and health care providers need assurance that the information generated by using severity measures is valid.
Dr. Ash: Health Care Research Unit, Section of General Internal Medicine, Boston University Medical Center, 720 Harrison Avenue, Room 1108, Boston, MA 02118.
Dr. Shwartz: Health Care Management Program and Operations, Management Department, School of Management, Boston University, 621 Commonwealth Avenue, Boston, MA 02215.
Dr. Daley: Health Services Research and Development, Department of Medicine, Brockton/West Roxbury Veterans Affairs Medical Center, 1400 VFW Parkway, West Roxbury, MA 02132.
Dr. Hughes: Department of Medicine, Department of Veterans Affairs Medical Center, 950 West Campbell Avenue, West Haven, CT 06516.
- Copyright ©2004 by the American College of Physicians