When To Base Clinical Policies on Observational Versus Randomized Trial Data

  1. John Hornberger, MD, MS; and
  2. Elizabeth Wrone, MD
  1. From the Stanford University School of Medicine, Stanford, California.

    Note: This article is one of a series of articles comprising an Annals of Internal Medicine supplement entitled “Measuring Quality, Outcomes, and Cost of Care Using Large Databases: The Sixth Regenstrief Conference.”

    Acknowledgments: The authors thank Siu Hui, Linda McCann, Philip Lavori, Mary Anne Rodgers, and two anonymous referees for their comments on the manuscript.

    Requests for Reprints: John Hornberger, MD, MS, Department of Health Research and Policy, Stanford University School of Medicine, HAP Redwood Building, Room T254A, Stanford, CA 94305-5092.

    Current Author Addresses: Dr. Hornberger: Department of Health Research and Policy, Stanford University School of Medicine, HAP Redwood Building, Room T254A, Stanford, CA 94305-5092. Dr. Wrone: Department of Medicine, Stanford University School of Medicine, HAP Redwood Building, Room T254B, Stanford, CA 94305-5092.

    Abstract

    Physicians must decide when the evidence is sufficient to adopt a new clinical policy. Analysis of large clinical and administrative databases is becoming an important source of evidence for changing clinical policies. Because such analysis cannot control for the effects of all potential confounding variables, physicians risk drawing the wrong conclusion about the cause-and-effect relation between a change in clinical policy and outcomes. Randomized studies offer protection against drawing a conclusion that would lead to adoption of an inferior policy. However, a randomized study may be difficult to justify because of the extra costs of collecting data for a randomized study and concerns that a study will not directly benefit the patients enrolled in the study. This article reviews the advantages and disadvantages of basing clinical policy on analysis of large databases compared with conducting a randomized study. A technique is described and illustrated for assessing the potential costs and benefits of conducting such a study. This type of analysis formed the basis for a physician-managed health care organization deciding to sponsor a randomized study among patients with end-stage renal disease as part of a quality-improvement initiative.

    Physicians and health care organizations face the substantial challenge of maintaining or improving health outcomes for patients while containing health care expenditures. Analysis of large databases is an efficient method for discovering new clinical policies that may help to achieve these goals [1]. As advances in information technologies continue to facilitate such analysis, physicians must decide whether to accept a new policy only on the basis of these analyses or wait for completion of at least one randomized trial. Research shows that physicians are more likely to accept a new policy when evidence stems from a randomized study [2]. However, physicians may choose not to conduct or have their patients participate in a randomized study for two primary reasons: They believe the patients would not directly benefit from a study, or resources to fund the study would be better spent on assuring provision of established therapies. In addition, other physicians or organizations might perform the needed studies. This article helps physicians and health care organizations decide whether to change a clinical policy on the basis of analysis of large databases compared with conducting or waiting for the results of a randomized trial.

    We first discuss the advantages and disadvantages of undertaking a randomized study, using digoxin for patients with heart failure and β-carotene for patients with cancer as specific examples. We then describe a technique for estimating the potential costs and health consequences of having patients participate in a randomized trial. This technique has been used to help a large physician-managed organization decide whether to conduct a randomized study of folic acid supplementation to prevent cardiovascular disease among patients with end-stage renal disease.

    Basing Clinical Policies on Analysis of Large Databases or Waiting for Completion of Randomized Studies

    In the 1980s, analysis of several clinical databases showed that some patients who received digoxin for heart failure after myocardial infarction had a higher risk for death [3-5]. However, several questions needed to be resolved before discontinuation of digoxin therapy could be recommended [6]. Researchers asked what biological mechanisms could explain the link between digoxin and the increased risk for death. They also asked whether other factors, such as comorbid conditions, could explain the association between digoxin and the increased risk for death.

    Basic science research plays a pivotal role in answering the first question of identifying the mechanisms of a drug's effect on clinical outcomes. Powerful epidemiologic tools have been developed to address the second question by controlling for the possible influence of measurable comorbid conditions. However, such methods cannot control for the effects of unmeasured factors [7, 8]. The randomized trial has become an important tool for evaluating treatments because any subsequent difference in outcomes found between patient groups is more likely to be the direct consequence of treatment assignment [9, 10]. As a result, physicians might appropriately ask whether it would be prudent to wait for evidence from randomized studies before changing their policy. In the case of digoxin, several trials were completed and showed, contrary to concerns raised by observational studies, that digoxin was beneficial in selected patients [11-13].

    The case of β-carotene is also instructive. More than 20 basic science and epidemiologic studies provide evidence that β-carotene may decrease the risk for cancer [14, 15]. Two of three randomized trials found, however, that patients receiving β-carotene from nondietary sources had an increased risk for lung cancer and death [16-18]. These examples, in which findings of randomized studies differed significantly from predictions of epidemiologic and basic science studies, have led experts to caution against relying only on analyses of observational data to determine the direction of treatment effects [7, 8, 14, 19-23].

    Despite the value of randomization, it is estimated that less than 20% of clinical policies are based on randomized studies [22]. Several factors may create a barrier to conducting a randomized study. First, the findings of observational studies may seem so compelling that randomization would be considered unethical [24]. Second, conducting a randomized study requires a commitment in time and money, the return on which may not be evident for many years. Third, physicians may believe that they and their patients will benefit more by waiting for the study to be done elsewhere. In the next section, we illustrate these issues by analyzing the potential costs and health consequences of a randomized study of folic acid supplementation to prevent cardiovascular disease among patients with end-stage renal disease. The analyses presented here were used by the medical directors of Satellite Dialysis Centers in Redwood City, California, in deciding whether to conduct a randomized study as part of a new quality-improvement initiative.

    Estimating the Cost-Benefit of Conducting a Randomized Study of Folic Acid Supplementation in Patients with End-Stage Renal Disease

    Basic science and epidemiologic studies have shown a correlation between increased levels of homocysteine in blood and an increased risk for cardiovascular disease [25-27]. Plasma homocysteine levels increase with the onset of chronic renal failure and remain elevated in more than 80% of patients receiving long-term dialysis [26]. Cardiovascular disease is the leading cause of death in patients with end-stage renal disease [28], and hyperhomocysteinemia is more prevalent in these patients than any other cardiovascular risk factor [27].

    In patients without renal disease, homocysteine levels may return to normal with as little as 2.5 mg of oral folic acid per day [29]. However, homocysteine levels are difficult to normalize in patients with end-stage renal disease, even at supraphysiologic doses [30, 31]. Moreover, the costs and risks of administering high doses of folic acid to patients with end-stage renal disease are unknown. Physicians might want additional information about the consequences of folic acid supplementation before recommending a change from the current clinical policy.

    Analysis of Costs and Health Consequences

    How can the potential costs and benefits of a randomized study of a new treatment, such as prescribing higher doses of folic acid (for example, 5 mg/d) to decrease homocysteine levels in patients with end-stage renal disease, be estimated? First, a health economic model (cost-effectiveness or cost–benefit) is needed to estimate the expected costs and health consequences of high doses of folic acid compared with the standard dose (1 mg/d) [32]. Such models have been used extensively to compare the consequences of medical treatments [32-34]. Model inputs include established concerns of modern technology assessments, such as treatment effects on life expectancy, quality of life, and costs of medical care [35]. Expert panels recommend that the combined effects of treatment on illness and death be summarized in these models in terms of quality-adjusted life expectancy [33].

    We use a health economic model to determine whether a trial should be done and how many patients should be enrolled to prevent recommending the wrong treatment once the trial is completed [36]. Regardless of the type of health economic model used, physicians must decide how much is a reasonable investment in health programs, including the cost of research, to prolong quality-adjusted life expectancy by 1 year [33]. For example, choosing a low valuation (investing no more than $20 000 to prolong life by 1 year) [37] is likely to undervalue the research effort in the context of many currently accepted medical interventions, such as dialysis in patients with end-stage renal disease [38]. In contrast, if the valuation exceeds $200 000 per year of prolonged life, then resources may be spent on some types of research that could have been used for established interventions that would have yielded better health outcomes or had lower costs. Because no consensus exists on the optimal investment in treatments or research for improving quality-adjusted life expectancy [33], the effects of different valuations of health on trial designs are explored with sensitivity analyses.

    Using a cost–benefit method, we constructed a health economic model to compare the effects of two different doses of oral folic acid among patients with end-stage renal disease undergoing dialysis. Such cost–benefit analyses usually begin with a reference case, in which each input is assigned an initial average value. Details about the structure of the cost–benefit model, inputs and assumptions, reference case results, and sensitivity analyses are available from the authors or on the World Wide Web (http://www-leland.stanford.edu/~ewrone). The model also includes consideration of the size of the target population (that is, the number of patients that physicians expect to benefit from the results of the trial) and the rate at which patients are expected to withdraw from the study. Sensitivity analyses are used to explore how conclusions from the reference case analysis change with reasonable variation from the initial input values.

    The top panel of Figure 1 shows a sensitivity analysis in which the cost–benefit difference of using high-dose folic acid instead of standard-dose folic acid is plotted as a function of the relative risk (RR) for death between the two treatments. The RR equals the probability of death associated with high-dose folic acid divided by the probability of death associated with standard-dose folic acid. If the RR is low, then the cost–benefit difference is negative and high-dose folic acid is preferred. If the RR is high, then the cost–benefit difference is positive and standard-dose folic acid is preferred.

    Figure 1. Expected cost–benefit difference and loss. Top. The expected cost–benefit difference per patient if high-dose folic acid is recommended instead of standard-dose folic acid for different values of the relative risk for death of the two treatments. Middle. The loss per patient through recommending standard-dose folic acid. Bottom. The loss per patient through recommending high-dose folic acid.

    A cost–benefit difference of 0 represents the minimum treatment effect, at which physicians should be indifferent (on a cost–benefit basis) between recommending either treatment; in the reference case, the minimum RR equals 0.94. This minimum depends directly on the assumptions in the model. For example, if the cost of high-dose folic acid is much greater than that of standard-dose folic acid, then the reduction in RR effected by high-dose folic acid must also be much greater.
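The sign convention for the cost–benefit difference and the indifference point can be sketched in a few lines of Python. This is only an illustration: the article does not publish the functional form of its model, so the linear shape and the `SLOPE` value below are hypothetical assumptions. Only the indifference RR of 0.94 comes from the reference case.

```python
INDIFFERENCE_RR = 0.94  # minimum treatment effect reported for the reference case
SLOPE = 5_000.0         # HYPOTHETICAL: dollars per patient per unit change in RR


def relative_risk(p_death_high: float, p_death_standard: float) -> float:
    """RR = P(death | high dose) / P(death | standard dose)."""
    return p_death_high / p_death_standard


def cost_benefit_difference(rr: float) -> float:
    """Assumed linear cost-benefit difference per patient (high minus standard).

    Negative values favor high-dose folic acid; positive values favor
    standard-dose folic acid; zero marks indifference.
    """
    return SLOPE * (rr - INDIFFERENCE_RR)


def preferred_treatment(rr: float) -> str:
    """Return the treatment favored on a cost-benefit basis for a given RR."""
    difference = cost_benefit_difference(rr)
    if difference < 0:
        return "high-dose"
    if difference > 0:
        return "standard-dose"
    return "indifferent"


if __name__ == "__main__":
    for rr in (0.70, 0.94, 1.00):
        print(rr, preferred_treatment(rr))
```

Raising the assumed extra cost of the high dose would move the indifference point to a lower RR, which is the dependence on model assumptions that the text describes.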

    Consequences of Choosing the Wrong Treatment

    The cost–benefit analysis shows the extent to which the RR is an important input for treatment decisions. Nevertheless, much uncertainty remains about whether the RR is more or less than the minimum treatment effect. Physicians could decide, on the basis of available data and expert opinion, that the RR is less than 0.94 and recommend high-dose folic acid; conversely, physicians could decide that the RR is more than 0.94 and recommend standard-dose folic acid. A trial might be done elsewhere, and the published findings might alter the clinical policies later. Alternatively, the uncertainty about the RR may be so great that physicians decide to conduct the study themselves to avoid choosing the wrong treatment. The consequence of drawing the wrong conclusion and recommending the inferior treatment is called loss [39, 40]. For example, no loss is associated with using standard-dose folic acid when it is the preferred treatment: in this instance, if the RR is 0.94 or higher (Figure 1, middle). If the RR is less than 0.94, then a loss is associated with continuing to use standard-dose folic acid; in the context of cost–benefit analyses, this loss is just equal to the absolute value of the cost–benefit difference between the two treatments. Conversely, no loss is associated with using high-dose folic acid when it is the preferred treatment (that is, when the RR is less than 0.94) (Figure 1, bottom). When the RR is 0.94 or higher, however, a loss that is equal to the value of the cost–benefit difference between the two treatments is associated with switching to high-dose folic acid.
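The definition of loss in this paragraph can be written as a small function. The sketch below assumes a linear cost–benefit difference with a hypothetical slope (`SLOPE` is not a value from the article). Only the 0.94 threshold is taken from the reference case.

```python
INDIFFERENCE_RR = 0.94  # reference-case minimum treatment effect
SLOPE = 5_000.0         # HYPOTHETICAL: dollars per patient per unit change in RR


def cost_benefit_difference(rr: float) -> float:
    """Assumed linear difference per patient; negative favors high dose."""
    return SLOPE * (rr - INDIFFERENCE_RR)


def loss(rr: float, recommended: str) -> float:
    """Loss per patient from recommending a treatment when the true RR is rr.

    The loss is zero when the recommended treatment is the preferred one and
    equals the absolute cost-benefit difference otherwise.
    """
    difference = cost_benefit_difference(rr)
    if recommended == "standard":
        # Standard dose is wrong only when the difference is negative.
        return max(0.0, -difference)
    if recommended == "high":
        # High dose is wrong only when the difference is positive.
        return max(0.0, difference)
    raise ValueError("recommended must be 'standard' or 'high'")


if __name__ == "__main__":
    for rr in (0.84, 0.94, 1.04):
        print(rr, loss(rr, "standard"), loss(rr, "high"))
```

At any given RR at most one of the two losses is nonzero, which mirrors the middle and bottom panels of Figure 1.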

    To determine the loss if the wrong treatment is recommended, this analysis requires experts to assign an estimated value to the RR before the trial is started [41, 42]. This estimate, called the prior distribution, is usually stated in terms of a probability distribution, such as the mean and SD of the variable of interest [39, 40, 43, 44]. When clinical trial data are still unavailable, expressing an opinion about the possible levels of the RR may be challenging, but various approaches have been implemented successfully [24]. For example, suppose experts believe that increasing folic acid supplementation has no established benefit or harm. This belief might be summarized by setting the mean of the RR as equal to 1. What is the level of uncertainty in this estimate? Experts could initially report what they believe to be the extreme lower and upper levels of the RR. Further inquiries might ensue to gain a better understanding of the SD of the prior distribution (for example, “What are the chances that the true RR exceeds 0.70?”) [45]. After sufficient inquiry, the experts' beliefs might be represented graphically (Figure 2).

    Figure 2. Plot of the probability distribution of the relative risk for death for the prior and posterior distributions with different sample sizes. The prior distribution has a β distribution. These analyses assume that the estimated relative risk of the trial is 0.70. With larger sample sizes (for example, 50 and 200 patients per group), the mean of the posterior distribution shifts closer to 0.70 and the SD shrinks.

    In this example, considerable risk for recommending the wrong treatment remains. For example, if standard-dose folic acid is chosen, more than a 25% probability exists that the true RR is less than 0.94, in which case high-dose folic acid should have been chosen.

    The expected loss for recommending the wrong treatment, called the prior expected loss, can be computed by averaging treatment loss over all possible RRs, weighted by the probability of the RRs. Because expert opinions often vary, especially when little experimental evidence exists and there may be strong incentives for the evidence to reveal a particular relation, the effect of these different opinions on trial design can be assessed in sensitivity analyses [33].
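The averaging step can be illustrated numerically. The sketch below makes several assumptions that are not from the article: it substitutes a normal prior (mean 1, SD 0.2) for the article's β prior, uses a linear cost–benefit difference with a hypothetical slope, and integrates with a simple midpoint rule.

```python
from statistics import NormalDist

INDIFFERENCE_RR = 0.94  # reference-case minimum treatment effect
SLOPE = 5_000.0         # HYPOTHETICAL: dollars per patient per unit change in RR

# ASSUMPTION: a normal prior stands in for the article's beta prior.
PRIOR = NormalDist(mu=1.0, sigma=0.2)


def loss(rr: float, recommended: str) -> float:
    """Loss per patient from recommending a treatment when the true RR is rr."""
    difference = SLOPE * (rr - INDIFFERENCE_RR)
    return max(0.0, -difference) if recommended == "standard" else max(0.0, difference)


def prior_expected_loss(recommended: str, lo: float = 0.0, hi: float = 2.0,
                        steps: int = 20_000) -> float:
    """Average the loss over RR values, weighted by the prior density."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        rr = lo + (i + 0.5) * width  # midpoint of the i-th slice
        total += loss(rr, recommended) * PRIOR.pdf(rr) * width
    return total


if __name__ == "__main__":
    print("standard:", round(prior_expected_loss("standard"), 1))
    print("high:    ", round(prior_expected_loss("high"), 1))
```

Under these assumptions the prior expected loss is lower for standard-dose folic acid, because the prior mean of 1 lies above the indifference point, yet it is not zero, which is what leaves room for a trial to add value.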

    The value of conducting more research, when viewed from this cost–benefit perspective, is to obtain a more precise estimate of RR and reduce the expected loss associated with recommending the wrong treatment. Once the trial data are analyzed, the belief about RR is summarized in light of new data from the trial, which is called the posterior distribution. This distribution is computed by applying Bayes' theorem to the trial data and the prior distribution [39-41, 43, 44]. As the trial sample size increases, the mean of the posterior distribution tends toward the true RR and the SD of the posterior distribution shrinks. The posterior expected loss for a treatment is computed by averaging treatment loss over all possible RRs, weighted according to the posterior distribution.
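The qualitative behavior of the posterior distribution can be shown with a conjugate normal update. This is a simplification rather than the article's computation, which uses a β prior. The normal model and the `unit_sd` parameter (the sampling SD contributed by one patient per group) are hypothetical assumptions. The prior mean of 1 and the observed RR of 0.70 follow the worked example in Figure 2, and the prior SD of 0.2 is the value held fixed in Figure 3.

```python
def posterior(prior_mean: float, prior_sd: float, observed_rr: float,
              n_per_group: int, unit_sd: float = 1.0) -> tuple[float, float]:
    """Normal-normal Bayes update for the RR (illustrative assumption).

    The trial estimate is treated as normal with SD unit_sd / sqrt(n_per_group).
    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the trial estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_per_group / unit_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * observed_rr) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd


if __name__ == "__main__":
    for n in (0, 50, 200):
        mean, sd = posterior(1.0, 0.2, 0.70, n)
        print(f"n per group = {n:3d}: posterior mean = {mean:.3f}, SD = {sd:.3f}")
```

With more patients per group the posterior mean moves from the prior mean toward the trial estimate and the SD shrinks, matching the pattern described for Figure 2.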

    Because the uncertainty about the RR shrinks with increasing trial sample size, so does the posterior expected loss. The value of obtaining more trial data to reduce the risk for choosing the wrong treatment must be balanced, however, against the added costs of enrolling more trial participants and collecting and analyzing more outcome data. The analyses also incorporate the consequences of a trial being conducted elsewhere. Deciding not to conduct the trial avoids research costs but also may delay for several years results that could help determine optimal treatment.
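The trade-off between remaining expected loss and research cost can be sketched as a search over sample sizes. The sketch is deliberately simplified and is not the authors' model: it assumes a normal prior and likelihood, a linear cost–benefit difference with a hypothetical `SLOPE`, a hypothetical `UNIT_SD`, and it ignores timing, discounting, patient withdrawal, and trials conducted elsewhere. The target population (6500) and research cost per patient ($2000) are the reference-case values reported in the next section. The output is therefore not expected to reproduce the article's results.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
INDIFFERENCE_RR = 0.94        # reference-case minimum treatment effect
PRIOR_MEAN, PRIOR_SD = 1.0, 0.2
TARGET_POPULATION = 6_500     # reference-case value
COST_PER_PATIENT = 2_000.0    # reference-case research cost per patient
SLOPE = 5_000.0               # HYPOTHETICAL: dollars per patient per unit RR
UNIT_SD = 1.0                 # HYPOTHETICAL: sampling SD for one patient per group


def unit_normal_loss(u: float) -> float:
    """E[max(Z - u, 0)] for a standard normal Z."""
    return STD_NORMAL.pdf(u) - u * (1.0 - STD_NORMAL.cdf(u))


def expected_loss_per_patient(n_per_group: int) -> float:
    """Expected loss remaining after a trial of n_per_group and the best decision."""
    gap = abs(PRIOR_MEAN - INDIFFERENCE_RR)
    loss_without_trial = SLOPE * PRIOR_SD * unit_normal_loss(gap / PRIOR_SD)
    if n_per_group <= 0:
        return loss_without_trial
    posterior_var = 1.0 / (1.0 / PRIOR_SD ** 2 + n_per_group / UNIT_SD ** 2)
    # Variance of the posterior mean as seen before the trial is run.
    preposterior_sd = sqrt(max(PRIOR_SD ** 2 - posterior_var, 0.0))
    if preposterior_sd == 0.0:
        return loss_without_trial
    value_of_trial = SLOPE * preposterior_sd * unit_normal_loss(gap / preposterior_sd)
    return loss_without_trial - value_of_trial


def total_expected_cost(n_per_group: int) -> float:
    """Population-wide expected loss plus research cost for two study groups."""
    research_cost = 2 * n_per_group * COST_PER_PATIENT
    return TARGET_POPULATION * expected_loss_per_patient(n_per_group) + research_cost


def best_sample_size(max_n: int = 2_000) -> int:
    """Sample size per group that minimizes total expected cost."""
    return min(range(max_n + 1), key=total_expected_cost)


if __name__ == "__main__":
    n = best_sample_size()
    print("patients per group:", n)
    print("total expected cost:", round(total_expected_cost(n)))
```

The design choice here is to minimize a single total (expected loss plus research cost) rather than to fix a significance level and power in advance, which is the contrast with classical sample-size calculation drawn later in the article.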

    Summary of Key Inputs and Assumptions and Results

    We used the cost–benefit model and prior distribution described here to analyze the decision facing the medical directors of Satellite Dialysis Centers between adopting high-dose folic acid on the basis of current evidence or conducting a randomized study. The valuation of health in the reference case was set as equal to $70 000 per quality-adjusted year of life saved [37]. The size of the target population of patients being treated at Satellite Dialysis Centers is estimated to be 6500. At the time of the analysis, no indication existed that a trial to assess clinical outcomes or costs was being planned or started in this patient population; we predicted that a trial with sufficient power to resolve the clinical uncertainty would not be completed by another organization for at least 5 years. The research cost per patient in the trial for additional personnel and testing was expected to be $2000 [42, 46].

    Under these conditions, the loss associated with choosing the wrong treatment would be minimized if a trial is done and if it includes 105 patients enrolled in each group (Table 1). The expected loss of such a trial is $1 659 300, and research costs are expected to be $420 000. The net value for performing the trial instead of waiting is $621 200. Most of this value results from the organization obtaining enough experimental data by 3 years, instead of 5 years, to recommend an optimal treatment. Table 1 also shows the losses in terms of medical care costs and quality-adjusted life years saved.
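The research cost figure can be checked directly from the reported inputs, assuming two study groups of 105 patients each (the per-group size stated in the Discussion) at $2000 per patient. The constant and function names are illustrative only.

```python
PATIENTS_PER_GROUP = 105           # optimal size per study group reported in the article
STUDY_GROUPS = 2                   # high-dose and standard-dose folic acid
RESEARCH_COST_PER_PATIENT = 2_000  # dollars, reference-case value


def total_research_cost(per_group: int = PATIENTS_PER_GROUP,
                        groups: int = STUDY_GROUPS,
                        cost_per_patient: int = RESEARCH_COST_PER_PATIENT) -> int:
    """Research cost of the trial in dollars."""
    return per_group * groups * cost_per_patient


if __name__ == "__main__":
    print(total_research_cost())  # 420000
```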

    Table 1. Evaluating Whether To Conduct a Trial or Wait for a Trial To Be Completed Elsewhere

    What would be the appropriate number of patients to enroll in the study if the design included a classical hypothesis-testing method for calculating sample size? At a significance level of 0.05 and power of 0.8, a trial designed to detect an RR less than 0.94 would require 3615 patients per group. If the goal was to detect an RR of 0.75 or less, then the number of patients needed per group would be 228 and the expected loss would be greater than $3 million.
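For comparison, a classical sample-size calculation for two proportions can be sketched as follows. The article does not state the baseline probability of death or the exact formula used, so the two-sided normal-approximation formula and the `p_standard` value in the usage example are assumptions, and the output is not expected to reproduce the 3615 and 228 figures quoted above.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p_standard: float, rr: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per group needed to detect a given RR (two-sided test).

    Uses the normal approximation for comparing two independent proportions:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if rr == 1.0:
        raise ValueError("an RR of 1 implies no difference to detect")
    p_high = rr * p_standard
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_standard * (1.0 - p_standard) + p_high * (1.0 - p_high)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_standard - p_high) ** 2)


if __name__ == "__main__":
    baseline = 0.5  # HYPOTHETICAL probability of death with standard-dose folic acid
    for rr in (0.94, 0.75):
        print(rr, patients_per_group(baseline, rr))
```

The required size grows roughly with the inverse square of the difference to be detected, which is why a trial powered for the minimum treatment effect (RR of 0.94) is so much larger than one powered for an RR of 0.75.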

    How do the experts' previous beliefs affect the decision to conduct the trial? Collecting information in a trial continues to be cost-beneficial if the belief is that the mean RR lies between 0.66 and 1.07 (Figure 3, top). If the belief is that the mean RR is low, then it is more cost-beneficial to recommend high-dose folic acid. If little uncertainty exists about the level of RR (for example, the SD is 0.16 or lower) (Figure 3, bottom), then the research cost of the trial substantially exceeds the value of the information gleaned from the trial; thus, the trial should not be done. We also found that if the valuation of health was increased to $200 000 per quality-adjusted life year saved, the number of patients per study group should increase to 248.

    Figure 3. Sensitivity analyses. Net value for conducting a trial compared with waiting until another organization publishes its trial results, as a function of the prior distribution. Top. Net savings at different means of the prior distribution where SD remained equal to 0.2. Bottom. Net savings at different SDs of the prior distribution where the mean remained equal to 1.

    Discussion

    We described a method for estimating the potential health consequences to patients and costs to physicians associated with conducting a randomized trial. This method relies on established techniques of health economic analysis in which the risk for recommending the wrong treatment on the basis of analyses of observational data is weighed against the value of information that could be obtained from a trial [36, 42, 44, 46-53]. In the example of folic acid supplementation, the risk associated with recommending the wrong treatment would be minimized if a trial were done with 105 patients per study group, instead of waiting for reports from other sources. The net value of such a trial is expected to exceed $600 000.

    Sensitivity analyses also show that conducting a trial designed to detect at least a 25% reduction in the RR according to classic sample-size methods would require as many as 228 patients per treatment group. The smaller sample size computed with the cost–benefit method yielded good results, in part because we analyzed the problem from the perspective of deciding how to efficiently spend resources to improve health outcomes. These analyses account for realistic concerns, such as the research costs per patient, the expected costs and benefits of making the wrong conclusion once the study is completed, expectations about other organizations doing a similar trial, and the number of patients seen by these physicians. Our study highlights, in explicit and quantitative terms, why physicians or health care organizations may be ambivalent about conducting a randomized study planned according to traditional statistical methods.

    An important value of classically designed trials is that they do not have to rely on the subjective judgments of prior probabilities used in Bayesian analyses [39, 40]. We expect classic statistics to continue as the dominant framework for designing and analyzing clinical trials and reporting results. Physician groups and related health care organizations may decide to undertake Bayesian-designed trials to more efficiently improve outcomes for their patients, but such trials may not have sufficient power to influence practice elsewhere. Alternatively, to achieve a greater societal benefit, organizations might combine into research networks to share the expense of conducting trials powered according to classic computations [54].

    Deciding whether conditions favor doing a clinical trial is not a trivial task, and obtaining the judgments of experts, constituted in review committees, is desirable. Both scientific and human subject committees help ensure that a proposed trial is ethically designed and will benefit the target population. Like other types of decision support [55], such methods may help managed care groups to organize their decision processes and derive maximum benefit, within budget, from quality-improvement programs that include randomized studies.

    Our method has a few limitations. Treatment effects were modeled with a cost–benefit method instead of a cost-effectiveness method [32], although both techniques should yield similar results. We also omitted other potential values of randomized trials. For example, a randomized trial requires a protocol. In addition to ensuring that important data are collected [8, 20, 56], written protocols for purposes other than conducting a randomized trial, called critical pathways [57], have emerged as a new initiative that promises to reduce costs and improve outcomes. Physician groups and health care organizations also might derive value from participation in randomized trials to provide evidence of their leadership in establishing clinical standards and showing their willingness to participate in research that serves the public interest.

    Conclusion

    Advances in information systems permit the collection and storage of a large amount of observational clinical and administrative data. The relative ease and low cost with which such data can be analyzed offers an attractive alternative to randomized trials at a time when pressures are mounting to quickly identify clinical policies that promise to maintain or improve health outcomes while containing the growth of health expenditures. However, medical history has repeatedly shown the risks of relying too heavily on only the analyses of observational data. Physicians may find the techniques described here useful for deciding between adopting a new clinical policy on the basis of analysis of observational data stored in large databases and conducting a randomized study.
