Clinical Judgment Revisited: The Distraction of Quantitative Models

    Alvan R. Feinstein, MD

    From Yale University School of Medicine, New Haven, Connecticut. Requests for Reprints: Alvan R. Feinstein, MD, Yale University School of Medicine, 333 Cedar Street, P.O. Box 208025, New Haven, CT 06520-8025.

    Abstract

    More than 25 years ago, in a book called Clinical Judgment, each act of patient care was described as having an experimental structure. The “experiments” needed substantial scientific improvement, however, in quality of basic data, taxonomic classification of phenomena, and specifications of clinical reasoning. During the past 2 decades, these improvements have not occurred as extensively as expected because many investigators working in clinical forms of clinical research have not addressed these basic scientific challenges in data, taxonomy, and reasoning. Instead, the investigators have applied quantitative “models,” derived from non-clinical domains, that focus on hard data, randomized trials, Bayes theorem, quantitative decision analysis, and psychometric strategies for clinimetric measurement. Consequently, the main challenges of clinical judgment still remain generally available for basic scientific research by investigative clinicians.

    More than 25 years have passed since I wrote Clinical Judgment [1], a book that was expanded from four articles called “Scientific Methodology in Clinical Medicine” [2-4]. The book was enthusiastically received and seems to have had memorable effects on its readers. In various travels, I often meet clinicians who describe their admiration for it and ask me to autograph their aging copy. In one recent meeting, the clinician ended his laudatory comments by asking a provocative question. “I felt sure,” he said, “that the book would inaugurate a wonderful new era of profoundly clinical investigation. In the past 2 decades, however, most of the research devoted to patient care has been more mathematical than clinical. Why do you think the new clinical era has not happened?”

    That question made me reflect on events of the past 25 years. I re-read the book, thought about its ideas and proposals, considered what had happened during the interval, and compared the disparities between the observed and the expected. This essay offers my speculation about the reason for those disparities and an attempt to reconcile the differences.

    Original Proposals

    In the original four methodology papers and in the clinical judgment book, I indicated that acts of patient care were analogous to experiments. Each treated patient begins in a baseline state, receives an intervention, and has an outcome—exactly as in an experiment. Unlike conventional laboratory experiments, however, most clinical activities have neither a concurrent control group nor an innovative goal. The “control” comparison comes from the clinician's awareness of similar patients in the past; and the goal of the “experiment” is to repeat (or exceed) the best outcomes achieved with those previous patients.

    A prime argument in the book was that the “experiments” of patient care, despite an intellectual framework analogous to laboratory experiments, did not receive the same scientific attention or respect. Lacking any formally identified methods of design and evaluation, most clinicians would use the term “clinical judgment” when pressed to specify the methods, reasoning, and data used in clinical decisions. The book described the scientific structure of those daily clinical experiments and proposed methods that could substantially improve both the art and science of the work. The challenges offered intellectual excitement for clinicians who wanted to do humanistic scientific research in the care of patients. Yet the challenges—at least in the form I envisioned them—have not been generally pursued. Why not?

    Potential Obstacles

    In the book, I had anticipated two possibly diverting obstacles. One was ideologic. During the 20th century, particularly since World War II, the acts of prediction and intervention that constitute patient care have not been regarded as basic scientific challenges. The general belief has been that explication of pathophysiologic and biological mechanisms is the only important, fundamental scientific work for clinical academicians. Although changing nature, rather than explaining how nature works, has traditionally been the most important scientific task of clinicians, the basic ideas and predictive principles of patient-care interventions have been downgraded and regarded as a secondary, “applied” activity, unworthy of first-rate minds. As this constraining scientific ideology received massive financial support, many clinical investigators would be attracted to laboratory research, where they would try to answer basic questions about pathophysiologic and (eventually) molecular mechanisms of disease, not basic questions about prognosis and therapy for patients. Bright, knowledgeable clinicians wanting to do clinical research would therefore be diverted from studying patient care, which they knew best and for which they were the most suitable investigators. Instead, they would study topics in explicatory “basic science” that could often be explored equally well and often better by researchers with PhDs who have no clinical skills.

    A second potential obstacle was the allure of computers, which had then newly arrived on the medical scene. They offered the intellectual appeal of a fascinating new “toy” for automating many of the data-processing activities of hospitals and clinicians, as well as new intriguing possibilities, such as direct history-taking from patients and the creative imageries of “artificial intelligence.” I feared that the enormous intellectual fun of getting the computer to do these things would divert clinicians from the more fundamental scientific challenges in data, classification, and reasoning.

    During the past 25 years, both types of diversion have indeed occurred. As shown by the papers presented at annual meetings of societies devoted to clinical research, the investigators have become more intensely oriented than ever before to a relatively pure form of explicatory “basic” molecular biology. As evidenced by the personal computers, terminals, and printouts that are now ubiquitous in hospitals and clinicians' offices, the computer has become a well accepted and moderately well integrated component of modern medical technology. The “basic” research and computer activities have brought magnificent advances to modern medical science. They include our new knowledge of genes, enzymes, cells, receptors, and membranes; our powerful new therapeutic armamentarium in drugs, surgery, and biologic products; an entirely new industry in biotechnology; and new forms of computer-aided morphologic imaging.

    Growth of “Clinical” Forms of Clinical Research

    Despite the many investigators who have done basic science and computer activities, however, clinical forms of clinical research have managed to survive and grow. Originally funded by private foundations, but now aided by federal sources, new programs were developed to support generalists and other clinicians working on clinical issues in patient care. The work has now formed a sizeable corpus of ideas and publications in fields that are often called clinical epidemiology, health services research, medical decision making, and outcomes research. The rest of this essay is concerned with what has been done, and not done, in those clinical investigative activities.

    In the clinical judgment book (and in subsequent writings), I had pointed out that the primary scientific challenges for clinicians were methodologic. “Basic” scientific research produced the new technologic agents that could be used in patient care, but the accomplishments of those agents could not be planned or evaluated with the laboratory methods used in “basic” explicatory science. Because intact human beings differ drastically from the material studied in laboratory research, they required a drastically different set of investigative procedures, which could not be extrapolated or derived from laboratory methods. A new and entirely different kind of basic scientific methodology was needed [5-9].

    The existing clinical methods, as described in the book, were in an intellectually impoverished state. The patients who were the investigated “clinical material” were scientifically identified with a grossly inadequate system of classification. It relied mainly on diagnostic names of diseases, laboratory information, and demographic data that ignored all the clinical distinctions of a patient's illness. The customary clinical taxonomy did not include patterns of symptoms, severity of illness, effects of comorbid conditions, timing of phenomena, rate of progression of illness, functional capacity, and other clinical distinctions that demarcate major prognostic and therapeutic differences among groups of patients who otherwise seem deceptively similar because they have the same diagnosis, laboratory results, and demographic status.

    The science used to acquire the basic clinical data was also grossly inadequate. It depended on generally untested and underdeveloped methods of history taking, physical examination, and acts of clinical designation and inference. These methods were usually disdained as subjective and “soft” and were seldom given the same rigorous intellectual attention used for disciplined scientific work. With no efforts made to improve or “harden” the basic informational process, the “softness” of the data was assured. If reproached for the neglect, clinical investigators would claim that patients were “unreliable” in giving histories, and besides, the development of newer, better laboratory measurements might make the clinical data become unnecessary or obsolete.

    Clinical Judgment urged clinicians to develop a “basic science” of their own—to study clinical phenomena directly, to specify the importance of different types of clinical data, to improve the scientific quality of the data, to identify (or create) appropriate systems of taxonomy for classifying the information, and to develop intellectual models and pragmatic methods that would articulate the clinical process, recapitulate it, and use the results for quantified analyses.

    The basic proposals were all aimed at the fundamental scientific challenges in data and classification. After these challenges were solved with suitable qualitative methods, the results could be quantified thereafter. Boolean algebra, Venn diagrams, and algorithmic flow charts [10] could be used to identify the patterns of logic; rating scales and categories that described complex clinical phenomena (such as severity of symptoms) could be given ranked numerical digits; and the patients in the designated categories could be counted for statistical analyses. All the basic issues, however, were qualitative: The new procedures would emerge from descriptive solutions to the basic problems in clinical data and clinical taxonomy.

    The Allure of Quantitative Models

    What I had not anticipated was that the investigative clinicians would be distracted by yet another form of scientific ideology: the allure of applying quantitative models. The “models” can be general paradigmatic concepts (such as the value of “hard data” and randomized trials) or specific mathematical formulations (such as Bayes theorem), but all of the methods involve quantitative approaches, and all of them have been derived from other domains—statistics and psychosocial sciences. Instead of developing their own basic scientific methods [11-13], clinicians have often taken the nonclinical methods and applied them to the existing clinical challenges.

    The rest of this essay summarizes the nonclinical models, their inadequacies for solving inherently clinical problems, and the basic scientific challenges that still remain. The models to be discussed include “hard” data, randomized trials, Bayes theorem, decision analysis, psychometric strategies, and several other procedures.

    “Hard” Data

    Although seldom defined, the term “hard data” usually refers to objective measurements, preferably done by a machine, that are expressed in standardized dimensional scales for length, weight, volume, or time. Other acceptable “hard” data come from unequivocal events—such as birth and death—that can easily be determined and counted. This type of hard information has the scientific advantage of being trustworthy. It also has the statistical advantage of being mathematically “tractable”: The measured or counted results can easily be added, multiplied, or subjected to other operations that summarize the data as means or simple binary proportions.

    The hard-data goal has made investigators prefer to use technologic tests rather than clinical observations to describe clinical phenomena. The technologic procedures were often better and more accurate than the previous clinical methods. For example, a chemical appraisal of urinary sugar was obviously superior to the 18th century method of tasting urine with which doctors diagnosed diabetes mellitus, and an image of the lung or a biopsy of the liver offered better evidence of morbid anatomy than inferences derived from history, physical examination, and laboratory tests.

    On the other hand, the technologic procedures were sometimes tangential to the desired target and often inapplicable. Despite the excellence of the quantitative expressions, the forced expiratory volume of respiration does not indicate a patient's dyspnea, and a depressed S-T segment does not indicate angina pectoris in daily life. Besides, many of the most important clinical events are intrinsically human reactions and sensations—pain, discomfort, disability, general functional capacity, depression, anxiety, and gratification—that cannot be measured with any technologic test.

    The human clinical phenomena could be easily “measured” with semi-quantitative ratings on ordinal scales such as none, mild, moderate, and severe, or with quasi-dimensional marks on a visual analog line. These rating scales, however, have often been ignored because they are scientifically and statistically unappealing. Scientifically, the observations are subjective, and the results are often inconsistent and non-reproducible, relying on unstandardized “global” ratings that lack specific criteria for demarcation. Statistically, the semi-quantitative ratings could not be suitably summarized with the customary arithmetic that produced means and binary proportions.

    The use of “soft data” and semi-quantitative rating scales became necessary when new pharmaceutical agents were developed to treat such phenomena as pain, insomnia, anxiety, depression, and the impaired mobility of Parkinson disease. In the randomized trials that tested these agents for efficacy, however, ordinal ratings and visual analog scales became acceptable not because they were produced by standardized methods and demarcated criteria, but because the subjective observations were “double-blind.” Even if individual data were themselves scientifically imprecise and unspecified, the information was accepted as statistically “unbiased” as long as the observational process was double-blind. The new types of rating scales also became statistically acceptable because “nonparametric” strategies had been developed to manage the semi-quantitative rankings. The consequence of the new approach was that soft data received serious analytic attention when produced in the special double-blind environment of a randomized trial. For nonrandomized research, however, the data continued to be scientifically defective, with little or nothing done to improve the quality of the basic observations and classifications.

    The double-blind “hard-data” principle thus had the advantage of offering scientifically acceptable evidence in randomized trials of new pharmaceutical agents. The principle had the disadvantage, however, of letting the investigators escape or continue avoiding the basic scientific challenges of carefully identifying important clinical and human phenomena.

    Randomized Trials

    Originally proposed and applied to test the average efficacy of agricultural interventions, randomized trials had the scientific virtue of removing subjective judgment that might produce susceptibility bias when “treatments” were allocated to different blocks of soil. The trials also had the statistical virtue of allowing the results to be interpreted with stochastic theories of probability. When transferred from agriculture to clinical medicine, randomized trials had directly analogous goals in testing the average efficacy of therapeutic interventions.

    The cogent clinical distinctions of different patients with the same disease would not require careful classification for either baseline states or outcomes. The baseline differences would presumably be equitably distributed by the randomization, and the investigated outcome events would consist of either “hard” evidence (such as death) or “soft” evidence whose softness would be mitigated by the double-blinding that presumably removed subjective bias.

    Randomized trials have been spectacularly successful in solving many of the particular problems to which they were addressed. For the first time in the history of medicine, clinicians today can prescribe pharmaceutical and many other therapeutic agents whose average efficacy has been unequivocally shown. The trials have not been particularly successful, however, for the customary scientific challenges of practicing clinicians. A clinician wants to know the best diagnostic “work-up” for a particular patient's presenting manifestations, the best treatment for the clinical distinctions of individual patients, and the spectrum of outcomes that can be anticipated for those patients.

    The trials have not been generally applicable for evaluating diagnostic procedures, and the therapeutic results are deliberately aimed at showing average efficacy in a diseased group rather than optimum management for individual patients. The application of randomized trials has brought (and will continue to bring) splendid progress in the science of evaluating average therapeutic efficacy, but the basic statistical strategies are not designed or intended to address the basic scientific challenges in clinical taxonomy and data. Randomization is not a scientific method; it is an invaluable statistical strategy for the mathematical exploitation of uncertainty.

    The great appeal of randomized trials was that investigative clinicians could do scientifically credible research without having to discern crucial clinical phenomena. The randomization would generally obviate the need to establish systems of taxonomy and criteria for identifying prognostic differences in cogent clinical phenomena that existed before treatment. Although crucial elements of clinical judgment in therapeutic decisions for individual patients, these phenomena were often deliberately excluded or ignored in the assembled hard data. When necessary in trials of analgesic or psychotropic agents, soft clinical phenomena also required no special scientific attention. They could be expressed with visual analog or ordinal rating scales that did not require standardized criteria because the double-blind ratings would presumably be unbiased, even if imprecise and individually unspecified.

    Because randomized trials were often regarded as the only scientific strategy for evaluating therapy, and because an unbiased comparison was regarded as the sine qua non of the statistical model, the absence of scientific clinical precision was not regarded as an important problem. When some of the trials produced controversial results, however, the collected information would require additional statistical analyses as “covariates” that might have affected the results, but the controversies would seldom be resolved because the crucial clinical distinctions had not been noted, classified, and included in the data available for the analyses.

    Thus, despite their magnificent general contributions, randomized trials have encouraged and allowed clinicians to evade the basic scientific challenges of appropriate data and clinical taxonomy.

    Bayes Theorem

    As new technologic tests became available, they had to be evaluated. The new information from these tests was used for diagnosis but also for many other clinical decisions such as estimating prognosis, choosing treatment, noting post-therapeutic changes, and offering important reassurance to patients and clinicians. Nevertheless, the evaluation process was usually aimed at diagnosis alone.

    The focus of the diverse clinical decisions was limited solely to diagnosis because a mathematical model was available for diagnostic evaluations but not for the other activities. The diagnostic work relied on two statistical indexes, sensitivity and specificity, that had originally been developed by J. Yerushalmy [14] for studies of accuracy when radiologists identified tuberculosis from radiographs of the chest. For Yerushalmy's studies, about half of the radiographs were chosen from patients with known tuberculosis and about half from those with normal or other conditions. The radiologists were thus working in a setting in which tuberculosis had an unrealistic 50% prevalence.

    Although suitable for radiologic research, this statistical tactic was not satisfactory for general evaluation of diagnostic tests. The most obvious problem was that the clinical accuracy of a “diagnostic marker” test was not properly expressed by sensitivity and specificity. A clinician needed to know the “batting average” of a test for its “predictive accuracy” in positive or negative results for patients with unknown diagnoses, not the sensitivity and specificity in patients whose diagnoses were already known. A second problem was that the sensitivity and specificity indices required that all results be divided dichotomously into high and low (or normal and abnormal) sectors. Choosing the boundary for this binary division was not easy for the many results, such as those for serum calcium and diverse enzyme tests, that are expressed in a dimensional continuum.

    To cope with these problems, two additional mathematical strategies were introduced. Bayes theorem was an algebraic expression that could convert indexes of sensitivity and specificity, together with an estimated value of disease prevalence, into the desired clinical expressions for diagnostic accuracy [15]. With the engineering mathematics of receiver operating characteristic (ROC) curves, the effects of different partitions could be examined to guide the choice of an optimum dichotomous boundary for dimensional data. These two strategies produced a plethora of papers and publicity for Bayes theorem and ROC curves. As the disadvantages of dichotomous partitions became apparent, however, a further mathematical strategy was invoked. The test results were divided into several ordinal sectors, rather than two binary zones, and a “likelihood ratio” was calculated for each sector [16]. Because the likelihood ratios were determined from values of odds, the results had to be converted into predictive probabilities to be intelligible as clinical batting averages. Consequently, special cards and nomograms were introduced to facilitate the transformation.
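
    The arithmetic behind these conversions can be sketched briefly. The following is a minimal illustration, not a method taken from the cited papers: the function names and the example values (a test with 90% sensitivity and 90% specificity, applied at 50% and at 1% prevalence) are assumptions chosen only to show how predictive values depend on prevalence and how a likelihood ratio, which operates on odds, is turned back into a probability.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes theorem for a dichotomous test.

    Returns (positive predictive value, negative predictive value),
    the "batting averages" for positive and negative results.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical test characteristics, used only for illustration.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Prevalence of 50%, as in a half-and-half research sample.
ppv_research, npv_research = predictive_values(SENSITIVITY, SPECIFICITY, 0.50)

# Prevalence of 1%, as might be assumed for an unselected population.
ppv_screening, npv_screening = predictive_values(SENSITIVITY, SPECIFICITY, 0.01)

# The same answer reached through a likelihood ratio for a positive result.
lr_positive = SENSITIVITY / (1.0 - SPECIFICITY)
ppv_via_lr = post_test_probability(0.01, lr_positive)
```

    With these assumed values, the positive predictive value is 0.90 at 50% prevalence but only about 0.08 at 1% prevalence, although sensitivity and specificity are unchanged. The odds-based route gives the same result, which is the transformation that the cards and nomograms were meant to spare the clinician.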

    During all the elaborate statistical activity, however, another fundamental problem was overlooked. It occurred because each diseased and each control group could have a diverse clinical spectrum of morphologic, symptomatic, comorbid, and other manifestations [17], and because the diagnostic accuracy of the marker test would vary in groups from different parts of the clinical spectrum [18, 19]. The indexes of sensitivity, specificity, prevalence, and likelihood ratios, however, were being calculated with the erroneous assumption that they were constant throughout all parts of the spectrum.
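
    A small numerical sketch may make the spectrum problem concrete. The counts below are entirely hypothetical and the stratum names are illustrative assumptions, not data from the cited studies; the point is only that a single pooled sensitivity can describe neither part of the spectrum.

```python
# Hypothetical diseased patients, grouped by part of the clinical spectrum.
# Each value is (patients with a positive test, all diseased patients in the stratum).
diseased_by_stratum = {
    "advanced, symptomatic": (90, 100),
    "early, asymptomatic": (20, 100),
}


def sensitivity(positives, total):
    """Proportion of diseased patients whose test result is positive."""
    return positives / total


def pooled_sensitivity(strata):
    """Sensitivity computed as if the whole diseased group were homogeneous."""
    positives = sum(p for p, _ in strata.values())
    total = sum(n for _, n in strata.values())
    return positives / total


stratum_sensitivity = {
    name: sensitivity(p, n) for name, (p, n) in diseased_by_stratum.items()
}
pooled = pooled_sensitivity(diseased_by_stratum)
```

    In this invented example the pooled sensitivity is 0.55, whereas the stratum values are 0.90 and 0.20. Any predictive calculation that uses 0.55 as a constant would be wrong for patients in either stratum, and the pooled value itself would shift with the mixture of patients assembled for the study.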

    Because of this fallacy, the Bayesian and likelihood tactics, although often interesting and intellectually exhilarating, seldom offered practical, direct clinical results. In recent years, another mathematical model, which uses multivariable statistical regression, has been applied to incorporate additional information into the diagnostic analyses. The multivariable mathematics, however, rely on forming a probabilistic score from a weighted linear combination of individual items. Pertinent results were still not demarcated for the conjoined, clustered attributes that form cogent groups in the clinical spectrum.

    Regardless of how the statistical problem might be solved for the clinical spectrum, however, the more profound problem still remained that all of the mathematical models were aimed merely at diagnosis, not at evaluating the total clinical usage of the technology in prognosis, therapy, and other decisions in patient care. Consequently, after two decades of iatromathematical approaches to diagnostic testing, a satisfactory set of methods has not yet been developed for evaluating informational technology. Central to the persisting problem is the classification of the clinical spectrum—a fundamental taxonomic challenge that remains generally neglected.

    Quantitative Decision Analysis

    Whereas Bayes theorem, ROC curves, likelihood ratios, and multivariable statistics were being applied in the diagnosis of disease, a different set of challenges came from decisions about patient management. After studying the way those decisions were made by excellent clinicians, reality-oriented investigators might have developed a model that was derived directly from the pragmatic clinical process. Instead, however, a new domain called “medical decision making” was created to apply a mathematical model, rooted in econometric theory, that had been developed in a business school [20]. The mathematical model had not received practical acceptance or widespread use in the realities of the business world, but the scholastic theory was clinically attractive.

    The theory begins by demanding the development of a flow chart or algorithm that shows all the possible options at each decision point and all of the possible outcomes for each option. This type of branching tree for possible options and possible outcomes has always been constructed by good clinicians, but the construction was often an informal, unspecified act of “judgment.” The demand for formally identifying the component parts of the tree was therefore highly desirable.

    After the decision tree was constructed, however, the mathematical model called for a quantitative mechanism to find the “best” decision. For the quantitative process, each possible outcome was assigned a probability value for its estimated occurrence and a utility value for its desirability. The probability and utility values were then multiplied to form an arbitrary probability-utility score. After the scores were suitably arranged, the best decision was the one that would lead to the best score.
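
    The quantitative mechanism just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two options, their outcome probabilities, and the utility values on a 0-to-1 scale are all hypothetical, and the phrase “suitably arranged” is taken here to mean summing the probability-utility products over the outcomes of each option, which is the usual expected-utility calculation.

```python
# Hypothetical one-level decision tree.
# Each option maps to a list of (probability, utility) pairs for its outcomes.
decision_tree = {
    "operate": [(0.80, 0.90), (0.15, 0.50), (0.05, 0.00)],
    "medical therapy": [(0.60, 0.85), (0.40, 0.40)],
}


def expected_utility(outcomes):
    """Sum of probability x utility over all outcomes of one option."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities for an option must sum to 1")
    return sum(p * u for p, u in outcomes)


def best_option(tree):
    """Return the option with the highest score, together with all scores."""
    scores = {option: expected_utility(outcomes) for option, outcomes in tree.items()}
    return max(scores, key=scores.get), scores


choice, scores = best_option(decision_tree)
```

    With these invented numbers the scores are 0.795 for “operate” and 0.67 for “medical therapy,” so the model selects “operate.” The sketch also shows how completely the answer depends on the assigned utilities: the 5% chance of the worst outcome enters only as a product with its utility of 0.00 and is absorbed into a single number.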

    Unfortunately, this tactic is wholly alien to clinical reasoning. Neither clinicians nor patients (nor businesspersons) make decisions using a quantitative score formed by multiplying probabilities by utilities. In the reasoning of good clinical judgment, the utilities and probabilities always receive careful and simultaneous consideration, but they are almost always arranged separately, without being manipulated into a single mathematical product. Aside from the unrealistic statistical score, however, quantitative decision analysis has two other major clinical-mathematical problems. The first is the difficulty of expressing utility values in a quantitative manner. Most people cannot readily convert joys and sorrows or gratifications and despairs into quantitative units. Although required for the mathematical formulas, the numbers assigned to these utilities cannot accurately portray the true “values” for individual persons.

    The second problem arises, like so many others, from the absence of suitable data and taxonomies for clinical phenomena. Without adequate documentary evidence, the probability estimates for various outcomes are inevitably imprecise, representing a guess about the “average” clinical events but lacking the specific details needed for the overt or subtle nuances of distinctive clinical subgroups.

    The current mathematical approaches to decision analysis have made two invaluable intellectual contributions to the current academic scene. First, clinicians have been forced to specify the algorithmic outlines for pathways of clinical reasoning. These specifications could be expressed and stipulated before the advent of the analytic mathematical models, but the publicity given to the decision trees has begun to make them a reasonably familiar presence on the modern clinical scene. Demarcating the “nodes” and pathways of clinical reasoning may turn out to be the most enduring contribution that decision analysis has made to scientific clinical taxonomy.

    A second useful contribution is that the analytic strategy can be a helpful mechanism when economists or administrators make inevitable decisions about the rationing and allocation of medical resources [21] for large groups of people in a demarcated geographic region. Because these decisions do not require specification for individual persons, they can be calculated with probabilities and utilities that are estimated for an “average” patient.

    For the care of individual patients, however, the mathematical model of decision analysis has the same flaws as all the preceding models. Without suitable taxonomy and basic data, and without better ways of expressing the preferences of individual patients, the model relies on inadequate quantitative expressions for the probabilities, utilities, and their products.

    Psychometric Strategies for Clinimetric Measurement

    As clinical investigators seeking hard data gave progressively less attention to the “humanistic” information of patients' symptoms and functional capacity, complaints began to mount about the technologic dehumanization of patient care. A simple solution to the problem would have been to insist on developing and using better clinimetric indexes [22] to measure such phenomena as relief of distress and improvement of functional status. The indexes could be constructed with two relatively simple clinimetric strategies that had been developed many years earlier. Virginia Apgar [23] had shown how ordinal ratings for carefully selected clinical variables could be added to form a score, and the American Joint Committee on Cancer [24] had shown how categories could be clustered to form the ordinal ranks of a staging system. Despite these readily available and immediately useful architectural “blueprints,” however, yet another mathematical model was applied: psychometric strategies that rely on statistical methods for measuring such complex phenomena as health status and quality of life.
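
    The two clinimetric “blueprints” are simple enough to sketch directly. In the illustration below, the additive function follows the familiar structure of the Apgar score (five components, each rated 0, 1, or 2, and summed). The staging function is a deliberately simplified, hypothetical clustering rule written only to show the pattern; it is not the American Joint Committee on Cancer system.

```python
APGAR_COMPONENTS = (
    "heart_rate",
    "respiratory_effort",
    "muscle_tone",
    "reflex_irritability",
    "color",
)


def additive_score(ratings):
    """Apgar-style index: add the ordinal ratings (0, 1, or 2) of each component."""
    missing = [name for name in APGAR_COMPONENTS if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for name in APGAR_COMPONENTS:
        if ratings[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be rated 0, 1, or 2")
    return sum(ratings[name] for name in APGAR_COMPONENTS)


def clustered_stage(has_regional_spread, has_distant_spread):
    """Hypothetical staging rule: cluster categories into ordinal ranks 1 to 3."""
    if has_distant_spread:
        return 3
    if has_regional_spread:
        return 2
    return 1


# Hypothetical newborn with one component rated 1 and the rest rated 2.
example_ratings = {
    "heart_rate": 2,
    "respiratory_effort": 2,
    "muscle_tone": 2,
    "reflex_irritability": 2,
    "color": 1,
}
example_score = additive_score(example_ratings)
example_stage = clustered_stage(has_regional_spread=True, has_distant_spread=False)
```

    In both patterns every component and every rule is visible to the clinician who uses the result, so the index can be checked for clinical sensibility without any statistical apparatus.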

    The scientific disparities between the clinimetric and psychometric methods have been discussed elsewhere [22, 25] and are too extensive for detailed discussion here. Regardless of the merits of psychometric statistics for appraising complex subjective phenomena, however, the approach has often brought confusion rather than clarification of the clinical challenges.

    One source of difficulty has been the uncertainty and controversy about what domains to include in assessing health status and quality of life. Should they include familial, social, economic, and spiritual or religious components, as well as the conventional medical ingredients? How should the diverse components be differentiated and inter-related when the patient's total status is affected more by changes in nonmedical than in medical phenomena? The most direct sources of confusion, however, have come from five basic problems in the psychometric approach.

    The first problem is that the psychometric instruments contain a large inventory of multiple “items.” The plethora of items, subscales, and aggregate scores may obscure (or may not even include) the particular symptom—such as dyspnea or joint pain—that is the focal concern to be relieved by treatment. A second problem is the unfamiliar methods used to cite the efficacy of the psychometric scores with statistical indices of “reliability” and “validity.” The methods can include factor analysis, Cronbach's α index, and the Kuder-Richardson index of “homogeneity,” as well as coefficients for diverse forms of criterion validity, construct validity, convergent validity, divergent validity, and so forth. Awed or baffled by the arcane vocabulary and statistics, clinical investigators may have neglected the appraisal of “face validity”—a statistically unmeasurable attribute that refers to the measurement's clinical “sensibility” or “common sense” in doing its intended job [22].
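
    For readers unfamiliar with the statistical indices named above, the following minimal sketch shows how Cronbach's α is conventionally computed: α = [k/(k − 1)] × [1 − (sum of the item variances)/(variance of the total score)], where k is the number of items. The function name `cronbach_alpha` and the tiny data sets are invented for illustration and do not come from any real instrument.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items.

    item_scores[i][j] is the rating given to item i by respondent j.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("each item needs the same number (>= 2) of respondents")

    # Total score for each respondent, summed across all items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Three items that every respondent rates identically: maximal homogeneity.
    identical = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
    print(cronbach_alpha(identical))

    # Two items whose ratings are unrelated to each other.
    unrelated = [[1, 2, 1, 2], [1, 1, 2, 2]]
    print(cronbach_alpha(unrelated))
```

    With these made-up data, the identical items give an α of essentially 1 and the unrelated items give an α of essentially 0. The calculation uses only the variances of the ratings; nothing in it checks whether the focal symptom is among the items, which is why face validity has to be appraised separately.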

    A third problem is that an instrument receiving high statistical scores for “reliability” and “validity” in one clinical setting may be poorly suited for some other clinical setting. For example, measurements of activities of daily living often have excellent statistical credentials when applied to impaired patients in nursing homes but do not focus on the appropriate problems when used for ambulatory patients elsewhere.

    A fourth problem is that multi-dimensional indexes are seldom satisfactory for measuring change. This problem was pointed out many years ago by Nunnally [26], a psychometric titan who corresponds to R.A. Fisher in the statistical pantheon, but the warning has often been ignored. The clinical-psychometric investigators have then been surprised or dismayed to find that the multiple-item indexes were not effective in discerning clinical response changes after therapy [27, 28]. Although clinicians can readily construct special “transition indexes” that identify changes easily and effectively [22, 29], the approach is not a standard part of psychometric strategy, and use of the transition method is often discouraged by consultants who cannot conveniently fit the results into a repeated-measures analysis of variance.

    Finally, although patients are the only persons who can suitably observe, evaluate, and rate their own quality of life and the important features of their own health status, individual patients have seldom been asked or allowed to indicate their own values and beliefs. The decisions about what is important and how to “weight” the components usually emerge either from mathematical calculations or from an assembled committee of pundits.

    Like all the other mathematical models, the psychometric strategies have been excellent for the particular goals at which they were originally aimed. When applied to clinimetric goals, however, the strategies have often been unsatisfactory for hitting the intended target. A more attractive basic scientific approach would be for clinicians to stipulate the evidence and goals, let patients be the source of the most cogent information and appraisals, develop suitable clinical data and taxonomies, and construct appropriate, sensible clinimetric indexes.

    Additional Models

    Several other mathematical models have also become fashionably prominent on the current clinical scene. They include econometric approaches to cost–benefit and cost-effectiveness analysis and para-analytic procedures [30] in which medical-claims administrative data, accumulated for fiscal purposes, are used for other investigative goals in pharmacoepidemiologic, therapeutic, or outcomes research.

    These approaches have also had substantial appeal for funding agencies concerned with cost containment and for investigators wanting to do research on the outcomes of therapy. Because evidence of pertinent clinical distinctions may not be available for any of the analyses, however, the most important data for benefits, efficacy, therapeutic indications, and outcomes are usually missing from the data banks. The results thus often lack both the basic scientific requirements of reproducibility and accuracy, and the basic clinical requirements of applicability to individual patients or pertinent clinical subgroups.

    Conclusions

    Perhaps the most remarkable progress of the past 25 years has been the growth of investigative personnel concerned with clinical forms of clinical research. Despite the dominating hegemony of basic science and molecular biology in clinical investigation, patient-care research has become a reasonably respectable academic vocation. The researchers are now numerous enough to form viable new organizations and journals, capable enough to achieve academic advancement, and admired enough to be selected as leaders in academic departments or research societies.

    The investigators have also given considerable attention to clinical phenomena. Particularly in fields such as geriatrics and neurology, in which the main challenges have been to classify disability rather than disease, excellent new rating scales and taxonomies have been developed for such phenomena as coma, gait, balance, and delirium, and for screening functional status. For categorizing psychiatric ailments, psychiatrists have developed a multi-axial nosography that still needs improvements, but that represents an important effort to achieve a type of progress still needed in classifying “organic” disease. For challenges in ambulatory patients, primary care physicians have organized a taxonomy that is not fully successful but that offers a liberation from the morphologic constraints of the conventional nosographic catalog. Despite persistent problems in conceptualization and measurement, quality of life is an idea whose time has come; its appraisal is now demanded in many clinical trials from which it was formerly excluded. In addition, diverse efforts have been made to improve prognostication with various clinical staging systems for conditions such as diabetes mellitus, coronary disease, and cancer. The amount of work and progress has been relatively scant, however, in comparison with the deluge of activity based on mathematical models.

    None of the problems cited in this essay can be attributed to the mathematical models or to the people who originally proposed and promoted the models. The models are well constructed, well developed, and excellent for the goals at which they are aimed. The current problems exist because 1) the models have been applied to goals at which they are not aimed; 2) clinical leaders and readers have uncritically accepted the results; and 3) clinical investigators have not vigorously developed their own appropriate strategies, rooted in clinical realities and derived directly from those realities.

    Clinical judgment still has the paramount importance it has always had in patient care, but its basic scientific challenges in data and taxonomy have been generally overlooked during 25 years of emphasis on quantitative models derived from nonclinical sources. Clinical investigators of a new generation, seeking exciting intellectual stimulation and respecting the work of their own professional craft, can still find abundant opportunities for basic scientific investigation. The investigators may need substantial “ego strength” and may have to develop new forms of support for unfashionable research, but the basic scientific challenges are there to be pursued. The investigative clinicians can probe their own clinical knowledge and experience, intimately examine the events and relationships of patient care, identify the pertinent evidence, organize suitable taxonomic classifications, develop the clinimetric indexes, and do appropriate scientific analyses for the unique and fundamental characteristics of clinical activities that still occur as “clinical judgment.”

    References

    1. 1.
    2. 2.
    3. 3.
    4. 4.
    5. 5.
    6. 6.
    7. 7.
    8. 8.
    9. 9.
    10. 10.
    11. 11.
    12. 12.
    13. 13.
    14. 14.
    15. 15.
    16. 16.
    17. 17.
    18. 18.
    19. 19.
    20. 20.
    21. 21.
    22. 22.
    23. 23.
    24. 24.
    25. 25.
    26. 26.
    27. 27.
    28. 28.
    29. 29.
    30. 30.