Generic Health Measurement: Past Accomplishments and a Measurement Paradigm for the 21st Century

  Colleen A. McHorney, PhD
  From the University of Wisconsin-Madison Medical School and the William S. Middleton Memorial Veterans Hospital, Madison, Wisconsin. Note: This article is one of a series of articles comprising an Annals of Internal Medicine supplement entitled “Measuring Quality, Outcomes, and Cost of Care Using Large Databases: The Sixth Regenstrief Conference.” Acknowledgments: The author thanks Fredric Wolinsky, PhD, and Earl Bricker, MA, for thoughtful comments on earlier versions of this manuscript; David Kindig, MD, PhD, and Mark Linzer, MD, for intellectual support; and Jody McIntyre and Amy Maloney for contributions as research staff. Grant Support: By the Department of Veterans Affairs (HSR&D HFP #96-001, RR&D C-2016, and IIR #95-033) and by the University of Wisconsin-Madison Medical School. Work on this paper was completed while Dr. McHorney was a 1996 Picker/Commonwealth Scholars Program Finalist. Requests for Reprints: Colleen A. McHorney, PhD, William S. Middleton Memorial Veterans Hospital, 2500 Overlook Terrace, Madison, WI 53705.

    Abstract

    Generic health surveys have been proposed for use in increasingly diverse applications and populations. This paper describes the history of generic tools over the past 30 years and proposes a more modern measurement platform for advances in the 21st century. Many generic tools lack the precision required for effective health care decision making. A meaningful goal for the next era of development of generic measures should be the generation of equiprecise measurement for generic health concepts. Equiprecise tests yield measures of equal precision at all levels of the underlying construct. Equiprecise measurement can be achieved through the conjoint use of computerized adaptive testing as the survey platform and item response theory as the measurement theory.

    Over the past 30 years, researchers have generated numerous tools that use self-reporting to measure functional status, emotional well-being, and subjective perceptions of health [1-22]. Uses of generic surveys have increased dramatically in recent years as a result of the outcomes movement. Some authors have advocated the routine inclusion of data on generic health status in large databases [23], but others, such as Liang and Shadick [24], caution that the utility of doing so remains largely unproven. The purpose of this paper is to describe the history of generic health measurement and to suggest a more modern measurement paradigm for the 21st century. The conjoint use of computerized adaptive testing and item response theory offers distinct advantages for health outcomes assessment that could improve the feasibility and utility of including patient-centered data in large administrative databases.

    The Evolution of Generic Health Measurement

    Figure 1 presents a timeline of the evolution of generic health measures with respect to broader developments in health policy and health status assessment. Roughly coincidental with the publication of the World Health Organization's definition of health [25] was the emergence of clinically based, global rating scales whose content extended beyond organ function to encompass human function. Such measures as the Karnofsky performance status scale [26] and the function scale of the American Rheumatoid Association [27] were intended to supplement physiologic measures in an attempt to better understand treatment effectiveness. Around the same time, efforts to modernize national health indicators, including the incorporation of single-item indicators of activity limitations and perceived health in the National Health Interview Survey [28], were initiated [29].

    Figure 1. ARA = American Rheumatoid Association Functional Class; COOP = Dartmouth COOP Poster Charts; Duke = Duke-UNC Health Profile; Duke-17 = Duke Health Profile; FSQ = Functional Status Questionnaire; HIE = Health Insurance Experiment; HPL = Human Population Laboratory; HPQ = Health Perceptions Questionnaire; HSI = Health Status Index; KPS = Karnofsky Performance Status; Katz = Katz Index of Activities of Daily Living; LF-149 = Medical Outcomes Study 149-Item Functioning and Well-Being Profile; M-M = morbidity and mortality; MHIQ = McMaster Health Index Questionnaire; NHIS = National Health Interview Survey; NHP = Nottingham Health Profile; PGWB = Psychological General Well-Being Scale; QWB = Quality of Well-Being Scale; SF-6 = Medical Outcomes Study 6-Item Health Survey; SF-12 = Medical Outcomes Study 12-Item Health Survey; SF-20 = Medical Outcomes Study 20-Item Health Survey; SF-36 = Medical Outcomes Study 36-Item Health Survey; SIP = Sickness Impact Profile; WHO = World Health Organization. Timeline of the evolution of generic health measures with respect to broader developments in health policy and health status assessment.

    The policy initiatives of the “War on Poverty” in the mid-1960s prompted two advances in health measurement. First, the social indicators movement ushered in measurement of quality of life in general populations [30, 31] and provided indicators of how well we lived, which were to be used with existing measures of how much we produced and spent [32]. Second, unified indexes of mortality and morbidity were developed for planning and evaluation purposes at the population health level [33-35].

    A watershed for generic health assessment can be traced to the Human Population Laboratory, which launched measurement work in physical, mental, and social health [1-4]. Equally important, the Human Population Laboratory demonstrated that respondents will complete long surveys by mail [36], a finding that helped reduce the prevailing bias against mail surveys.

    In the 1970s, generic tools proliferated, in part as a result of extramural support from the National Center for Health Services Research. Definitional expansiveness was the signature of this era, and multi-item scales replaced single-item measures. The Quality of Well-Being Scale, developed for priority setting and program evaluation, represented a meaningful advance by measuring the value components of a social indicator of health [6, 37]. Next, the Sickness Impact Profile [7, 38] was developed for health care evaluation. The 136 items in this profile were obtained from patients, providers, and caregivers and yielded both individual health profiles and summary scores. The McMaster Health Index Questionnaire [8, 39] followed. Intended for use in clinical and health services research, it measured physical, social, and mental health by using 59 items. The Health Perceptions Questionnaire [40] was constructed for use in health planning and evaluation and tapped the elusive realm of “positive health.”

    In 1979, health status measures for the adult general population emerged from the Health Insurance Experiment [9]. Next, the Nottingham Health Profile [10, 41] was developed for use in population surveys, clinical trials, and clinical practice. The 38 items in the Nottingham Health Profile tapped six health concepts and were derived from patients. The Duke-UNC Health Profile [11] was developed for use in research and clinical applications in primary care. The 63 items in this profile covered four health concepts and were obtained from the literature.

    In the early 1980s, development of new measures paused, but health research increasingly applied existing measures [42-44]. Interest in methodologic issues increased [45-48]. By the mid-1980s, interest had developed in the use of generic tools in everyday clinical practice, largely because of research showing poor correspondence between clinician and patient ratings of function and well-being [49-51]. In addition, growing recognition of the biopsychosocial model [52] and its relevance to an aging population resulted in increased appreciation that the preservation of function and well-being is an important goal of medical care [53]. Clinical practice applications ushered in the era of practicality. Shorter tools were developed: The Functional Status Questionnaire consisted of 34 items [13], and the Dartmouth COOP Charts had 9 items [14]. These tools were developed with measurement priorities directed toward practical efficiency (for example, ease of administration and scoring), which was achieved at the expense of measurement precision [54, 55].

    The most recent era of health measurement is that of psychometric efficiency, which has several underpinnings. First, the outcomes movement gained momentum after Ellwood's Shattuck lecture was published [23] and the Agency for Health Care Policy and Research was established in 1989. Large-scale studies of patient-based outcomes were imminent. Second, burdened by study costs that spanned outcomes ranging from pathophysiology to quality of life, the clinical trials community sought more economical measures of health status. Third, concerns about respondent burden among severely ill patients encouraged shorter surveys.

    The Medical Outcomes Study (MOS) 20-Item Short-Form Health Survey (SF-20) [16, 56] was the first to surface. Its 20 items were derived largely from the Health Insurance Experiment and tapped six health concepts. Next emerged the Duke Health Profile [17], a 17-item survey that was empirically derived from the original Duke-UNC Health Profile. The SF-36 [21] was developed from the SF-20 and the 149-Item Functioning and Well-Being Profile, which measures 16 health concepts [19]. The SF-6 Survey, derived from the Functioning and Well-Being Profile, uses a single item to tap each of 6 health concepts [19]. The SF-12 Survey is an empirically derived short form of the SF-36 [22].

    Over the past 30 years, we have greatly improved our measurement bandwidth in generic health assessment (the breadth of health dimensions measured). Many different health concepts are now measured across the armamentaria of generic tools, although specific surveys differ in bandwidth (for example, the Sickness Impact Profile measures 12 health concepts, whereas the McMaster Health Index Questionnaire measures just 3). However, many generic measures, even those with excellent bandwidth, still have problems of fidelity (that is, thoroughness and depth of measurement). Thus, although we now quantify many different dimensions of health, we often do so at the expense of precision. Overall, many generic tools lack the precision required for effective health care decision making. Precision is conceptualized here and elsewhere [57] as a property of a measure that encompasses both the range or depth of measurement and the number of distinct levels enumerated by a scale (fineness of specification).
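    The “fineness of specification” component of precision can be made concrete with simple arithmetic. For a summated scale whose items share the same number of response categories, the number of distinct raw scores equals the number of items times (categories minus 1), plus 1. The sketch below assumes unweighted summing and equal-length response sets; the function name is illustrative only. It counts levels but says nothing about depth: a scale can have many levels and still lack items at the extremes of the continuum.

```python
def distinct_levels(n_items: int, n_categories: int) -> int:
    """Distinct raw sum scores for a summated scale (each item scored 0..n_categories-1)."""
    if n_items < 1 or n_categories < 2:
        raise ValueError("need at least one item with at least two categories")
    # The raw sum runs from 0 to n_items * (n_categories - 1) in steps of 1.
    return n_items * (n_categories - 1) + 1

print(distinct_levels(10, 3))  # 10 items with 3 response levels each
print(distinct_levels(1, 3))   # a single 3-level item
```

    For example, the SF-36 physical functioning scale has 10 items with 3 response levels each and therefore 21 possible raw scores, whereas a single 3-level item can distinguish only 3 levels.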

    Prevailing Measurement Paradigm

    Generic health status tools have been developed in the group-testing tradition. The defining signature of group tests is the use of a fixed set of questions (items) for all respondents, regardless of the appropriateness of any specific item for a given individual respondent. Items in group tests are selected or written to represent a moderate range of activities at a moderate level of difficulty. The era of psychometric efficiency emphasized construction of generic measures containing as few items as possible. Acceptable standards of face and content validity and reliability with few items can best be achieved by selecting items that are fairly homogeneous. Thus, selected items are often in the middle range of item difficulty and are almost alternate forms of each other. These measurement standards have two consequences, both of which are evident in generic measures.

    First, fixed-length health surveys tend to bore healthier respondents (because they must wade through items that are easy for them, such as bathing) and frustrate more impaired respondents (because they must respond to items describing tasks that are clearly impossible for them, such as running one block). Respondents commonly voice such complaints about generic surveys. They do not object to survey length itself; rather, they are frustrated by redundant items and by items that are of low salience and relevance to them [58-60].

    Second, because item selection is geared toward the middle of the road in content coverage and difficulty, the end points of the health continuum tend to be poorly defined. This yields ceiling effects in general populations and floor effects in more disabled populations. For many generic measures [54, 55, 61], score distributions are highly skewed, such that a plurality of respondents are classified as being in “perfect” health, at or near the ceiling of the scale. Very large ceiling effects (up to 70%) have been observed in general and primary care populations [41, 62-64]. Ceiling effects are more prevalent than floor effects because many generic tools represent health as the absence of limitations.
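    Floor and ceiling effects are usually reported as the percentage of respondents who obtain the lowest or highest possible score. A minimal sketch of that calculation follows; the function name is illustrative and the scores are invented for illustration, not Medical Outcomes Study data.

```python
from typing import Sequence

def floor_ceiling_rates(scores: Sequence[float], scale_min: float, scale_max: float) -> tuple[float, float]:
    """Return (percent at floor, percent at ceiling) for a set of scale scores."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= scale_min)
    at_ceiling = sum(1 for s in scores if s >= scale_max)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n

# Hypothetical 0-100 physical functioning scores (invented values).
scores = [100, 100, 95, 100, 80, 100, 60, 100, 100, 0]
floor_pct, ceiling_pct = floor_ceiling_rates(scores, 0, 100)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")
```

    In this toy sample, 6 of 10 respondents sit at the ceiling; none of them can register improvement on a later administration, however much their health changes.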

    Score imprecision has two principal consequences. First, it is impossible to distinguish among persons at the ceiling or floor, even though they probably vary in the underlying construct. For example, as shown in Figure 2, 55% of patients participating in the MOS scored perfectly on the SF-6 measure of physical functioning [54], but 69% of those patients had less than a perfect score on the SF-36 physical functioning scale, a longer parallel-form measure. For health care managers and policymakers, ceiling effects paint a more favorable picture of population health than is warranted. For researchers, ceiling effects increase the risk of type II errors in hypothesis testing. For clinicians, ceiling effects yield false-negative results (that is, sensitivity at the upper end of the scale is low). The second consequence is that it is impossible to measure decline in health over time for persons at the floor or improvement in health over time for persons at the ceiling. Thus, score distributions that are skewed at baseline underestimate or miss the effects of treatment or natural history on health status.

    Figure 2. SF-6 = Medical Outcomes Study 6-Item Health Survey; SF-36 = Medical Outcomes Study 36-Item Health Survey. Data obtained from the Medical Outcomes Study. Physical functioning scores for chronically ill patients [54].

    Paradigm To Achieve Precision across Populations and Applications

    Future instrument development should attend to the precision of score distributions in the various populations in which the tools are applied. A meaningful goal for the next era of measurement is to generate equiprecise measurement [65] of generic health concepts. Equiprecise tests yield measures of equal precision at all levels of the underlying concept (for example, equal precision for the sickest and the most healthy). The bandwidth-fidelity dilemma is solved with equiprecise measures: the desired paradigm of wide bandwidth combined with high fidelity is within reach [65]. Equiprecise measurement can be achieved through conjoint use of the testing method known as computerized adaptive testing and the measurement theory known as item response theory [65, 66].

    A logical progression of generic health assessment, especially if routine inclusion of these data in administrative databases is desirable, is to move from paper-and-pencil surveys to computerized adaptive testing of health status. This method uses a computer (or a computerized telephone interview) to administer items to respondents and is adaptive in a literal sense because each “test” is tailored to the unique ability level of each respondent. Each person taking a computerized adaptive test is taking a different version of the test because items are administered on the basis of the respondent's previous answers (for example, whether one passes or fails items). As discussed below, item response theory allows all of the different forms of a test to connect to each other on the same metric or yardstick.

    Computerized adaptive testing is becoming commonplace in knowledge-based testing. For example, academic admissions examinations, such as the Graduate Management Admission Test, are increasingly based on computerized adaptive testing. National nursing licensing examinations now use this method [67, 68], and other medical boards are moving in that direction [69-72]. The advantages of computerized adaptive testing for large-scale testing, whether it be for credentialing or health status assessment, are numerous and include improved test security, increased flexibility of test scheduling, reduction of testing time by one quarter to one half, and rapid scoring and feedback of results [68, 72, 73].

    How would computerized adaptive testing work for generic health assessment? Once important preconditions are met (outlined below), the following computerized algorithms would be developed: starting rules; item selection procedures; answering rules, such as time limits; scoring rules; stopping rules; and report generation. The following example illustrates the use of computerized adaptive testing. A 63-year-old woman comes to a clinic for an annual examination. She is seated at a station with a large screen. The only skills she requires are knowing how to use the space bar, enter key, and tab key, each of which is large and color-coded. She is prompted for her age and sex, which are used to select the starting point for her “physical functioning computerized adaptive test.” Her starting question asks about walking the length of a city block. If she passes this item (that is, she can accomplish the task), the computer bypasses all of the easier items (such as walking across the room) and moves to more difficult questions (such as walking 1 mile). As the test proceeds, she is given questions that narrow in on her zone of physical functioning.
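    The starting, item selection, scoring, and stopping rules listed above can be sketched in a few dozen lines. This is a minimal illustration, not a description of any existing system. It assumes dichotomous items already calibrated under the Rasch model; a start_theta parameter stands in for a starting point chosen from age and sex; items are selected by maximum information; ability is scored by expected a posteriori (EAP) estimation on a grid with a normal prior; and testing stops when the posterior standard deviation reaches a user-chosen target. Answering rules (such as time limits) and report generation are omitted. All item names and difficulties are hypothetical.

```python
import math
from typing import Callable

def p_pass(theta: float, b: float) -> float:
    """Rasch (1PL) probability that a person at ability theta passes an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses: list[tuple[float, int]], prior_mean: float) -> tuple[float, float]:
    """Scoring rule: EAP ability and posterior SD on a grid, with a N(prior_mean, 1) prior."""
    grid = [-4.0 + 0.05 * i for i in range(161)]  # ability grid from -4 to +4 logits
    weights = []
    for t in grid:
        w = math.exp(-0.5 * (t - prior_mean) ** 2)  # unnormalized normal prior
        for b, u in responses:
            p = p_pass(t, b)
            w *= p if u == 1 else 1.0 - p  # likelihood of each observed pass/fail
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank: dict[str, float], answer: Callable[[str], int],
            start_theta: float = 0.0, se_target: float = 0.4, max_items: int = 20):
    """Administer items adaptively; return (ability estimate, SE, items in order given)."""
    theta, se = start_theta, float("inf")  # starting rule: begin at a prior guess
    remaining = dict(bank)                 # item name -> calibrated difficulty (logits)
    responses: list[tuple[float, int]] = []
    administered: list[str] = []
    # Stopping rule: target SE reached, item limit hit, or bank exhausted.
    while remaining and len(administered) < max_items and se > se_target:
        # Item selection: Rasch information p(1 - p) is largest for the item
        # whose difficulty is closest to the current ability estimate.
        item = min(remaining, key=lambda name: abs(remaining[name] - theta))
        b = remaining.pop(item)
        u = answer(item)  # 1 = can do the task, 0 = cannot
        responses.append((b, u))
        administered.append(item)
        theta, se = eap_estimate(responses, prior_mean=start_theta)
    return theta, se, administered

# Hypothetical walking items with invented difficulties (higher = harder).
bank = {"walk across the room": -2.5, "walk one block": 0.0, "climb one flight of stairs": 0.5,
        "walk several blocks": 1.0, "walk one mile": 2.0, "run one block": 3.0}
# Simulated respondent who can do every task easier than 1.2 logits.
theta, se, given = run_cat(bank, lambda item: int(bank[item] < 1.2))
print(given, round(theta, 2), round(se, 2))
```

    As in the clinic example, the first question concerns walking one block, and later questions move toward harder or easier tasks depending on the answers. With only six items in this toy bank, the test exhausts the bank before reaching the target standard error; a real item bank would be far larger.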

    With computerized adaptive testing, the degree of precision obtained for a person can be set according to the uses to which the data will be put. For example, a clinician may want high precision when the consequences of assessment include nursing home placement. The value of computerized adaptive testing is that precision is determined by the user, not by the test or by the population. Use of computerized adaptive testing for generic health assessment could reduce the human capital involved in administering health questionnaires, challenge respondents at their ability level instead of boring or discouraging them, provide users with the exact amount of precision desired for each application, and provide “real-time” scores to users. This new era of measurement is probably 3 to 8 years away, but it is attainable.
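    The trade-off between user-chosen precision and test length can be made concrete under the Rasch model. A dichotomous item contributes information p(1 - p), which is at most 0.25 when its difficulty equals the respondent's ability, and the standard error of the ability estimate is approximately 1/√I. Reaching a target standard error s therefore requires at least 1/(0.25 s²) = 4/s² perfectly targeted items. The sketch below computes that lower bound. It assumes the Rasch model and ignores prior information; because real item banks are never perfectly targeted, actual adaptive tests need more items than this bound.

```python
import math

def min_items_for_se(se_target: float, max_item_info: float = 0.25) -> int:
    """Lower bound on the number of dichotomous Rasch items needed to reach a target SE."""
    if se_target <= 0:
        raise ValueError("se_target must be positive")
    required_info = 1.0 / se_target ** 2  # SE(theta) is approximately 1 / sqrt(I(theta))
    return math.ceil(required_info / max_item_info)

for se in (0.5, 0.3):
    print(se, min_items_for_se(se))
```

    Tightening the target from 0.5 to 0.3 logits raises the bound from 16 to 45 items, which is why a user who needs high precision (for example, before nursing home placement) should expect a longer adaptive test than one screening a population.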

    Development of a computerized adaptive testing strategy would require three phases of methodologic work. The first task would be to assemble item banks on different generic health concepts. An item bank consists of many questionnaire items that are matched to a given concept or task [74, 75]. Items would be assembled from existing measures. The language and structure of some items would have to be modernized, and the reading level of other items would need to be decreased. An appropriate rating scale would need to be selected to maximize reliable variance and minimize respondent burden and response invalidity.

    The second task would be to conduct cognitive interviews with various patient groups to obtain in-depth information on the respondents' understanding and acceptance of the revised items. Previous research that used cognitive interviews showed numerous problems with the extent to which respondents understand health questions and the manner in which they answer them; these problems can compromise the validity of the obtained data [76, 77]. These interviews could also obtain input from respondents on gaps in item content.

    The third methodologic task would be to use techniques subsumed under item response theory to evaluate the caliber of each banked item and its location (that is, relative difficulty) on the underlying trait (for example, physical functioning) and to build efficient tests from the newly evaluated item banks. Item response theory is both a theoretical framework and a collection of quantitative techniques used to construct tests, scale responses, and equate scores, as well as to identify item bias and facilitate computerized adaptive testing. Item response theory is increasingly embraced as an alternative to classic test theory, the theory under which many generic and disease-specific tools have been evaluated [74, 78].

    There are several important differences between item response theory and classic test theory. First, quantitative indexes of psychometric performance (such as validity or reliability) derived under classic test theory are not intrinsically generalizable across populations (younger compared with older persons or sick compared with healthy persons), applications (cross-sectional compared with longitudinal studies), or testing situations (mail compared with telephone administration or completion at home compared with in the clinic), but most users naively assume that they are. This lack of generalizability undermines the integration of generic health data in large databases that span different patient groups and different testing situations. Second, as its name implies, classic test theory is test driven rather than item driven. Different tests cannot be placed along a common metric. Each test is its own separate yardstick; each occupies a different plane of a space rather than a different spot on a common, underlying continuum. Thus, a large database that contains the Sickness Impact Profile cannot be directly compared with one that contains the SF-20.

    Because it is item focused, item response theory goes beyond the weak assumptions and the “boundedness” of classic test theory. The unit of analysis in item response theory is the item and, more specifically, its location on the underlying, continuous trait of interest (the latent trait). An important feature of item response theory is that it synergistically analyzes respondent ability and item difficulty and mathematically places them together on the same metric (unlike other measurement theories, which divorce ability from item difficulty). The item response theory model provides the empirical link between individual responses (observable performance) and the latent trait (unmeasured construct) and estimates a score for the respondent on the underlying trait and a difficulty estimate for the item on the underlying trait. Various mathematical models (for example, one-, two-, and three-parameter models and binary and polytomous response models), each with different assumptions, can be used to estimate the association between person ability and item difficulty.
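    The one-, two-, and three-parameter models mentioned above differ in how many item parameters they estimate. A minimal sketch of the three-parameter logistic item response function follows, with the two-parameter and one-parameter (Rasch) models as special cases. The parameter names (a for discrimination, b for difficulty or location, c for a lower asymptote originally motivated by guessing on multiple-choice tests) follow common textbook notation and are not taken from this article; polytomous models are not shown. Because θ and b are on the same logit scale, a respondent whose ability equals an item's difficulty has a 50% chance of passing it under the one- and two-parameter models.

```python
import math

def irt_prob(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of passing an item under the three-parameter logistic (3PL) model.

    a = discrimination, b = difficulty (location), c = lower asymptote.
    With c = 0 this reduces to the 2PL model; with c = 0 and a = 1, to the 1PL (Rasch) model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

print(irt_prob(0.5, 0.5))                  # ability equals difficulty (1PL)
print(irt_prob(1.0, 0.0, a=2.0))           # a more discriminating item (2PL)
print(irt_prob(-10.0, 0.0, a=1.0, c=0.2))  # very low ability approaches c (3PL)
```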

    The strengths of item response theory for advances in generic health assessment are twofold. First, the theory is a powerful technique for understanding the structure, order, and interrelations of items. For computerized adaptive testing, extensive item response theory modeling would be conducted on the item banks. Each item bank would have to be large enough to accommodate the fact that some items would perform poorly in terms of empirical tests and would have to be discarded or revised. Modeling according to item response theory would identify gaps in item difficulty and content coverage on the underlying trait. Items could then be developed to fill in these gaps; this process would reduce skewed score distributions and make headway toward equiprecise measurement. Second, empirical parameters of respondent ability and item performance would then be used to develop efficient computerized adaptive testing algorithms and to equate different forms of a given computerized adaptive test with each other. It is at this point that widespread computerized adaptive testing becomes a reality and that different users of the same item bank can speak a shared language through test equating.
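    Once items have been calibrated, gaps in difficulty coverage can be located by sorting the banked items along the latent trait and flagging large spaces between neighbors, and the ends of the bank can be compared with the ability range expected in the target population. The sketch below assumes difficulties in logits from a prior calibration; the 0.5-logit threshold, the expected ability range, and all item names and values are hypothetical choices for illustration.

```python
def difficulty_gaps(difficulties: dict[str, float], max_gap: float = 0.5) -> list[tuple[str, str, float]]:
    """Adjacent item pairs (ordered by difficulty) spaced more than max_gap logits apart."""
    ordered = sorted(difficulties.items(), key=lambda kv: kv[1])
    gaps = []
    for (easier, b_easy), (harder, b_hard) in zip(ordered, ordered[1:]):
        if b_hard - b_easy > max_gap:
            gaps.append((easier, harder, round(b_hard - b_easy, 2)))
    return gaps

def end_coverage(difficulties: dict[str, float], theta_low: float, theta_high: float) -> dict[str, bool]:
    """Does the bank contain items at least as easy and as hard as the expected ability range?"""
    bs = list(difficulties.values())
    return {"floor_covered": min(bs) <= theta_low, "ceiling_covered": max(bs) >= theta_high}

# Hypothetical calibrated bank (invented difficulties, in logits).
bank = {"bathe": -2.0, "walk across the room": -1.8, "walk one block": 0.0,
        "climb one flight of stairs": 0.3, "walk one mile": 1.6}
print(difficulty_gaps(bank))
print(end_coverage(bank, theta_low=-2.5, theta_high=2.5))
```

    Here the sketch flags two interior gaps and shows that neither end of the assumed ability range is covered; new items written for those regions are what would reduce skewed score distributions.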

    Item response theory methods have been used in rehabilitation medicine [79-81], mental health [82, 83], and disease-specific instruments [84-89]. These methods have been used more to validate tests [90-94] than to construct instruments or score scales. If item response theory and computerized adaptive testing seem so promising, why haven't they seen greater application in generic health measurement to date? First, item response theory requires measurement of unidimensional concepts; not all generic concepts meet this criterion. For example, multifactorial mental health scales that tap anxiety, depression, and psychological well-being would not satisfy unidimensionality criteria. Second, item response theory rests on an ordered continuum of items, which differs greatly from many generic scales constructed in the Likert tradition. Third, fitting of an item response theory model, which is an essential a priori task for computerized adaptive testing, requires iterative modeling and sensitivity tests. It is a very time-consuming method that requires highly specialized skills. Fourth, item response theory is a large-sample method, requiring hundreds of participants for modeling. Fifth, computerized adaptive testing requires large item banks, the construction of which is a substantial methodologic task. Finally, computerized adaptive testing would yield truly generic surveys. Scientists and users alike would need to support “no-name” tools, which may run counter to today's proprietary and trademarked instruments.

    The most efficient way to proceed from paper-and-pencil surveys to a 21st century computerized adaptive testing paradigm would be to establish publicly supported measurement centers for different generic health concepts. Each center would be responsible for accomplishing the three broad agendas outlined above on a given health concept and for helping investigators to use generic adaptive tools and interpret the results. These centers would be clearinghouses for measurement, validation, calibration, and interpretation. With the support of the Department of Veterans Affairs, I am developing an item bank for physical health concepts. This project is a preliminary feasibility assessment of the use of item response theory models (and subsequently computerized adaptive testing) for measuring generic health status.

    What To Do in the Meantime

    Group-Level Applications

    For now, investigators will continue to grapple with the bandwidth-fidelity dilemma. The challenge is to match the focus of the study with generic concepts that are known or have been hypothesized to covary with it. Selection of generic concepts should be hypothesis driven [95]. Health concepts that have the greatest clinical, policy, and social bearing should be measured with the greatest precision possible. Because the fixed costs of data collection are high and survey response rates are inelastic (insensitive) to survey length [96, 97], it may be advantageous, when in doubt, to opt for more precise measurement. It will also be important to ascertain where a sample is likely to fall on the score distribution. Different generic measures vary in overall precision and in the location of their precision [57]. The power of hypothesis testing is enhanced when a suitable correspondence exists between the region of the score range in which a measure is most precise and where a given sample will be distributed at baseline and follow-up. As suggested above, the new era of computerized adaptive testing will facilitate the correspondence between respondent ability and item difficulty.

    Individual-Patient Applications

    Provision of reports on functional status and well-being to clinicians has not led to changes in practice style and has not improved patients' health outcomes [98-101]. These disappointing findings may be the consequence of using group-level tools for individual-patient assessment [55]. Group-level measures yield imprecise (as seen in large confidence intervals) and insensitive (as reflected by false-negative outcomes associated with large ceiling effects) scores for individual patients [55]. A major disadvantage of using group-level tools with individual patients is that the scores obtained are not easily interpretable: Scores between the lowest and highest possible values can be achieved by countless combinations of item responses. Item response theory, on the other hand, yields scores that can be more easily interpreted in terms of item content, and departures from the expected order of item responses can be detected. Item response theory also yields reliability estimates at the level of the individual person, which facilitates the assessment of longitudinal change in health.
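    One simple way to detect departures from the expected order is to count Guttman errors: pairs of items in which a respondent failed an easier item but passed a harder one. A minimal sketch follows, assuming dichotomous items whose difficulties are already calibrated on a common metric. This count is a classical nonparametric person-fit index, swapped in here for illustration; formal model-based person-fit statistics also exist. The item names, difficulties, and response patterns are hypothetical.

```python
def guttman_errors(responses: dict[str, int], difficulties: dict[str, float]) -> int:
    """Count pairs in which an easier item was failed (0) but a harder item was passed (1)."""
    items = sorted(responses, key=lambda name: difficulties[name])  # easiest first
    errors = 0
    for i, easier in enumerate(items):
        for harder in items[i + 1:]:
            if responses[easier] == 0 and responses[harder] == 1:
                errors += 1
    return errors

# Hypothetical calibrated difficulties (logits) and two response patterns (1 = can do).
difficulties = {"bathe": -2.0, "walk across the room": -1.8, "walk one block": 0.0,
                "climb one flight of stairs": 0.3, "walk one mile": 1.6}
expected_order = {"bathe": 1, "walk across the room": 1, "walk one block": 1,
                  "climb one flight of stairs": 0, "walk one mile": 0}
unexpected = {"bathe": 0, "walk across the room": 1, "walk one block": 1,
              "climb one flight of stairs": 0, "walk one mile": 1}
print(guttman_errors(expected_order, difficulties), guttman_errors(unexpected, difficulties))
```

    The first pattern is perfectly ordered and produces no errors; the second, in which the respondent reports being unable to bathe but able to walk a mile, produces four and would prompt a closer look before the score is used clinically.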

    The Role of Item Response Theory-Based Computerized Adaptive Testing in Measuring Outcomes by Using Large Databases

    Ellwood's 1988 vision of outcomes management [23] was not unlike that of Florence Nightingale in the 1860s or that of Ernest Codman in the 1910s. However, Ellwood “upped the ante” by calling for large-scale collection of outcomes data that included behavioral and emotional function. A problem of the outcomes movement has been the implicit assumption that “one size fits all,” that is, that a given tool can equally meet the diverse needs of practicing clinicians, health care managers, purchasers, payers, policymakers, regulators, and researchers.

    Computerized adaptive testing-based measurement (generic and disease-specific) could yield effective information for health care decision making at many levels. Users would choose the precision desired for any given application, instead of being held hostage by the fixed parameters of existing instruments. Item response theory is the theoretical and mathematical glue that allows tests of different precision to be compared with each other. Successful item response theory modeling transforms scales that differ in precision into a common currency of ability (test equating). For example, a single-item measure of depression used in an employer survey could be equated to a somewhat longer measure used in a clinical trial, which in turn could be equated to an even longer measure used diagnostically in clinical practice. Item response theory-based computerized adaptive testing would enable the equating of scores into a common yardstick across tests, individuals, or time. The development of a shared language that goes beyond specific items to location on an ability scale would provide users with tremendous flexibility in building and maintaining an outcomes capacity within and across different databases.
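    Test equating can be carried out in several ways. One simple, classic IRT linking method is the mean/sigma method: when two forms have been calibrated separately but share some anchor items, the anchors' difficulty estimates on each form determine a linear transformation θ_target = A·θ_source + B. A minimal sketch follows; the anchor values are invented, the function names are hypothetical, and in practice characteristic-curve linking methods are often preferred because they are less sensitive to individual anchor estimates.

```python
from statistics import mean, pstdev

def mean_sigma_link(anchor_source: list[float], anchor_target: list[float]) -> tuple[float, float]:
    """Return (A, B) so that theta_target = A * theta_source + B, from anchor-item difficulties."""
    if len(anchor_source) != len(anchor_target) or len(anchor_source) < 2:
        raise ValueError("need the same >= 2 anchor items calibrated on both forms")
    A = pstdev(anchor_target) / pstdev(anchor_source)  # match the spread of the anchors
    B = mean(anchor_target) - A * mean(anchor_source)  # then match their mean
    return A, B

def to_target_scale(theta_source: float, A: float, B: float) -> float:
    """Express an ability (or item difficulty) from the source form on the target metric."""
    return A * theta_source + B

# Hypothetical anchors: the same three items calibrated on a short form and a long form.
A, B = mean_sigma_link([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
print(A, B, to_target_scale(1.0, A, B))
```

    Once A and B are known, a score from the short form (for example, an employer survey) can be reported on the long form's metric, which is the kind of common yardstick described above.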

    Conclusion

    Twenty-one years ago, Thomas Bice [102] wrote that “progress in measurement is marked by increasing ability to discriminate among quantities and qualities of phenomena, not by devising quantitative summaries that obscure differences.” The field of health status assessment has made great progress in measuring both the quality and quantity of generic health states. The validity of generic measurement and its armamentarium of tools is expressed not only by the existence of numeric indexes but also by the extent to which those tools yield useful information for decision making. Future work should be designed to achieve equiprecise measurement of health concepts, improve the capacity to calibrate and interpret generic health data, and use existing and emergent computer technology to measure generic health states more efficiently and effectively.

    References

    1.
    2.
    3.
    4.
    5.
    6.
    7.
    8.
    9.
    10.
    11.
    12.
    13.
    14.
    15.
    16.
    17.
    18.
    19.
    20.
    21.
    22.
    23.
    24.
    25.
    26.
    27.
    28.
    29.
    30.
    31.
    32.
    33.
    34.
    35.
    36.
    37.
    38.
    39.
    40.
    41.
    42.
    43.
    44.
    45.
    46.
    47.
    48.
    49.
    50.
    51.
    52.
    53.
    54.
    55.
    56.
    57.
    58.
    59.
    60.
    61.
    62.
    63.
    64.
    65.
    66.
    67.
    68.
    69.
    70.
    71.
    72.
    73.
    74.
    75.
    76.
    77.
    78.
    79.
    80.
    81.
    82.
    83.
    84.
    85.
    86.
    87.
    88.
    89.
    90.
    91.
    92.
    93.
    94.
    95.
    96.
    97.
    98.
    99.
    100.
    101.
    102.