The In-Training Examination in Internal Medicine
- Richard A. Garibaldi, MD;
- Marie C. Trontell, MD;
- Herbert Waxman, MD;
- John H. Holbrook, MD;
- D. Theresa Kanya, MBA;
- Shahram Khoshbin, MD;
- John Thompson, MD;
- Maribeth Casey, MA;
- Raja G. Subhiyah, PhD; and
- Frank Davidoff, MD
From the University of Connecticut Health Center, Farmington, Connecticut; the Robert Wood Johnson Medical Center, New Brunswick, New Jersey; the Albert Einstein Medical Center, the American College of Physicians, and the National Board of Medical Examiners, Philadelphia, Pennsylvania; the University of Utah, Salt Lake City, Utah; Brigham and Women's Hospital, Boston, Massachusetts; and the University of Kentucky School of Medicine, Lexington, Kentucky.
Requests for Reprints: Richard A. Garibaldi, MD, Department of Medicine, Room LG-004, University of Connecticut Health Center, Farmington, CT 06030-3950.
Acknowledgments: The authors thank Jill Antunes, BA, and Kathleen L. Egan, PhD, for technical assistance and Ms. Mary Ann Argiro for secretarial support.
Abstract
Objective: The In-Training Examination in Internal Medicine (ITE-IM) has been offered to internal medicine trainees annually since 1988 as an instrument for self-assessment. This report outlines the manner in which the test is prepared, reviews the results of annual examinations, and analyzes trends during the past 6 years.
Design: Results of each examination were reviewed with regard to the demographic characteristics of persons taking the test, their previous medical training, and their present program affiliations.
Results: The number of residents participating in the ITE-IM has increased steadily over the past 6 years. In 1993, more than 12 000 residents from more than 90% of internal medicine training programs in the United States participated in the examination; the percentage of international medical school graduates taking the examination increased from 27% in 1988 to 47% in 1993. Statistical analyses of each examination have shown it to be reliable, internally consistent, and discriminating. Over the past 6 years, graduates of U.S. medical schools have scored consistently higher than those of international medical schools and schools of osteopathic medicine on all annual examinations. However, in 1993, for residents at all levels of training, the differences in scores between graduates of U.S. medical schools and graduates of international medical schools narrowed substantially. From 1988 to 1993, there has been a trend toward lower scores by every cohort on each subsequent examination. The decreases in scores are most pronounced for graduates of U.S. medical schools and those of schools of osteopathic medicine. The lower scores may be caused by either an increased level of difficulty in the examination or decreased knowledge among examinees.
Conclusions: The ITE-IM is a useful instrument to assess the knowledge base of residents during internal medicine training. It provides residents and program directors with a reliable evaluation of themselves and their programs in comparison to their national peer groups. It also provides objective data to monitor trends over time in residents' scores and relates them to the changing demographic characteristics of trainees and to innovations in the clinical curricula of internal medicine training programs.
The In-Training Examination in Internal Medicine (ITE-IM) is a written examination that was developed to enable medical housestaff and residency programs to compare themselves with their national peer groups. Since 1988, the examination has been administered annually to internal medicine residents on a voluntary basis. More than 90% of medicine training programs in the United States have participated each year. We review the first 6 years of the ITE-IM, including its preparation, content, administration, and results. We also provide insights into the changing profile of internal medicine residents and training programs from 1988 to 1993.
The Examination
Background
In the 1970s and 1980s, several medical professional organizations, including the American College of Surgeons, the American Board of Family Practice, the American Board of Anesthesiology and the American Society of Anesthesiologists; subspecialty surgical societies; and a coalition of internal medicine residency programs in the Midwest and Northeast, developed written examinations to test the knowledge base of their residents-in-training [1-9]. These initiatives provided momentum to develop a standardized, national in-training examination for internal medicine residents. In the mid-1980s, the American College of Physicians, the Association of Program Directors in Internal Medicine, and the Association of Professors of Medicine collaborated to prepare and administer an annual voluntary, written examination for internal medicine residency programs in the United States. The ITE-IM was targeted to residents in their second year of postgraduate training, the midpoint of their clinical training, although residents in their first and third years of postgraduate training were also encouraged to take the examination. The examination was intended to be a primarily educational instrument for self-evaluation [10]. It was specifically noted that the examination was not to be used as a pretest for determining eligibility for certifying examinations, as a substitute for clinical competency examinations, for promotion or termination within residency programs, or by any outside regulatory agency to assess the knowledge of a particular resident or the quality of a particular training program.
Preparation
Each edition of the ITE-IM is prepared by a committee composed of 10 authors representing the American College of Physicians, the Association of Program Directors in Internal Medicine, and the Association of Professors of Medicine. Questions are written by committee members according to a test blueprint that defines the major content of the examination and the proportion of questions that will be included for each organ system and related disciplines. To ensure reliability, between 25% and 50% of questions in each examination are items that were used in previous examinations. A pilot examination containing only new questions is critiqued by a group of chief medical residents and program directors before each examination.
Content
The ITE-IM is intended to test the knowledge base of residents in their second year of postgraduate training in general medicine. The subject matter of questions is selected to reflect the experience of residents at this level of training. The test also includes items to assess the resident's understanding of the clinical examination and practical applications of basic medical concepts. In recent examinations, questions that relate to ambulatory care and case vignettes that require clinical decision making have been emphasized. Each question is characterized by its setting (ambulatory, inpatient, or critical care), content (physical examination skills, diagnosis, or treatment), and process (judgment, synthesis, or recall).
Each year, the examination consists of between 375 and 450 items distributed among the organ systems and related areas according to the blueprint of that year. The entire examination consists of test items that are relevant to the practice of internal medicine. Subjects for questions include topics in primary care, critical care, neurology, psychiatry, dermatology, geriatrics, preventive medicine, medical ethics, and epidemiology, as well as the medical specialties. Examination scores are grouped by organ system to facilitate analysis and provide targeted feedback to residents and program directors. Approximately 20% of questions test core knowledge in internal medicine and related disciplines; the percentage of questions allotted to each organ system ranges from 7% to 15%, with most between 9% and 10%.
Administration
The ITE-IM is administered each January at internal medicine training program sites in the United States, Puerto Rico, and Canada. All participating programs offer the examination to residents in their second year of postgraduate training; many programs also make it available to residents in their first and third years. Residents are not encouraged to study for the examination. If the examination is not given to all housestaff at a site on a single day, the residents in their second year of training are scheduled on the first day to maintain the security of the testing material for this cohort. Program directors are responsible for the administration and security of the examination at each site.
Scores
Individual examinations are scored and analyzed statistically by the National Board of Medical Examiners. Before final scoring, test items that have poor statistical performance are reviewed by staff from the Board and the American College of Physicians, as well as the committee chairperson. In general, items that fewer than 30% of residents answered correctly or items that failed to discriminate high-scoring from low-scoring examinees are identified for review; if found defective in content or structure, these questions are excluded from the final scoring and analysis.
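The item-review screen described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual procedure of the National Board of Medical Examiners: the function name `flag_items_for_review`, the 0/1 response matrix, and the comparison of upper-half with lower-half scorers used to detect nondiscriminating items are all hypothetical. Only the 30% threshold comes from the text.

```python
def flag_items_for_review(responses, min_fraction_correct=0.30):
    """Return the indices of items that would be sent for committee review.

    responses: list of examinee rows, each a list of 0/1 item scores
    (1 = correct). At least two examinees are required.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    half = n_examinees // 2

    # Rank examinees by total score, then take the lower and upper halves.
    # With an odd number of examinees the middle one is left out of both.
    order = sorted(range(n_examinees), key=lambda i: sum(responses[i]))
    low, high = order[:half], order[-half:]

    flagged = []
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees
        p_low = sum(responses[i][j] for i in low) / half
        p_high = sum(responses[i][j] for i in high) / half
        too_hard = p < min_fraction_correct      # threshold from the text
        no_discrimination = p_high <= p_low      # assumed criterion
        if too_hard or no_discrimination:
            flagged.append(j)
    return flagged


# Hypothetical data: 6 examinees, 5 items.
sample = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(flag_items_for_review(sample))
```

In this sketch a flagged item is only a candidate for exclusion; as the text notes, the decision to drop it rests on a review of its content and structure.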
The results of the January examination are mailed to program directors for distribution to residents in late March or early April. Each resident receives a report that includes his or her total score, expressed as the percentage of correct responses, and the distribution of scores for the nationwide cohort of examinees in the same peer group. Each resident also receives a list of the question numbers that were answered incorrectly, together with their educational objectives; this information is organized by organ system and enables the resident to identify specific topic areas that need review.
Program directors receive copies of the individual reports that are provided to their residents, including the list of items that were answered incorrectly and the educational objectives. The program directors also receive a composite of the average scores of the residents in their program by year of training, accompanied by scores from the national cohorts of residents taking the examination in each level of postgraduate training. The average scores of each group in the program are further subdivided by organ system area for comparison with scores of the national peer group.
Statistical Analysis
Analysis of resident performance on the ITE-IM from each year has shown the total test scores to be reliable. The Kuder-Richardson reliability coefficient is used to measure the internal consistency of the test and to provide an estimate of the accuracy of scores [11]. The reliability coefficient is a statistic that can range from 0.0 to 1.0. A high value indicates that the scores of individual questions correlate highly with one another; a low value indicates that questions are heterogeneous or nonconsistent. The reliability of the examination increases with the total number of test questions, as well as with the homogeneity and quality of individual test items. The reliability of the test is also increased by the inclusion of questions that have done well in previous years. Examinations with reliability coefficients of 0.70 and higher are considered precise enough to provide reliable educational feedback to persons taking the test. The reliability coefficient of the examination has been consistently greater than 0.91 for residents in their second year of postgraduate training each year; in 1992 and 1993, it was 0.95. This means that 95% of the variance in scores among test-takers is caused by true differences in proficiency. The consistently high reliability coefficient for the examination is similar to that of examinations used for the purposes of licensure or certification, including the examinations of the National Board of Medical Examiners or the United States Medical Licensing Examiners and the certifying examination of the American Board of Internal Medicine.
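For readers who want to see how such a coefficient is obtained, the following Python sketch computes one common form of the Kuder-Richardson coefficient, KR-20. The text does not state which formula is used, so KR-20 is an assumption, as are the function names and the 0/1 response matrix. The second function applies the standard Spearman-Brown formula, which the text does not name, to illustrate why reliability tends to rise as items are added.

```python
def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses population variances (divide by n) throughout.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("all examinees have the same total score")

    # Sum of item variances; for a 0/1 item the variance is p * (1 - p).
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical data: 6 examinees, 5 items.
sample = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(round(kr20(sample), 3))
print(round(spearman_brown(0.70, 4), 2))
```

Under these assumptions, a hypothetical subtest with a reliability of 0.70 would be projected to reach about 0.90 if it were made four times as long with items of comparable quality.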
The reliability coefficients for the individual organ system areas or medical specialty scores are high, although not as high as that for the total examination score. Lower reliability coefficients can be expected because of the smaller number of test items in each of the component areas. Reliability coefficients for organ system scores and the total score for the 1993 examination are provided in Table 1, which shows how the reliability increases as the number of test items increases. Scores of the average percentage of correct responses and average item discrimination indices are also listed in Table 1. The discrimination index, or point-biserial correlation coefficient, is an indication of how well, on average, individual questions distinguish examinees with high total scores from those with low total scores [12]. Values greater than 0.10 are considered acceptable by the In-Training Examination Committee. All subtests in the 1993 examination have average values between 0.20 and 0.25.
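The discrimination index can be illustrated with a short Python sketch. It computes the point-biserial coefficient as the Pearson correlation between a 0/1 item score and the total score, with the item itself left in the total; whether the committee uses this uncorrected form is not stated in the text, so that choice, the function names, and the sample data are assumptions. The 0.10 cutoff is the acceptability threshold quoted above.

```python
import math


def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores)) / n
    var_x = sum((x - mean_x) ** 2 for x in item_scores) / n
    var_y = sum((y - mean_y) ** 2 for y in total_scores) / n
    if var_x == 0 or var_y == 0:
        # Correlation is undefined; treated here as no discrimination (assumption).
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def average_discrimination(responses):
    """Mean point-biserial coefficient across all items of a (sub)test."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    values = [point_biserial([row[j] for row in responses], totals)
              for j in range(n_items)]
    return sum(values) / n_items


# Hypothetical item answered correctly only by the two highest scorers.
r = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
print(round(r, 3), r > 0.10)
```

Averaging the item-level coefficients, as `average_discrimination` does, yields one summary value per subtest.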
Results
Program Participation
The number of housestaff at each level of training who have taken the ITE-IM during the last 6 years has increased steadily (Table 2). In 1993, more than 11 000 of the approximately 18 000 residents in the United States participated, representing more than 400 of 420 training programs. The examination is taken by more than 90% of all internal medicine housestaff in their second year of postgraduate training and more than 65% of all internal medicine trainees at all postgraduate levels.
During the past 6 years, there has been a marked increase in the total annual number and percentage of residents taking the ITE-IM who have graduated from international medical schools not accredited by the Liaison Committee on Medical Education (Table 3). In 1988, 27% of examinees were graduates of international medical schools, compared with 47% in 1993. During this period, the number of U.S. and Canadian medical school graduates taking the examination each year increased slightly, whereas their percentage of the total pool of examinees decreased from 64% to 49%. Graduates of U.S. schools of osteopathic medicine have consistently accounted for 4% to 5% of examinees since 1988.
Test Results
The average scores of examinees at each level of residency training are presented in Table 4; the scores for residents in their second year of postgraduate training are shown in Figure 1. Although the examination has not been equated from year to year, there appears to be a recent trend toward lower total scores for all cohorts. This might reflect either an increased level of difficulty in the examination or a decreased level of knowledge among examinees. The comparisons among cohorts of graduates of U.S. medical schools, international medical schools, and schools of osteopathic medicine for each examination, however, show interesting trends that are evident for each level of residency training.
First, the average scores of graduates of U.S. medical schools are consistently higher at every level of training than those of graduates of international medical schools or U.S. schools of osteopathic medicine (Table 4). However, in 1993, the gap between graduates of U.S. medical schools and those of international medical schools narrowed dramatically, whereas that between graduates of U.S. medical schools and those of schools of osteopathic medicine widened. For example, in 1989, graduates of U.S. medical schools had average scores that were 3.6, 3.3, and 3.3 percentage points higher than those of graduates of international medical schools at their first, second, and third years of postgraduate training, respectively. In 1993, the differences were 1.1, 1.8, and 1.9 percentage points at each level, respectively. In 1989, the differences between the percentage scores of graduates of U.S. medical schools and those of U.S. schools of osteopathic medicine were 1.7, 1.7, and 1.6 for residents in their first, second, and third years of postgraduate training, respectively, compared with differences of 4.9, 3.5, and 2.6 in 1993.
Second, as expected, for each annual examination, the average scores were higher for residents at higher levels of training in all cohorts (Table 4 and Figure 2). For example, for graduates of U.S. medical schools in 1993, the average score for interns was 57.0; for residents in their second year, 62.9; and for those in their third year, 66.9. However, important trends are evident within each group over the past 6 years. The average scores for every cohort have declined consistently from the 1988 to the 1993 examinations (Figure 2). The decrease in scores has been greatest for interns. For example, between the 1989 and 1993 examinations, the average scores of interns decreased by 5.0, 3.9, and 7.5 percentage points, respectively, for graduates of U.S. medical schools, international medical schools, and U.S. schools of osteopathic medicine. The decreases in scores for residents who were in their second year of postgraduate training were 4.3, 2.9, and 5.1 percentage points, respectively, in the three groups; for residents in their third year of postgraduate training, the decreases between the 1989 and 1993 examinations were 3.3, 1.7, and 5.8 points, respectively. For each level of resident training, the decrease was less clear-cut for graduates of international medical schools than that for the other two groups.
Table 5 shows the average scores of residents in their second year of postgraduate training on the 1993 examination according to the type of training program with which they were affiliated. Program types were classified according to data contained in the American Medical Association Directory of Graduate Medical Education Programs (“Green Book”), the National Study of Internal Medicine Manpower Directory of Training Programs in Internal Medicine, and the roster of the Association of Professors of Medicine, and by interest-group designations of the Association of Program Directors in Internal Medicine. In general, residents from programs based in military hospitals, university hospitals, and multispecialty group hospitals scored higher than those in other types of programs. However, the relation between scores and program type for an examination such as the ITE-IM is complex and multifactorial. It may reflect such factors as the background training of housestaff before residency, success in resident recruitment, preparation for the examination, the test-taking skills of housestaff, the curriculum content of the residency program, and the quality of in-training experiences.
Feedback from Residents and Program Directors
After the first four examinations, residents were asked to complete a questionnaire to assess their degree of preparation for and perceptions of the examination. A random sample of 300 respondents from the cohort of residents in their second year of postgraduate training was analyzed each year. In general, 70% to 75% of the random sample did not study at all for the examination; 10% to 20% studied fewer than 10 hours, and 10% to 15% studied more than 10 hours. More than 80% had slept 5 hours or more the night before the examination; 5% to 10% had slept 3 to 5 hours, and 3% to 5% had slept fewer than 3 hours. Some of this last group had been working as night floats on the night before the examination. At the time of the examination, approximately 60% to 65% of the random sample were on rotations with every fourth night on-call, 15% with every third night on-call, 15% with every fifth night on-call, 5% to 10% on rotations with no night duty, and 1% to 3% on night-float rotations. More than 90% felt that the time for the examination was at least adequate and more than 95% felt that the content was appropriate for their level of training. No attempt has been made to correlate questionnaire responses with test scores.
Each year, program directors are asked to complete a questionnaire when they receive the results of the examination for their residents. The questionnaire is intended to assess satisfaction with the examination procedure and to determine how test results are being used. Approximately 250 to 300 participating program directors respond to the questionnaire each year. More than 95% of program directors find the examination to be very useful (75% to 80%) or fairly useful (15% to 20%). Most program directors use the results of the examination to counsel residents with low scores and to guide residents in developing self-study programs. Program directors also use the results to focus teaching exercises on topics on which their residents have scored poorly.
Discussion
The ITE-IM allows residents to objectively compare their own level of knowledge with that of their national peer group [1-9]. Although initially targeted to test trainees at the midpoint of their residencies, the examination is now used by residents at all levels as a self-evaluation instrument. The purpose of the test is educational; it enables residents to identify specific core topics within general internal medicine in which they might need remedial work. It also permits program directors to evaluate the performance of trainees in their programs compared with that of their peers. It enables program directors, like residents, to identify specific areas of deficiency in their programs that might warrant increased emphasis in their curriculum.
The ITE-IM is a highly reliable assessment of residents' knowledge of internal medicine and their ability to recall facts. It measures their skill in solving clinical problems within the format of a written, multiple-choice examination. The examination is unique in that most examinees do not study for the test, in marked contrast to other examinations such as the certifying examinations for the National Board of Medical Examiners and the United States Medical Licensing Examiners, the Federation of Licensing Examiners, the Educational Commission for Foreign Medical Graduates, and the American Board of Internal Medicine. Because residents do not prepare specifically for the ITE-IM, it is more likely to reflect their “working” knowledge rather than their “learned” knowledge. Thus, it may provide both the resident and the training program with a more accurate estimate of the trainee's core knowledge than is provided by other written examinations. The ITE-IM, however, like other written examinations, is limited in that it does not measure the resident's attitude, bedside skills, relationships with patients, teaching ability, technical skills, or humanistic qualities. These characteristics are best evaluated by direct observation.
The changing demographic characteristics of residents taking the ITE-IM during the past 6 years reflect the changing composition of internal medicine programs in the United States [13, 14]. The number of examinees who are graduates of international medical schools has increased dramatically from 2000 in 1988 to 5639 in 1993, whereas the number of graduates of U.S. medical schools taking the examination has remained relatively stable. In 1993, almost 50% of the total number of examinees were graduates of international medical schools. For the past 3 years, the number of graduates of international medical schools who have taken the examination in the first year of postgraduate training has exceeded the number of graduates of U.S. medical schools at the same level (data not shown).
Comparisons of the average scores of cohorts of examinees over the past 5 years show that graduates of U.S. medical schools consistently do better than graduates of international medical schools, but the gap is narrowing. This trend may reflect an infusion of better-trained graduates of international medical schools into U.S. residency training programs, improved test-taking skills of these graduates, a decrease in the knowledge base of graduates of U.S. medical schools, diminished test-taking skills of U.S. medical school graduates, or other as-yet unidentified confounding variables. It will be interesting to follow this trend during the next several years to evaluate the effect of proposed reforms in the funding of graduate medical education on the relative quality of graduates of U.S. medical schools, international medical schools, and schools of osteopathic medicine in internal medicine training programs and to monitor their knowledge compared with that of present-day trainees.
The ITE-IM provides unique insights into the nature and quality of internal medicine trainees and training programs. Because it is prepared and administered according to a standardized protocol, because questions are targeted to test the factual knowledge of a specific group of trainees, and because it is given to nearly all residents in their second year of postgraduate training in U.S. training programs, the examination is a useful instrument to monitor trends over time. The internal consistency of the ITE-IM and the reliability of its scores are confirmed by statistical analyses each year. Because of these features, the ITE-IM can be used to evaluate the effect of new initiatives that have been introduced into the curricula of most residency programs over the past few years, emphasizing ambulatory care and geriatric medicine. It may also be possible to use the examination to measure differences over time among residents with diverse previous training experiences with regard to their knowledge of specific clinical topics such as the physical examination, diagnostic test selection, medical ethics, or preventive medicine practices [15-17]. Average scores of residents in a single program or a group of programs can be compared to assess the effect of geographic location, university affiliation, size, composition of trainees, new curricula, or specific educational experiences. It may also be possible to follow a cohort of trainees from a single program over 3 years to chart the improvement (or absence of improvement) in scores on subsequent examinations and compare them with other programs.
It is important to remember that the purpose of the ITE-IM is to facilitate self-evaluation and to identify topics for further study. The test gives residents and program directors reliable information about their level of competency on a written examination. It does not distinguish good residents from bad residents. Nonetheless, the ITE-IM is unique in that it may provide insight into the “working” rather than the “learned” knowledge of housestaff. A study correlating the examination scores and the program director's subjective rating of the resident's knowledge or another type of performance-based evaluation would be useful to validate this interpretation of the utility of the ITE-IM [18, 19]. The ITE-IM may prove to be as valuable a resource to medical educators as it is a useful instrument for self-evaluation for residents and training programs.
- Copyright ©2004 by the American College of Physicians