RAP ARTICLE

A Brief Guide for Program Directors on How to Assess Meaningful Changes in ABSITE Performance

Donald A. Risucci, PhD
New York Medical College

[Residency] Program Directors (PDs) are often interested in assessing changes over time on the ABSITE at one or more of four levels:

1. The program as a whole (eg, Have ABSITE scores gone up since I became PD in 2001?)

2. A particular cohort (eg, Did my current PGY-4 residents improve relative to the performance they turned in as PGY-3s?)

3. A particular PGY level (eg, Did my current chief residents perform better than last year's chief residents?)

4. A particular resident (eg, Did all the extra reading pay off for resident X this year?)

This paper describes approaches that PDs can use to assess meaningful changes in ABSITE performance at each of the aforementioned levels. Statistical tests for these purposes are those commonly used to compare means (eg, t-tests, analysis of variance) and/or nonparametric statistics (eg, Mann-Whitney, Wilcoxon, chi-square). The statistical significance of observed changes will depend upon several factors, including: whether or not subjects are paired (ie, whether or not data are being compared from the same individuals on multiple occasions); the magnitude of the observed differences; the variability in test performance among residents; the number of residents included in the analysis; and the particular statistical test used.

The American Board of Surgery (ABS) provides PDs with three types of scores describing individual resident performance on the Total Test and on both the Surgical Basic Sciences and Clinical Management portions, following each administration of the ABSITE: 1) Percent Correct, 2) Standard Score, and 3) Percentile Score. ABS also provides Standard Scores and Percentile Scores for each of the five sections of the exam (eg, Body as a whole, Gastrointestinal system). Each Percent Correct score corresponds to a specific Standard Score and PGY-specific Percentile Score for a given test administration.

It is essential to have a clear understanding of the meaning of each of the three scores and some of the subtle differences in the interpretation of the scores that must be considered when assessing residents at different PGY levels. It is also important to appreciate that some of the variability in test performance during any administration of the exam is associated with measurement error (eg, ambiguous items, guessing, emotional status, environmental conditions). ABS reports that the Standard Error of Measurement for the Total Test is approximately two Percent Correct score points or 25 Standard Score points. This means that the "true score" of a resident who obtained a Percent Correct score of 60 is likely to lie somewhere between about 58 and 62 Percent Correct. The "true" Standard Score of a resident obtaining a Standard Score of 550 is likely to lie somewhere between about 525 and 575. A description of the precise meaning of each of the three scores provided by the ABSITE follows.
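Before turning to the three scores, the Standard Error of Measurement figures above can be expressed as a small calculation. The following is a minimal sketch, not an ABS procedure: the function name and the `n_sem` parameter are illustrative assumptions, and the band is simply the observed score plus or minus the stated number of SEMs. Under the common assumption of normally distributed measurement error, a band of one SEM would cover roughly two thirds of cases; that interpretation is an assumption here rather than something stated in the ABS figures.

```python
# Approximate SEM values quoted in the text for the Total Test.
SEM_PERCENT_CORRECT = 2.0
SEM_STANDARD_SCORE = 25.0


def true_score_band(observed, sem, n_sem=1.0):
    """Return (low, high): the observed score plus or minus n_sem SEMs.

    n_sem is an illustrative parameter; 1.0 reproduces the ranges in the text.
    """
    margin = n_sem * sem
    return observed - margin, observed + margin


pc_band = true_score_band(60, SEM_PERCENT_CORRECT)
ss_band = true_score_band(550, SEM_STANDARD_SCORE)
print(pc_band)  # (58.0, 62.0)
print(ss_band)  # (525.0, 575.0)
```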

A. Percent Correct Score:

(Number of items answered correctly divided by the number of items on the test or subtest) x 100

Changes over time in a resident's Percent Correct score are expected to parallel knowledge acquisition and improve considerably during the course of residency training. Test items may change somewhat from year to year, so comparing an individual's Percent Correct score across PG years may reflect some differences in actual test content, but these are usually minor. Percent Correct scores tend to increase nationally by at least a few points as residents progress from one PGY level to the next.

B. Standard Scores represent the magnitude of the Percent Correct score relative to the mean Percent Correct score of the nationwide cohort of test takers regardless of PGY. A Standard Score of 500 corresponds to the mean Percent Correct score of all test takers. 100 Standard Score points represent one standard deviation (SD) of Percent Correct score points. So, a Standard Score of 600 indicates that the resident's Percent Correct score was one SD above the nationwide mean of all residents taking the test; a Standard Score of 450 is 0.5 SDs below the nationwide mean, and so on.
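The relationship just described (mean of 500, 100 points per SD) can be sketched as a simple conversion. The national mean and SD used below are hypothetical values chosen only so the arithmetic is easy to follow; the actual figures for a given administration come from the ABS, and the function name is an illustrative assumption.

```python
def standard_score(percent_correct, national_mean, national_sd):
    """Convert a Percent Correct score to a Standard Score (mean 500, SD 100)."""
    z = (percent_correct - national_mean) / national_sd
    return 500 + 100 * z


# Hypothetical national statistics for all test takers, for illustration only.
NATIONAL_MEAN = 65.0
NATIONAL_SD = 8.0

print(standard_score(73.0, NATIONAL_MEAN, NATIONAL_SD))  # 600.0 (one SD above)
print(standard_score(61.0, NATIONAL_MEAN, NATIONAL_SD))  # 450.0 (0.5 SD below)
```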

A Standard Score of 500 (ie, the mean of all test-takers) corresponds approximately to the Percent Correct score expected of a resident at the end of the PGY-2 level. This is due in part to the fact that the content of the examination, particularly that of the basic science items, is geared toward the beginning PGY-3 resident.

Standard Scores tend to increase nationally by an average of ~50-75 points per PGY level in the early years of training and ~20-30 points per PGY level during the senior years. Categorical residents who plan to complete general surgery training average about 50 Standard Score points higher than their PGY-level peers.

C. Percentile Scores indicate the percentage of residents nationwide at the same PGY level that obtained a Percent Correct score less than the score obtained by a particular resident.

The Percentile score of 50 corresponds to the median Percent Correct score for all test takers nationwide at a given PGY level.

Mathematical operations (eg, computing a mean) on Percentile scores can be misleading because percentiles have a nonlinear relationship to Percent Correct scores. For example, the difference in Percent Correct scores between a Percentile of 10 and a Percentile of 20 (a difference of 10 Percentile points) tends to reflect a much larger difference in Percent Correct scores than does the difference between a Percentile of 80 and a Percentile of 90 (also a 10 Percentile point difference).
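The caution about averaging Percentile scores can be illustrated with a short sketch. It assumes, purely for illustration, that Percent Correct scores within a PGY level follow a normal distribution with a hypothetical mean of 65 and SD of 8; real ABSITE distributions may be shaped differently. The point is only that the mean of two residents' percentiles is not the percentile of their mean score.

```python
from statistics import NormalDist, mean

# Assumption: PGY-level Percent Correct scores are roughly normal.
# The mean and SD are hypothetical, not ABS figures.
peer_scores = NormalDist(mu=65.0, sigma=8.0)


def percentile_of(percent_correct):
    """Percentage of the assumed peer distribution scoring below this value."""
    return 100 * peer_scores.cdf(percent_correct)


resident_scores = [49.0, 65.0]  # two hypothetical residents

# Averaging the percentiles directly...
mean_of_percentiles = mean([percentile_of(s) for s in resident_scores])
# ...versus converting the average Percent Correct score to a percentile.
percentile_of_mean = percentile_of(mean(resident_scores))

print(round(mean_of_percentiles, 1))  # about 26.1
print(round(percentile_of_mean, 1))   # about 15.9
```

Under these assumptions the two summaries differ by roughly 10 Percentile points, which is why the median Percentile, rather than the mean, is the safer summary.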

A description of approaches for assessing meaningful changes in ABSITE performance at each of the aforementioned levels follows.

1. The program as a whole (eg, Have ABSITE scores gone up since I became PD in 2001?)

An obvious approach to assessing changes over time for the program as a whole is to compute the mean for each year in question and examine the changes in the mean. The median will be a better estimate of the program average if the program is relatively small (ie, fewer than 25 to 30 residents) and/or the distribution of scores tends to be skewed.
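A minimal sketch of this year-by-year summary follows. The scores are invented for illustration, and the 2003 list deliberately includes one very low score to show how a single outlier pulls the mean of a small program further than the median.

```python
from statistics import mean, median

# Hypothetical Percent Correct scores for a small program, keyed by exam year.
scores_by_year = {
    2003: [58, 61, 63, 64, 66, 67, 70, 41],
    2004: [60, 62, 65, 66, 68, 69, 71, 72],
}


def yearly_summary(scores):
    """Return the count, mean, and median of scores for each year."""
    return {
        year: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for year, vals in sorted(scores.items())
    }


summary = yearly_summary(scores_by_year)
for year, stats in summary.items():
    print(year, stats)
# 2003: mean 61.25, median 63.5
# 2004: mean 66.625, median 67.0
```

In this invented example the mean rises by more than five points while the median rises by 3.5, because the 2003 mean is depressed by one resident's score.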

Percent Correct scores are probably the best choice for assessing program-wide changes but will only provide information about the change in the percent of items answered correctly by the average resident in the program from one year to the next. As mentioned previously, test questions can change from year to year, placing some limits on the interpretation of changes in Percent Correct scores. Further, changes in Percent Correct scores do not provide any information concerning the magnitude of the observed changes relative to changes observed nationwide during the same period. Nevertheless, the Percent Correct score is particularly useful for this purpose because it is not influenced by changes in the composition and performance of the nationwide population of residents from one year to the next.

Changes in Standard Scores may also be examined for the purpose of evaluating changes in the scores on the program level and can provide data representing a program's performance relative to the nationwide average each year. Increasing mean or median Standard Scores may lead to the conclusion that the program's ABSITE performance is improving relative to the nationwide sample. However, it is important to remember that changes in the resident population affect scores. Chief residents leave and a relatively large new cohort of PGY-1 residents enters programs each year. Other residents may enter or depart from programs for various reasons. When comparing one year to the next, some of the residents are the same and others are not. In small and medium-size programs in particular, the departure of even one or two very poor or truly stellar performers from one year to the next may account for a significant change in the overall program mean. Therefore, looking at Standard Scores provides a statistical summary of the program's overall mean change from one year to the next, expressed as the performance of its average resident relative to that of the average resident nationwide during the particular test administration periods. However, it is important to appreciate that the Standard Score approach does not address the learning of residents, since only a subset of residents contributes data to the mean for both years, at both the individual program level and the national level.

Percentile scores should not be used to compare program level means across years for the reasons described for the Standard score approach.

In general, comparisons and trends spanning multiple years are needed before one can reliably conclude that program changes (eg, a new PD) have resulted in significant program-wide changes in ABSITE performance.

2. A particular cohort (eg, Did my current PGY-4 residents improve relative to the performance they turned in as PGY-3s?)

Analyses of changes over time for a particular cohort are valid if the analyses are limited to only those residents for whom data are available for consecutive years.
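One way to apply this restriction is to keep only residents with scores in both years and then work with their paired differences. The sketch below uses invented resident IDs and Standard Scores, and computes a paired t statistic by hand; obtaining a p-value would require a t table or a statistics package with n - 1 degrees of freedom, and with a cohort this small a nonparametric test such as the Wilcoxon signed-rank test may be preferable.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical Standard Scores keyed by resident ID.
# Resident E left after PGY-3; resident F joined at PGY-4.
pgy3_scores = {"A": 480, "B": 520, "C": 455, "D": 510, "E": 530}
pgy4_scores = {"A": 510, "B": 535, "C": 500, "D": 525, "F": 560}


def paired_changes(before, after):
    """Score change for each resident who has data in both years."""
    shared = sorted(before.keys() & after.keys())
    return {rid: after[rid] - before[rid] for rid in shared}


def paired_t_statistic(differences):
    """Paired t statistic: mean difference divided by its standard error.

    Needs at least two differences that are not all identical.
    """
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))


changes = paired_changes(pgy3_scores, pgy4_scores)
t_stat = paired_t_statistic(list(changes.values()))
print(changes)           # {'A': 30, 'B': 15, 'C': 45, 'D': 15}
print(round(t_stat, 2))  # 3.66, with 3 degrees of freedom
```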

Percent Correct scores can be used for assessing a particular cohort, as in the example of a group of residents who have moved to the next PGY. But such use would only provide information about the change in the percent of items answered correctly by the average resident in that cohort from one year to the next. As mentioned previously, items on the exam do change from year to year, placing some limits on the interpretation of changes in Percent Correct scores. Further, changes in Percent Correct scores do not provide any information concerning the magnitude of the changes observed in the cohort relative to changes observed nationwide during the same period.

Changes in the mean or median Standard Score can be examined. If gains of more than ~50 to 75 Standard Score points are observed from PGY-1 to -2 or from PGY-2 to -3, or if gains of more than ~20 to 30 points are observed per PGY during the senior years (ie, the expected change associated with one additional PG year), one can conclude that the performance of the cohort, relative to that of their nationwide peers at all PGY levels, was greater than their relative performance during the previous year.

Changes in the median Percentile Score can also be used to compare the performance of the average (ie, 50th percentile) resident in the cohort relative to that of his/her nationwide PGY-specific peers, from one year to the next. This is particularly true when comparing residents at the PGY-3, -4, or -5 levels, since the vast majority of test takers at these levels are categorical residents and the composition of the nationwide cohort, as well as the program cohort, is much more likely to remain stable at these PGY levels.

3. A particular PGY level (eg, Did my current chief residents perform better than last year's chief residents?)

The same issues apply to questions in this category as apply to those in the "program as a whole" category, with the added concern that the sample sizes will be even smaller.

Again, changes in Percent Correct scores are the most straightforward, especially during the PGY-1 and -2 years. At the PGY-3 to -5 levels, Standard Scores and Percentiles can be used if the PD is interested in performance changes relative to those observed nationally.

4. A particular resident (eg, Did all the extra reading pay off for resident X this year?)

When comparing an individual resident's performance from one year to the next using either Percent Correct or Standard scores, the PD needs to consider the magnitude of the change, relative to the change observed nationally from year to year, simply as a function of PGY.

The PD must also keep in mind the magnitude of the Standard Error of Measurement (ie, the margin of error) when interpreting changes in the Percent Correct or Standard Scores. As stated previously, residents are expected to gain ~50 to 75 Standard Score points per PG year in the early years of training and ~20 to 30 points per PGY during the senior years. Percent Correct scores generally improve by about five to 10 points from PGY-1 to -2 and by about three to five points for each subsequent PG year.
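The two considerations above, the expected national gain and the margin of error, can be combined in a rough check for an individual resident. This is a sketch under stated assumptions rather than an ABS method: the function and dictionary names are illustrative, the expected gains are the approximate ranges quoted in the text, and the standard error of a difference between two sittings is taken as the SEM multiplied by the square root of two, which assumes the measurement errors on the two exams are independent.

```python
from math import sqrt

SEM_STANDARD_SCORE = 25.0  # approximate SEM quoted in the text

# Approximate expected Standard Score gain per PG year, from the text.
EXPECTED_GAIN = {"early": (50, 75), "senior": (20, 30)}


def assess_change(previous, current, phase, sem=SEM_STANDARD_SCORE):
    """Compare one resident's gain with the expected gain and error margin."""
    gain = current - previous
    low, high = EXPECTED_GAIN[phase]
    # Assumption: errors on the two sittings are independent, so the
    # standard error of the difference is sem * sqrt(2).
    se_diff = sem * sqrt(2)
    excess = gain - high
    return {
        "gain": gain,
        "expected_range": (low, high),
        "excess_over_expected": excess,
        "se_of_difference": se_diff,
        "exceeds_expected_beyond_error": excess > se_diff,
    }


print(assess_change(480, 600, "early"))   # gain 120, excess 45, flag True
print(assess_change(520, 560, "senior"))  # gain 40, excess 10, flag False
```

A gain that clears the upper end of the expected range by less than the error margin, as in the second call, is hard to distinguish from ordinary year-to-year progress plus measurement noise.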

SUMMARY

The Percent Correct score generally provides the most educationally relevant parameter for assessing meaningful changes in ABSITE performance over time. The primary advantage of the Percent Correct score is that it is not influenced by changes from year to year in the composition of the nationwide cohort. Changes in Standard Scores offer information about changes in resident performance relative to nationwide peers at all PGY levels combined. However, the interpretation of changes in resident performance can differ from one test administration to the next, and is different depending upon the PGY level of the resident. Changes in Percentile scores are less useful because of their nonlinear relationship to the Percent Correct score. Furthermore, Percentiles represent each resident's performance relative only to his/her nationwide PGY-level peer group, which changes considerably from one year to the next, particularly in the early years of residency training.

Revised February 16, 2005



Residency Assist Page
Division of Education
This page and all contents are Copyright © 2002-2005
by the American College of Surgeons, Chicago, IL 60611-3211