Reed G. Williams, PhD
Department of Surgery
Southern Illinois University School of Medicine
When teaching a course on performance appraisal to physicians in the Master of Health Professions Education (MHPE) program at the University of Illinois at Chicago, I decided to start and end the course by summarizing, in a single document, everything I thought was important to say about performance appraisal. I called the document my "10 commandments of performance appraisal." Since teaching the course, I have slightly modified the document as a result of questions and challenges posed by students and colleagues, as well as my own experience. When asked to prepare this column for the ACS Residency Assist Page, I decided that the 10 commandments provided an appropriate foundation for my remarks to residency program directors and other surgeons who read the Residency Assist Page.
Since the original document was intended to cover all performance appraisal situations, there are some points that apply more to traditional classroom-based courses where objectives can be carefully planned in advance, activities can be somewhat controlled, and tests can be carefully aligned with the course objectives and activities. However, quality performance assessment in clinical settings is more of a challenge. All of the commandments are included in this document because they provide general principles to guide the planning of performance appraisal programs for educational purposes. Following each commandment I have added comments designed to aid surgery program directors and other Residency Assist Page readers in applying these principles to the competencies associated with surgical practice in surgery residency programs. When I talk about "tests" in this paper, I am talking about assessment activities designed to elicit and systematically observe key performances, judge those performances, and provide information for making inferences about how residents will perform in similar future circumstances.
1. Treat assessment time as a limited commodity.
1.1. Don't administer tests unless you are prepared to act on the results.
1.2. Sample competencies systematically and broadly.
1.3. Use the most time-efficient assessment method that is capable of assessing a competency accurately and completely.
1.4. Assess only important competencies.
Commentary
Testing often becomes a ritual in both traditional classrooms and in clinical settings. We need to guard against investing time and effort in conducting tests that are not used for appraisal. The amount of time invested in developing tests, processing test results, preparing reports, and discussing test results that we don't use would probably surprise us all if we took the time to calculate it.
The biggest problem in clinical performance appraisal is the lack of control over what aspects of resident performance are actually observed. In the first place, relatively little direct observation underlies performance appraisal. Second, in the absence of careful orchestration of evaluation efforts, some aspects of performance are over-assessed at the expense of other equally important competencies.
With regard to point 1.3, research has shown that there is a major discrepancy between physician estimates of resident knowledge and estimates based on multiple-choice tests. Multiple-choice tests provide a much broader sample of resident knowledge than do ratings by surgeons, and probably offer the most representative estimate of resident knowledge.
Residency programs should limit investment of local faculty resources in the development of multiple-choice items. The American Board of Surgery's ABSITE examination taps into the talents of surgeons on the national level as the basis for the examination. Local resources are better utilized in assessment activities designed to cover other equally important aspects of clinical competence.
2. Test application of knowledge and skills rather than simple facts.
2.1. Primarily, test the ability to use knowledge and skill to accomplish professional tasks.
2.2. Test knowledge only as a follow-up diagnostic method to determine why a person is unable to perform professional tasks in context.
Commentary
When it comes to this second commandment, residency programs are ahead of traditional (ie, basic science) courses.
3. Treat human performance as highly situation-and-case specific.
3.1. Test for tasks and situations systematically.
3.2. Don't assume that good performance in one situation is indicative of how a person will perform in other situations, even if you believe that the competencies involved are closely related. Pilots are licensed to fly 727s or 747s; they are not given unlimited pilots' licenses.
Commentary
Human performance is the area where our assessment and teaching practices are most behind what we know about clinical competency. The belief system that seems to drive medical training and assessment is one that suggests good residents are good at everything they do. The research on clinical performance suggests that clinical competence is quite variable as a result of the opportunities that residents have had and their predisposition to learn in various areas. To the extent that this is true, it becomes important to provide direct training and experience for every important competency and situation. Furthermore, assessment should be used to confirm resident competence in these areas.
4. Define the competencies to be assessed as clearly as possible.
4.1. Use a clearly articulated testing blueprint or set of behavioral objectives as a guideline for test development.
4.2. Make sure that your tests accurately mirror the test blueprint.
Commentary
The ACGME Competencies project has helped us all to clarify important competencies that need to be fostered in residency programs. Being clear on the training program objectives is half the battle when program objectives are written in terms of what residents will be able to do as a result of their residency program experience. The next task, aligning testing activities with objectives, requires commitment and discipline. Such alignment will not happen without a belief in the importance of the competencies described as important by each program.
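As a rough illustration of checking whether assessment activity mirrors a blueprint (points 4.1 and 4.2), the sketch below compares the share of observed assessment events per competency against target shares. The competency names, target weights, tolerance, and the function name coverage_gaps are all hypothetical and chosen only for this example; they are not the ACGME list or any program's actual blueprint.

```python
from collections import Counter

# Hypothetical blueprint: target share of assessment events per competency.
# These categories and weights are illustrative assumptions only.
BLUEPRINT = {
    "patient care": 0.30,
    "medical knowledge": 0.20,
    "communication": 0.20,
    "professionalism": 0.15,
    "technical skill": 0.15,
}


def coverage_gaps(observed_events, blueprint, tolerance=0.05):
    """Return competencies whose observed share misses the target by more than tolerance.

    A positive gap means the competency is under-assessed relative to the
    blueprint; a negative gap means it is over-assessed.
    """
    counts = Counter(observed_events)
    total = sum(counts[c] for c in blueprint)
    gaps = {}
    for competency, target in blueprint.items():
        observed = counts[competency] / total if total else 0.0
        gap = target - observed
        if abs(gap) > tolerance:
            gaps[competency] = round(gap, 2)
    return gaps


# Example log of what was actually assessed during a rotation (made-up data).
events = (
    ["medical knowledge"] * 11
    + ["patient care"] * 6
    + ["technical skill"] * 3
)
print(coverage_gaps(events, BLUEPRINT))
```

In this made-up log, knowledge is over-assessed while communication and professionalism are not assessed at all, which is the kind of drift a blueprint check is meant to surface.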
5. Use a variety of assessment methods.
5.1. No single assessment method is capable of measuring all important educational objectives (human performance qualities).
5.2. No score from a single assessment method is sufficiently predictive of performance on other assessment measures to be used as a proxy for scores derived from those measures.
5.3. Pick the right assessment tool for the job. (Faculty time, effort, and cost should be secondary to the ability to assess the target competence.)
5.4. Avoid the tendency to primarily test performance qualities that are easily measured.
5.5. Different appraisers focus on different aspects of performance. Use multiple appraisers.
Commentary
Objective test score results (eg, ABSITE scores) are often given more weight than they deserve when making progress decisions. The heavier weight is due to the comfort associated with the properties of "objective" data (especially the ability to make direct comparisons among residents). Whether we like to admit it or not, ABSITE scores are often used as proxy measures for competence in other areas that are less well or less easily measured. Likewise, when selecting residents, USMLE scores are used as proxy measures to judge other important but less well-measured competencies of residency applicants. There is a tendency for educational programs to evolve toward testing primarily those performance qualities that are easily measured. This is unintentional. Nevertheless, the outcomes are the same. Certain important aspects of performance (eg, professionalism, patient-physician communication, and intra-operative decision making) are increasingly ignored by both residents and faculty. (See also commandment 6.5.)
6. Make every assessment a learning experience.
6.1. Always give the student the maximum amount of feedback from an assessment.
6.2. Always include a teaching response as part of an assessment event.
6.3. Favor a rich multi-dimensional profile of performance, as opposed to a single grade.
6.4. Use assessment events to achieve instructional ends. Tests are an effective way to encourage the learning process, focus learning efforts, and stimulate review and practice.
6.5. Use an imprecise measure of a competence if the competence being measured is important to you, and there is an absence of a more precise measure.
Commentary
In the SIU general surgery program, we have begun a process that flies in the face of common practice: We are encouraging a distinct separation of evaluation for feedback and evaluation for academic progress decision-making. In our faculty development program, we encourage faculty to provide feedback to residents immediately after observing some aspect of performance that deserves comment (positive or negative). Specifically, we encourage the faculty member to engage in a conversation with the resident and cover the following elements in their feedback: a description of the situation, what the resident did, the consequences of that action, and suggestions on how to handle that situation differently in the future if the action led to undesirable consequences. We believe that traditional assessment system practice (ie, putting a section on the form for comments and suggestions) has unintentionally resulted in less day-to-day feedback to residents by faculty members. Unfortunately, faculty members have been given the impression that they should save feedback for the end-of-rotation evaluation form.
7. Test scores must be given meaning before they can have value.
7.1. Look beyond what the person does and learn why they do it to truly understand that person.
7.2. A person's behavior can be interpreted (understood, given meaning) by comparing their scores to:
7.2.1. A pre-determined acceptable level of performance. (This approach is consistent with the mission of educational institutions.)
7.2.2. Scores of other examinees. (This is the most common approach, but the emphasis is more appropriate for selection tests, eg, the MCAT, than for educational achievement tests.)
7.2.3. A score reflecting the person's earlier performance on the same task. (This approach is also consistent with the mission of educational institutions.)
7.3. Base your decisions on knowledge of how previous persons with similar scores have fared in practice. (The decision to treat a person with a low-density lipoprotein value greater than 100 mg/dL is grounded in knowledge of the associated probability of coronary artery disease and coronary events in untreated patients with similar values. Similar knowledge is needed to make educational decisions.)
Commentary
Commandment 7.1 is based on the fact that the goal of assessment almost always goes beyond evaluating the performance that has been observed. We are interested in how a resident will perform in a range of situations with a broad spectrum of patients. It is always easier to predict how a person will perform in a range of situations when you understand the reasons underlying the performance you observed. For example, Kopelow and his colleagues1 reported that physicians only collected about 60 percent of the patient data that they stipulated was absolutely essential to a competent workup for a particular patient when that patient came to their office unannounced. The fact that they collected an average of 60 percent of the required information is a measurement of their performance. To truly understand that performance and predict future related performances requires understanding why they performed in that way.
The failure of residency program faculty to take action against poorly performing residents is due to uncertainty about, and discomfort with, the available performance information. We favor multiple-choice tests for one primary reason: On these tests all examinees are faced with the same task, making it possible to directly compare examinees. When using national examinations like the ABSITE, we can compare residents to their colleagues locally and to their counterparts nationally. Historically, we have had no similar opportunity for clinical performance and professional behavior competencies, so direct comparisons were not possible. Objective structured clinical performance examinations such as the Patient Assessment and Management Examination developed by MacRae, Regehr, Leadbetter and Reznick2 and the Objective Structured Assessment of Technical Skill (OSATS)3 bring to tests of clinical performance some of the benefits enjoyed by objective tests of knowledge. Objective tests of performance also allow establishing acceptable performance standards in advance and comparing resident performance to these preset standards.
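The three ways of giving a score meaning listed under 7.2 (a preset standard, the scores of other examinees, and the person's own earlier score) can be sketched as three small functions. This is a minimal illustration under assumed inputs; the function names, the passing standard of 75, and the sample scores are hypothetical and do not come from any actual examination.

```python
def criterion_referenced(score, passing_standard):
    """7.2.1: compare the score with a pre-determined acceptable level."""
    return score >= passing_standard


def norm_referenced(score, peer_scores):
    """7.2.2: percentage of other examinees who scored strictly lower."""
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)


def self_referenced(score, earlier_score):
    """7.2.3: change relative to the same person's earlier score on the same task."""
    return score - earlier_score


# Made-up data for one resident on one hypothetical examination.
peers = [55, 60, 62, 70, 71, 75, 80, 88]
current, previous, standard = 72, 64, 75

print("Meets preset standard:", criterion_referenced(current, standard))
print("Percent of peers scoring lower:", norm_referenced(current, peers))
print("Change since earlier attempt:", self_referenced(current, previous))
```

The same score of 72 reads differently under each frame: below the preset standard, above most peers, and improved over the earlier attempt, which is why the choice of comparison should follow the purpose of the assessment.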
8. Recognize and give credit for unanticipated learning outcomes.
8.1. Overly structured assessment plans can result in failure to recognize creativity and unusual achievements on the part of learners.
8.2. Give credit for what the student has learned, in addition to recognizing achievement of the intended learning outcomes.
Commentary
Another realm where surgery residency programs and other clinical training programs should be given credit is in the area of recognizing unique learning outcomes. Subjective evaluation systems do a better job of recognizing creativity and unusual learning or performance achievements than do assessment systems for more regimented traditional classes. Residents can and should be recognized for the number of research projects completed, research reports published, and the number of teaching awards received in addition to being evaluated using the traditional clinical performance appraisal form.
9. Provide high quality feedback.
9.1. Give feedback that is situation specific, descriptive, and constructive. What did the learner do? In what context? Why was their performance desirable or undesirable? How could the situation be handled more effectively?
9.2. Provide feedback on a continuous, immediate basis, as feedback is necessary for learning. Instructor feedback to students is traditionally limited in amount and is concentrated at the end of the course where subsequent learning opportunities under guidance of the instructor are limited. Feedback removed from the event is also limited in quality by constraints of memory (details are forgotten, recall is selective).
9.3. Comment on the good qualities and achievements of learners, while aiming to correct their deficiencies.
9.4. Limit the amount of feedback given at any one time.
(See my comments regarding the sixth commandment.)
Commentary
Feedback needs to be differentiated from praise or criticism. Feedback is information about a person's performance given with the intent of guiding future performance. Praise is normally nonspecific. When a person tells you "That was a great grand rounds presentation!", the statement only tells you the person liked the presentation. Feedback would tell you what the person liked or didn't like about the presentation. One of the universal complaints from medical students about clerkship performance is that they don't get enough feedback. It really isn't clear whether they want more praise or more feedback. Residency programs should work to ensure that residents get praise when it is deserved, but we should also do what is necessary to ensure that residents get more feedback.
I would like to emphasize statement 9.4. We all have a tendency to notice and comment on many aspects of a performance because we see many ways to help the resident improve her/his performance. However, there is a limit to how much feedback a person can process at any one time. This is true not only of negative feedback but also of positive feedback. Think back on occasions when people have provided you with positive feedback. Receiving it can be uncomfortable, and it is almost always uncomfortable when you receive extensive positive feedback on a single occasion. Limit yourself to providing one or two observations or suggestions at one time. Pick the one or two points that you think are most important. Save the other observations and suggestions for another time.
10. Share all statements of appraisal with the learner.
10.1. Such sharing will lead you to write appraisals in a fair, descriptive, and constructive manner.
10.2. Make the student the main beneficiary of your appraisal.
Commentary
Commandment 10 provides a good place to address the issue of signed versus anonymous evaluations. My thought is that if you follow the subpoints under the 10th commandment, you will be comfortable in openly sharing the information with your residents. To residency program directors, let me express that I don't believe that anonymous evaluations will stand up in court should a progress decision end up in adjudication. At SIU, we make all progress decisions about residents by committee during the middle and at the end of the year.4 The committee process provides some anonymity and takes some pressure off individuals when negative progress decisions need to be made.
Summary
The 10 commandments of performance appraisal are intended to provide comprehensive, general advice on designing and evaluating assessment systems, and apply equally well in both traditional course and clinical settings. If you found them helpful, you may want to consider evaluating your current residency assessment program using the 10 commandments of assessment, and the subpoints listed under each, as a checklist to guide the process.
References
1. Kopelow ML, Schnabl GK, Hassard TH, Tamblyn RM, Klass DJ, Beazley G, et al. Assessing practicing physicians in two settings using standardized patients. Acad Med 1992;67(10 Suppl):S19-21.
2. MacRae H, Regehr G, Leadbetter W, Reznick RK. A comprehensive examination for senior surgical residents. Am J Surg 2000;179(3):190-3.
3. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84(2):273-8.
4. Schwind CJ, Williams RG, Boehler ML, Dunnington GL. Do individual attending post-rotation performance ratings detect resident clinical performance deficiencies? Acad Med 2004;79(5):453-7.