Date Updated: June 30, 2014

Grants Awarded

2013–2014 Grantees

The University of California, Davis School of Medicine

Principal Investigator: Dr. Mark Henderson
Grant Amount/Duration: $149,999.72 for two years
Project Title: California Longitudinal Evaluation of Admission Practices (CA-LEAP) Study

The University of California, Davis School of Medicine seeks to lead a consortium of all five University of California medical schools to evaluate medical school interview outcomes. The California Longitudinal Evaluation of Admission Practices (CA-LEAP) consortium will initially study the degree to which traditional and Multiple Mini-Interview (MMI) methods predict medical student performance. CA-LEAP is uniquely positioned to explore interview methods and outcomes given the large number and diversity of applicants interviewed across the five campuses. The study will directly address the goals of the Stemmler Fund as it tackles a critical assessment issue: how best to select the next generation of successful physicians.

Dalhousie University

Principal Investigator: Dr. Joan Sargeant
Grant Amount/Duration: $149,909 for two years
Project Title: Testing an evidence-based model for facilitating performance feedback and improvement in residency education: what works and why?

The goal of this study is to explore, in residency education, the use of an evidence-based model for facilitated performance feedback to enhance feedback acceptance, inform self-assessment and self-monitoring, and guide performance improvement through coaching.

Objectives are to:

  1. Determine which components of the feedback model and which supervisor activities are effective in engaging the resident and promoting feedback acceptance and plans for use, under which conditions and why.
  2. Determine the influence of the use of the model upon enhancing:
    • Relationship and rapport building between the supervisor and the resident
    • Residents’ acceptance of feedback and assimilation with their own self-assessment, especially when the feedback disconfirms their own views
    • Residents’ understanding of both the content of their feedback and the standards or milestones against which they’re being measured
    • Residents’ ability to set learning and improvement goals, to develop and implement a realistic plan to address them, and to assess that plan
  3. Determine the effectiveness of the supervisors’ development workshop in preparing them to use the phases of the feedback model effectively with residents receiving both confirming and disconfirming feedback.
  4. Finalize the feedback model for residency performance feedback, and the supervisor development workshop.

Brigham and Women’s Hospital, Inc. / Harvard Medical School

Principal Investigator: Dr. Steven Yule
Grant Amount/Duration: $150,000 for two years
Project Title: Assessing surgical residents’ non-technical skills: Validation of the NOTSS taxonomy and implementation of a national curriculum

The goal of this research project is to develop an observational tool for the assessment of surgical residents’ non-technical skills. By revising an existing assessment tool (NOTSS: Non-Technical Skills for Surgeons) for use with surgical residents in the US and testing it for reliability and validity, we will be able to gather, for the first time, data on the baseline non-technical skills of surgical residents across the US at different stages of training. This will then allow us to assess the impact of non-technical skills education on behavior in the OR, acquisition of technical skills, patient safety, and progression through training to independent practice.

2012–2013 Grantees

Harvard Medical School

Principal Investigator: Dr. Edward Krupat
Grant Amount/Duration: $149,657 for two years
Project Title: Academic and Professional Lapses in Medical School: A Prospective Assessment of Their Sequelae

We believe that by establishing academic and professional problems in medical school, operationalized as appearances before the school's promotions board, as the starting (rather than end) point of our study, we can understand the relationship between unprofessional behaviors in medical school and later outcomes in a manner that has actionable implications. That is, in addition to establishing the relationship between promotions-board appearance and future unprofessional behaviors, this research will help us identify the individual characteristics, orientations, and behaviors associated with difficulties in medical school as well as with future difficulties. This will accomplish the critical objective of enabling educators to identify, in real time, the students most "at risk" for future problems and lapses as practicing physicians.

2011–2012 Grantees

The University of Iowa

Principal Investigator: Dr. Donald Anderson
Grant Amount/Duration: $150,000 for two years
Project Title: Simulation Approaches for Training in Fluoroscopically Guided Orthopedic Trauma Surgery.

The long-term goal of this line of research is to replace portions of apprenticeship-based skills acquisition in orthopedic surgery with more effective and safer simulation-based training methodologies. As a first step, we recently developed an innovative physical simulation of limited percutaneous articular fracture reduction. Using static fluoroscopic guidance, trainees reduce the simulated fracture through a limited anterior window in the housing. Sessions are videotaped for later evaluation, and trainees are assessed on time-to-completion, hand movement (tracked using an optoelectronic motion capture system), and objective quality of the obtained fracture reduction (from 3D laser scans of the final configuration). The objective of the proposed study is to improve the effectiveness and generality of this procedural training and assessment tool. We will develop new task-based modules that enable more deliberate practice in requisite constituent skills, starting first with fluoroscopic guidance. The proposed research will develop and demonstrate the value of innovative simulation techniques for orthopedic surgical training that will lead to improvements in the care and safety of patients. This research will lay the foundation for innovative and robust skills assessment capabilities, an outcome that will provide incentive for accrediting bodies such as the American Board of Orthopaedic Surgery to adopt these objective methods to assess orthopedic surgical skill.

The University of British Columbia

Principal Investigator: Dr. Kevin Eva
Grant Amount / Duration: $78,306 for two years
Project Title: Rater Cognition as Categorical Judgments: Using “Person Models” to Understand Rater Error.

The primary objective of this research is to better understand rater error in the context of performance-based assessments by exploring the cognitive processes underlying these judgments. In medical education, many of the efforts to reduce error in rater-based assessments have focused on practical mechanisms to minimize rater idiosyncrasy. However, there have been relatively few efforts to develop explicit theoretical models to explain the sources of this idiosyncrasy. Mohr and Kenny have demonstrated that a significant portion of the error variance in ratings of people can be explained by raters’ use of Person Models (ad hoc “stories” about the target person that serve to structure interpretations of behaviours). They found that although each rater produced a slightly different story about the target, all rater stories for a given target could be lumped into a small number of conceptually distinct Person Models. We suggest that if these findings were to generalize to rater-based assessments, they could have important implications for the study of this type of assessment in medical education and beyond. The proposed project will investigate the explanatory power of conceptualizing rater-based assessment as a process of impression formation that involves raters making nominal judgments about ratees (the ad hoc creation of Person Models).

The Ohio State University

Principal Investigator: Dr. Douglas Danforth
Grant Amount / Duration: $149,862 for two years
Project Title: Virtual Patients Simulations to Assess Data-Gathering and Clinical Reasoning.

The principal goal of our project is to develop an innovative method to assess information gathering, critical thinking, and clinical reasoning through the use of virtual patient simulations. Virtual patients (VPs) are avatar representations of human standardized patients from which students can take a medical history. The student-VP interaction occurs in a virtual environment that simulates a real-world patient encounter. Our primary goal is to give students a realistic virtual clinic environment and high-fidelity virtual patients so they can sharpen their data-gathering and clinical reasoning skills and receive constructive feedback in a risk-free environment. We anticipate that this method may be a less costly and more efficient approach to gaining early clinical skills before scheduling time with live standardized patients in a clinical or simulated setting. Indeed, enhancing early skills acquisition with virtual patients may allow more effective and efficient use of more costly standardized patient encounters. Interviews conducted with VPs are captured in their entirety, allowing systematic analysis of the student-patient interaction, including the questions asked of the patient. The primary goal of the present application is to develop and use these innovative simulations to assess student competency in data-gathering and clinical reasoning skills.

2010–2011 Grantees

McMaster University

Principal Investigator: Dr. Kelly Dore
Grant Amount / Duration: $147,419.80 / 2 years
Project Title: Ensuring Diversity and Test Security: An examination of the reliability, validity, feasibility, & acceptability of the Computer-based Assessment for Sampling Personal Characteristics (CASPer) in diverse populations of medical school applicants.

Personal characteristics of tens of thousands of applicants to medical schools can now be assessed with reliability and validity, making widespread, psychometrically sound testing feasible. Taking advantage of the principles of multiple sampling (OSCE; Harden, 1979) and open-ended responses (postgraduate trainee ethics and decision-making; Ginsberg, 2000), a computer-based test demonstrated high reliability (G > 0.80) and predictive validity for subsequent multiple mini-interview (MMI) performance (R > 0.60) (Dore, 2009). Performance on the MMI in turn predicts performance on personal characteristics (ethical reasoning, clinical decision-making, and clinical skills, among others) on national licensure examinations. The MMI can only be administered to those who come onsite to interview; most applicants to medical training are therefore selected for interview without the benefit of a psychometrically sound assessment that represents the holistic nature of assessment, including professional and personal qualities.

University of California, San Francisco

Principal Investigator: Dr. Sandrijn van Schaik
Grant Amount / Duration: $150,000 / 2 years
Project Title: Developing a Tool for Assessing Individual Interprofessional Teamwork Skills across Clinical Settings

The goal of the proposed project is to clarify constructs of interprofessional teamwork in low-acuity inpatient and outpatient clinical settings and to develop a tool for the assessment of the teamwork skills of individual team members. With the growing emphasis on interprofessional approaches to health care, medical, nursing, pharmacy, and allied health professionals need to become proficient in teamwork. Since competency based assessment is becoming the norm in health care education, there is a clear need for a robust tool to assess the teamwork skills of individual team members. There is a paucity of teamwork assessment tools that have been validated for health care professionals, and existing tools all focus on teamwork as it occurs in high-acuity settings, such as operating rooms, emergency departments and intensive care units. Teams in lower-acuity inpatient settings and outpatient settings have different membership, tasks, and processes, and tools developed for the high-acuity settings are unlikely to translate well.

We propose the development of a valid assessment tool for interprofessional teamwork skills that is relevant for all team members. Our project has three specific aims: (1) to specify and define constructs for effective teamwork, focusing on the skills required of individual team members to achieve effective teamwork; (2) to develop a teamwork skills assessment tool and establish content validity; and (3) to establish the validity and psychometric properties of the tool.

The project's primary methods include a qualitative approach with direct observations, focus groups, and interviews to define teamwork constructs and skills (Specific Aim 1). For this purpose, we have chosen two model interprofessional teams with members from a variety of professions: an inpatient pediatric team and an outpatient women's HIV clinic team. These teams are well established at our institution and meet on a regular basis, allowing for direct observation of teamwork skills. Based on the analysis of the qualitative data, we will develop a draft tool for review by an expert panel (Specific Aim 2). The panel will rate the relevance of each item on the draft tool, and we will calculate the content validity index (CVI) and asymmetric confidence interval (ACI) for each item. Next, we will pilot test the developed assessment tool to establish the validity of the instrument (Specific Aim 3). We will use the tool to assess the teamwork skills of undergraduate learners who participate in an existing simulated clinical exercise designed to emulate interprofessional teamwork. Teams of health care professionals will be invited to go through the same exercise using the tool to assess teamwork skills. We will look for four sources of validity evidence: response process, internal structure, relationship to other variables, and consequences. To this end, we will utilize a variety of statistical analyses, including confirmatory factor analysis, a Generalizability study (G-study) to measure the variance, a Decision study (D-study) to determine the appropriate conditions under which to administer the assessment, and known-group comparisons to obtain evidence that the scores represent different skill levels along the developmental continuum.
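As a rough sketch of the item-level CVI step described above (assuming the common definition of the item CVI as the proportion of expert panelists who rate an item 3 or 4 on a 4-point relevance scale; the panel ratings below are invented):

```python
def item_cvi(ratings):
    """Item-level content validity index: the share of expert raters
    who judge the item relevant (3 or 4 on a 4-point scale)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Six hypothetical panelists rate one draft teamwork item.
panel_ratings = [4, 3, 4, 2, 4, 3]
print(round(item_cvi(panel_ratings), 2))  # 5 of 6 raters -> 0.83
```

Items falling below a pre-set CVI threshold would typically be revised or dropped before pilot testing.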

The proposed project will advance our understanding of behavioral assessment in general, and in particular as it pertains to interprofessional teamwork. In addition, it will provide the health care professions with a valid tool for assessment of teamwork skills.

University of Toronto

Principal Investigator: Dr. Shiphra Ginsberg
Grant Amount / Duration: $91,975 / 2 years
Project Title: Towards a Predictive Model of Residents' Performance: Exploring a Novel Method for Analyzing Written Comments on Residents' ITERs

The overall goal of this study is to explore a novel approach to analyzing and utilizing the language that attending physicians use when evaluating their residents, in order to develop a mechanism for efficiently quantifying their comments without losing their often nuanced qualities. Specifically, our research questions will explore the degree to which Program Directors' ratings and categorizations of their PGY3 residents in Internal Medicine can be predicted by: 1) Residents' PGY1 and PGY2 ITER scores; 2) Attending physicians' ratings of the comments received by residents on their PGY1 and PGY2 ITERs; 3) Scores generated by two commercially available software packages that analyze "affective language" in passages of text. The first phase of this project will generate a predictive model of resident performance. In phase two the model will be prospectively tested in two new groups of residents at two schools.

2009–2010 Grantees

Beth Israel Deaconess Medical Center

Principal Investigator: Dr. Amy Ship
Grant Amount / Duration: $150,000 / 2 years
Project Title: Test of Accurate Perception of Patients' Affect: Development and Validation (TAPPA)

One area that has received very little attention in research on physician-patient interaction is the physician’s accuracy in perceiving the patient’s affective states. This is an unfortunate gap because both the socio-emotional and instrumental goals of medical care are served by an ability to understand the feelings and interpersonal attitudes that are expressed by patients nonverbally as well as verbally in indirect ways. Although medical educators would agree that perceiving patients accurately is an important clinical skill, clinically specific tools for assessment, training, and research need to be developed. We propose to adapt methods from social psychology to measure such accuracy in creating a new standardized test called the Test of Accurate Perception of Patients’ Affect (TAPPA). The TAPPA will be modeled on the well-validated empathic accuracy (EA) paradigm (Ickes et al. 1990; Ickes 2001), which will be adapted for the medical care context. The test will consist of a series of videotape clips of patients taken from real medical visits, and for each clip the test-taker will make a judgment about the patient’s feelings (affective state). To determine whether one person accurately perceives another person, there must be a criterion against which such perceptions can be scored as right or wrong. In the proposed adaptation of the EA paradigm, the criterion consists of these same patients’ reports of their feelings at different moments during their medical visits, as ascertained during review of the videotape immediately after their visit. Once the clips are selected and assembled into test format, new viewers (clinicians or clinicians in training whose accuracy is to be assessed) can then be tested for accuracy of their perceptions as determined by the match between their judgments and the patients’ self-reported affect.
After application of psychometric and construct-validation methods, the TAPPA can be used in assessing and training clinicians as well as in research on physician-patient interaction. The present proposal covers test development, four preliminary validation studies, and a training study. Specifically, we will: (1) create an archive of videotaped material from which test excerpts will be selected (the archive will consist of 24 videotaped physician-patient encounters, each with the patient’s self-reported affect as reported during videotape review); (2) apply pretesting and psychometric methods to choose the videotape excerpts that will comprise the TAPPA, which will consist of four 10-minute test modules each containing 12 excerpts of patient communication, which can be combined to form subtests of varying lengths (from 12 to 48 items); (3) conduct validation analyses of the TAPPA using student and clinician test-taker groups; and (4) evaluate the effectiveness of a brief training intervention on the skill measured by the TAPPA. Because the test stimuli are real patients, the TAPPA will be ecologically valid and therefore likely to be accepted in the medical context; and because it will be easy to administer and score, it will be highly transportable. It can also be used as a vehicle for discussion, feedback, and reflection in addition to its primary function as a psychometric instrument. We expect the TAPPA to be a valuable tool for assessment of, and research on, a crucial clinical skill.

University of Illinois at Chicago

Principal Investigator: Dr. Rachel Yudkowsky
Grant Amount / Duration: $149,581 / 2 years
Project Title: The Impact of Clinically Discriminating Clinical Findings on the Validity of Performance Test Scores

High-quality checklists are essential to the validity of performance test data. The most common methods of checklist construction are based on discussion among faculty, resulting in checklists that assess the thoroughness of an examinee's investigation of a chief complaint. There are several problems with this approach that could compromise the validity of the assessment: items are rarely evidence based; checklists do not discriminate between experts and novices; and there is little consensus on relevant items. Furthermore, rote performance of thoroughness checklists has negative consequences for learning, by discouraging students' attention to the differential utility of findings that clinically discriminate between competing diagnostic hypotheses. An alternative method of checklist construction is suggested by the results of the 2005-06 Stemmler-funded study entitled Validation of a Hypothesis-Driven Physical Exam Assessment Procedure, which found that basing performance checklists on physical exam items that clinically discriminate between competing diagnostic hypotheses (e.g., auscultating the lungs to help discriminate between asthma and CHF) resulted in more reliable (generalizable) scores compared to traditional thoroughness checklists. The purpose of the proposed study is to explore the impact of using checklists built only from clinically discriminating history and physical exam items, compared to traditional thoroughness checklists, on the validity of a standardized patient (SP)-based Clinical Skills Examination (CSE) for fourth-year medical students. Five types of validity evidence will be gathered: content, response process, internal structure, relationships to other variables, and consequences. Our hypothesis is that the validity evidence will favor clinically discriminating checklists and support the further development of these checklists for assessments of clinical skills.

To test this hypothesis faculty will modify six existing SP cases and checklists, describe competing diagnostic options for each case, and identify a subset of the checklist items that clinically discriminate between the competing diagnoses. All six cases will be administered to fourth-year medical students during their summative CSE. CSE scores based on the thoroughness checklists vs the clinically discriminating checklists will be compared to determine whether limiting the checklist to clinically discriminating items will:

  1. Provide a more content-relevant sampling of task behaviors (Validity evidence: content);
  2. Allow for more accurate and reliable ratings by SPs (Validity evidence: response process);
  3. Result in improved psychometric indices, as measured by (1) better item discrimination, (2) lower Standard Error of Measurement (SEM), (3) higher scale reliability (Coefficient Alpha), and (4) higher Generalizability across cases (Validity evidence: internal structure);
  4. Result in higher correlations with related variables such as experts' global ratings of the encounters, experts' global ratings of post-encounter notes, and students’ pass/fail status on the USMLE Step 2 CS exam (Validity evidence: relationship to other variables); and
  5. Improve the quality of cut scores (Validity evidence: consequences).
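Two of the internal-structure indices listed above can be illustrated with a minimal sketch (the checklist scores are invented, and this is only a generic computation of coefficient alpha and the SEM, not the study's actual analysis code):

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha for an examinee-by-item score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                          # number of checklist items
    item_vars = x.var(axis=0, ddof=1)       # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standard_error_of_measurement(scores):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    totals = np.asarray(scores, dtype=float).sum(axis=1)
    return totals.std(ddof=1) * np.sqrt(1 - coefficient_alpha(scores))

# Four hypothetical examinees scored on a four-item checklist (1 = done).
checklist = [[1, 1, 0, 1],
             [1, 0, 0, 1],
             [0, 0, 0, 1],
             [1, 1, 1, 1]]
print(coefficient_alpha(checklist))             # ~0.67 for this toy data
print(standard_error_of_measurement(checklist))
```

A checklist restricted to clinically discriminating items would be expected, per the hypothesis above, to show higher alpha and lower SEM than a thoroughness checklist of similar length.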

This study has broad implications for the construction of performance checklists for the assessment of the clinical skills of medical students and other health professionals. By exploring the impact of limiting checklists to items that clinically discriminate between competing diagnoses, this study offers an alternative method of checklist construction that may improve the validity and educational effectiveness of standardized patient-based performance tests.

University of Washington

Principal Investigator: Dr. Florence Sheehan
Grant Amount / Duration: $150,000 / 2 years
Project Title: Objective Quantitative Competency Test for Diagnostic Ultrasound

Objectives: We will develop a hardware and software Skills Assessment System (SAS) to enable objective and quantitative assessment of competence in ultrasound, and we will measure the construct and concurrent validity of this SAS for competency testing.

Rationale: The problem we address is that knowledge is easy to measure using written and/or oral exams, but skill competence has always been more difficult to assess in a manner that is objective and reproducible across instructors. Instead, certification has traditionally been based on duration of exposure and number of procedures.

Method: The proposed SAS displays images from an Image Library on a computer as a mock transducer is manipulated on a mannequin. Skill in image acquisition will be quantified in terms of the spatial location of specified views in three dimensions (3D) relative to the anatomy. Skill in image interpretation will be assessed using studies from the Image Library from patients with diverse diagnoses.

Innovation in Medical Education: The SAS that we propose will enable, for the first time, assessment of skill-specific competencies in ultrasound in a manner that is both quantitative and completely objective. This is an advance over the subjective ratings by faculty used in the Objective Structured Assessment of Technical Skills.

2008–2009 Grantees

University of Michigan

Principal Investigator: Dr. Linnea Hauge
Grant Amount / Duration: $149,963 / 2 years
Project Title: Assessing Medical Student Performance on Phone Consultations with Nurses: A Validity and Feasibility Study

Study objectives: The ability to function effectively as a healthcare team member is now regarded as a prerequisite skill for all medical professionals. One of the challenges of an intern’s first days as a physician is taking call and responding to nurses’ pages about patients in need of immediate attention. Despite the anxiety associated with this universal responsibility, little attention has been paid to trainees’ preparation to perform this task safely and effectively. The purpose of our study is to evaluate the validity and feasibility of an instrument for assessing the interpersonal skills and clinical decision-making involved in a common physician task that has unique communication requirements: phone consultations with nurses.

Rationale: Interprofessional communication is a critical aspect of safe team function, and general assessments of communication do not readily capture the nature of specific team interactions. A validated instrument designed to measure the communication and decision-making involved in phone consultations about patients will afford us the opportunity to conduct competency assessment and standard-setting in this important patient care skill.

Methods: Simulated surgical cases for physician-nurse consultations, or mock pages, will be developed by a multi-disciplinary team of physicians, nurses, and educational specialists. Each mock page will include a two-part evaluation: one part specific to the case, and another which will be general to phone communication with a nurse. The evaluation instruments to be refined were adapted from the literature and have been used in an earlier pilot project. The assessment will be composed of 10 standardized cases, implemented and evaluated by a nurse. Approximately 140 senior medical students (70 per year) and 30 surgery residents from 5 institutions will be given pre-determined times to be on call. A nurse from the paging team will page and present participants with a hypothetical surgical scenario, and evaluate their performance. A random sample of one-third of the calls will be independently evaluated by two raters, a nurse and a surgeon, from the audiotapes of those case performances. In Year 2, a subset of student participants will be selected for follow-up in the clinical setting. Ratings of their in situ phone consultation performances will be gathered from nurses. Analyses will be conducted on raters’ scores to determine interrater reliability and internal consistency. Construct validity will be assessed by comparing student (novice) and senior resident (experienced) performances on each case. Predictive validity will be determined with a correlation between Year 1 student performance and their performance as a surgery intern. A standard-setting method will be employed to identify performance standards for determining competency on phone consultations. Feasibility will be studied via cost analysis of hours of time spent and equipment costs.
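The interrater-reliability analysis mentioned above can be pictured with a toy chance-corrected agreement computation (the pass/fail ratings are invented, and Cohen's kappa is used here only as a generic example of such a statistic, not as the study's stated method):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters' categorical judgments."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Expected agreement if both raters judged independently at their base rates.
    expected = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2
    return (observed - expected) / (1 - expected)

# A nurse and a surgeon independently rate the same six mock-page calls.
nurse   = ["pass", "pass", "fail", "pass", "fail", "pass"]
surgeon = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(nurse, surgeon), 2))  # -> 0.67
```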

Expected outcomes: The validation of an instrument to measure performance during phone consultations with nurses will be useful in determining medical student and resident competence in this important skill. In addition to its value as an assessment instrument, its use would serve to enhance the specificity of feedback about physician communication. The feasibility of our assessment strategy will be studied, to inform the possible extension to disciplines beyond surgery. The expected outcome of our research is to provide a feasible and worthwhile model for educational interventions and competency assessment that could be applied across disciplines.

Vanderbilt University

Principal Investigator: Dr. Joshua Denny
Grant Amount / Duration: $150,000 / 2 years
Project Title: Automated Assessment of Clinical Portfolios to Determine Geriatric Competency for Medical Students

Rationale: National accreditation bodies, including the Accreditation Council for Graduate Medical Education (ACGME) and the Association of American Medical Colleges (AAMC), have called for competency-based curriculum and assessment models for training programs. Many medical schools have responded with education portfolios to capture student experiences in real and simulated environments matched to shared competency goals. However, education portfolios have not achieved widespread adoption, partly because current methods require significant manual entry of a limited amount of clinical data. Automatic and valid methods of capturing the richness of students’ clinical experiences are needed. In this project, we will apply advanced informatics techniques to a complete repository of medical students’ clinical notes to automatically locate students’ experiences and assess student proficiency according to the AAMC’s medical student geriatrics competencies.

Project objectives:

  1. Develop descriptive anchors that express the AAMC geriatric competencies as a national standard for medical education.
  2. Develop and validate automated tools to identify a student’s experience with a given competency based on student notes.
  3. Develop and validate automated tools to assess student proficiency based on student notes.
  4. Compare the student clinical note representation of geriatric experience and proficiency to external validations using standardized patient interactions and a validated medical knowledge exam.

Methods: Our project leverages two existing transformational systems. We have developed an automated concept indexer that uses natural language processing techniques to map free-text clinical notes to standardized terminologies. The second system, Learning Portfolio, collects all student notes from patient encounters as they are created in the electronic medical record (EMR). We will apply existing algorithms to accurately locate all medical concepts in the clinical notes, separated by document section (e.g., “chief complaint” and “past medical history”). We will develop sets of descriptive terms that represent geriatrics competencies in clinical notes through literature review and local and national collaboration with geriatric educators. We will identify exposures to each geriatric competency using these descriptors, and then rank the proficiency with which the competency was addressed (“advanced”, “intermediate”, or “novice”). We will internally validate the algorithms by comparing the computer ratings of notes to a gold standard of physician review of the notes. We will externally validate the algorithms using student completion of objective structured clinical examinations and a validated geriatrics knowledge exam.
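As a loose illustration of what the concept-indexing step produces, here is a dictionary-lookup sketch; a real indexer uses full NLP pipelines, and the terms and codes below are invented for the example:

```python
# Hypothetical mini-terminology: surface terms mapped to invented codes.
TERMINOLOGY = {
    "urinary incontinence": "GC-001",
    "hearing loss": "GC-002",
    "fall risk": "GC-003",
}

def index_concepts(note_text):
    """Return the set of codes whose terms appear verbatim in the note."""
    text = note_text.lower()
    return {code for term, code in TERMINOLOGY.items() if term in text}

note = "Past medical history: hearing loss. Assessed fall risk at home."
print(sorted(index_concepts(note)))  # -> ['GC-002', 'GC-003']
```

A production indexer would additionally handle negation, synonyms, and the document-section boundaries the project description mentions.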

Significance for medical education assessment: This project will produce a tool that automatically rates clinical notes according to national geriatric competencies, and will provide an evaluation of the role of clinical notes in competency assessment. This work will establish criteria for the application of the AAMC geriatric competencies at other institutions through a set of descriptive terms that could be adapted to other competency sets. This work introduces new methods to identify and rate performance outcomes that can be applied to other topics, competencies, and modalities of assessment, such as reflections, narrative evaluations, essays, and other written work.

2007–2008 Grantees

University of Illinois at Chicago

Principal Investigator: Dr. Alan Schwartz
Grant Amount / Duration: $149,310 / 2 years
Project Title: Measuring Quality of Medical Student Performance at Contextualizing Care

Learn More...

Clinical decision making requires two distinct skills: the ability to classify patients' conditions into diagnostic and management categories that permit the application of "best evidence" guidelines, and the ability to individualize or - more precisely - to contextualize care for patients whose circumstances and needs require variation from the standard approach to care. Most assessment in medical education places heavy emphasis on biomedical decision-making with little emphasis on how to incorporate contextual factors that may be essential to planning patients' care.

The goal of this project is to demonstrate and provide validity evidence for an innovative standardized patient (SP) method of assessing medical students in the clinical years on their ability to detect and respond to individual contextual factors in a patient encounter that overcomes the aforementioned challenges. As such, the project is designed to directly address the Stemmler Fund goal of research and development of innovative assessment approaches to enhance the evaluation of those preparing to practice medicine.

During the project, 144 fourth-year medical students participating in a Medicine sub-internship will be randomized to an intervention group or a control group; the intervention group will receive additional training in the application of qualitative methodology to elicit and incorporate contextual factors in the clinical encounter. All students will participate in an SP assessment consisting of four SPs, blinded to trial arm, presenting cases with and without important biomedical and contextual factors in a counterbalanced factorial design. Performance will be compared between trial arms. In addition, performance will be compared with USMLE Step 2 clinical knowledge scores to determine whether contextualizing ability is independent of clinical knowledge, and consistency of performance across individual SP cases will be studied to determine the number of cases necessary to achieve sufficient reliability for the assessment to be used.

The outcomes of this project, which will be widely disseminated to permit replication at other medical schools, will include: (1) a well-documented method for developing SP assessments designed to test the ability of a trainee to contextualize care, (2) evidence for the ability of the assessments to distinguish between trainees with differing levels of skill in contextualization using a randomized controlled educational trial, (3) evidence that assessment scores are not predicted by clinical knowledge, and (4) evidence for internal consistency in scoring of the assessments.

University of New Mexico Health Sciences Center

Principal Investigator: Teresita McCarty, MD
Grant Amount / Duration: $150,000 / 2 years
Project Title: A Web-based Program for the Deliberate Practice and Formative Assessment of Writing Patient Notes

Learn More...

Rationale: Research into effective approaches to significant learning has generated two powerful descriptive models: formative assessment and deliberate practice. Although the two models approach student learning from somewhat different perspectives, the congruence between them is striking. Unfortunately, realizing the full power of either model requires significant manpower, time, and expertise - resources that are difficult to achieve and to sustain. Technological approaches can reduce the human resource requirements and make the learning benefits of these models more accessible for medical educators and their students.

Objectives: In support of the overarching goal of improving medical students' clinical reasoning skills, this proposal specifically aims to evaluate the core, shared strengths of the formative assessment and deliberate practice models as implemented through a resource-sparing, web-based technology - Calibrated Peer Review (CPR). The objectives are to answer the following questions: while using Calibrated Peer Review, do learners:

  1. Focus on improving performance in the well-defined task of patient note-writing?
  2. Recognize and give informative feedback?
  3. Incorporate feedback into iterative practice to improve performance?
  4. Report that the new learning is integrated successfully into new performances?

Methods: This proposal begins with archival data from patient notes written in Calibrated Peer Review by four medical student classes (2005-2008) and continues gathering note-writing data, along with survey and focus group information, from two additional classes (2010 & 2011). Four studies are proposed. 1) The archival study evaluates individual factor correlations using the scores from 3,468 completed "assignments," as well as trends in the score deviation of peer review and self-assessment over time. 2) The quality of feedback study codes students' narrative feedback to analyze quality trends over assignment iterations. 3) The perceptions of feedback study asks students' opinions about sample feedback of varying quality. 4) The student survey and focus group study constructs questions to assess students' habits in preparation for note-writing during the performance examinations, their perceptions of the CPR system, and how it aids their learning. Descriptive statistics, inferential statistics, qualitative analyses, and analysis of change via Hierarchical Linear Modeling will be conducted as relevant to each research question.

Significance for Assessment in Medical Education: This web-based program applies the principles of formative assessment and deliberate practice to provide a powerful learning experience for students. It emphasizes the importance of the learner's focus on improving performance, accurate observation and informative feedback, repetition, and the formation of cognitions that reflect the true complexity of the clinical task and thus bring the learner to integrate that improvement into his or her everyday work. The judicious use of this technological approach reduces the intensive resources required for effective learning - a true innovation in 'assessment as learning' in medical education.

Duke University

Principal Investigator: Jeffrey Taekman, MD
Grant Amount / Duration: $149,297 / 2 years
Project Title: Standardized Teamwork Skills Assessment: Feasibility, Reliability, and Validity.

Learn More...

Background and Rationale: Successful delivery of health care relies on effective team coordination and communication. A major shortcoming of the recent push toward expanded teamwork-skills education in medicine is our limited ability to assess the effectiveness of different forms of team skill training with respect to behavior change. There is a critical need to develop valid, reliable, and feasible methods of assessing health care team coordination skills. A number of observer-based rating tools have been implemented, with limited success. While some successes have been noted in the case of specialty-based tools for specific work environments, general team skill assessment tools intended to apply to a broader audience, or for the purpose of medical and nursing education, have not been strongly supported. Limitations of current assessment methods include: (1) a failure to attain high levels of inter-rater agreement with observer-based team skill rating scales, and (2) a failure to adequately assess teamwork skills with respect to managing difficult situations or difficult team members.

Objective: Our long-term goal is to develop, validate, and share a standardized team skill assessment (STSA) tool focused on evaluating critical health care teamwork behaviors. We have developed an STSA tool that embeds trainees with standardized team members (actors playing the roles of health care team members, similar to the accepted practice of using standardized patients) in difficult teamwork scenarios. Performance is scored via observer-based rating of scenario-specific ideal team-skill behaviors. This tool overcomes limitations of current assessment methods by (1) assessing trainees in a standardized scenario that is the same for each trainee (not dependent on the participation of other trainees), (2) using scenarios that stress critical teamwork skills in difficult situations, and (3) basing scoring on specific observable behaviors that are easily identified within the context of each scenario. The primary objective of this project is to assess the feasibility, reliability, and validity of this STSA.

Methods: We will use the STSA tool to assess 30 medical and nursing students before and after teamwork training content is delivered in their capstone courses. We will assess reliability, validity, and generalizability through multivariate generalizability analysis of behavioral ratings by standardized team members (immediately following the interaction) and trained observers (from videotape). Generalizability analysis is a statistical method for assessing the reliability of a measure when there are multiple sources of variability in that measure. We will assess the proportion of variance in team skill ratings attributable to: (1) trainee, (2) pre- vs. post-training, (3) rater type (actor/live vs. observer/tape), (4) medical vs. nursing student, (5) rater, and (6) scenario. We hypothesize that a large proportion of the variability in student scores will be attributable to trainee and pre- vs. post-training, and we expect a low proportion to be attributable to rater and rater type. We will predict the number of repetitions (e.g., number of scenarios) required to achieve a reliable team skill score. We will also assess the feasibility and validity of the STSA through subjective surveys of clinician-teachers, trained observers, standardized team members, and students.
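
As a toy illustration of the variance decomposition behind generalizability analysis, the sketch below handles only a simplified, fully crossed trainee x rater design; the real G study would model all six facets listed above, and the ratings here are invented.

```python
def g_study(scores):
    """Variance components and a relative G coefficient for a fully crossed
    trainee x rater design; scores[i][j] = rating of trainee i by rater j."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    trainee_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    # Two-way ANOVA sums of squares (no replication).
    ss_p = n_r * sum((m - grand) ** 2 for m in trainee_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max(0.0, (ms_p - ms_res) / n_r)   # trainee (true-score) variance
    var_res = ms_res                          # residual (error) variance
    g = var_p / (var_p + var_res / n_r)       # relative G coefficient
    return var_p, var_res, g

# Invented ratings: 3 trainees, each scored by the same 3 raters.
ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
var_p, var_res, g = g_study(ratings)
```

With perfect rater agreement, as in the invented data, all variance is attributable to trainees and the G coefficient is 1.0; real data would apportion variance across rater, rater type, scenario, and training occasion as well.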

Advancing assessment in medical education and practice: This research is innovative because it presents a novel approach to assessing team skills and a rigorous method of analyzing the feasibility, reliability, and validity of that approach. We expect the results of this effort to advance knowledge in medical education assessment as it relates specifically to measuring health care teamwork skills. This includes advancement in knowledge related to details such as scenario presentation, actor involvement, and rater training for the purposes of low fidelity simulation-based assessment. We also expect to advance knowledge related to methods of analyzing the validity, reliability, and generalizability of these types of assessment tools.

2006–2007 Grantees

McMaster University, Hamilton, Ontario

Principal Investigator: Kelly Dore Banks, PhD (ABD)
Grant Amount / Duration: $145,870 / 2 years
Project Title: The evaluation of the reliability, validity, feasibility, and acceptability of a web-based instrument to measure professional qualities in medical school applicants

Learn More...

Rationale: Health professions admission committees across the world are faced with the difficult task of selecting, from among many eligible applicants, the select few who will be admitted to their training programs. The determinants of this admissions process are often a combination of cognitive measures, such as Grade Point Average (GPA) or standardized tests such as the Medical College Admission Test (MCAT), and non-cognitive measures, including interviews and essays. However, there has been limited success in the development of evaluation tools that provide reliable and valid measures of an applicant’s non-cognitive qualities. The exception is the Multiple Mini-Interview (MMI), in essence an admissions OSCE. The MMI has been shown to predict intramural and licensing examination performance. However, like any OSCE, the MMI has practical limitations; the sheer volume of candidates at many institutions makes it necessary to develop a reliable and valid strategy for screening candidates’ non-cognitive attributes in a more efficient fashion. To this end a new measure, using video scenarios and written or audio responses, was developed and a pilot study was completed. In 2006, 110 applicants to McMaster’s medical school completed this Computer-based Multiple Sample Evaluation of Non-cognitive Skills (CMSENS). Of those applicants, 78 completed the CMSENS by verbally recording their responses in an audio file while 32 typed their responses. The overall test generalizability was .86 for the audio CMSENS and .72 for the written. The written CMSENS also demonstrated predictive validity, correlating with the MMI at .51. However, conclusions from this study are limited because of the small sample and the one-time nature of the findings.
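
Validity figures like the .51 correlation with the MMI quoted above are plain Pearson correlations between paired applicant scores. A minimal sketch with invented CMSENS and MMI scores (not data from the study):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented CMSENS and MMI scores for five applicants.
cmsens = [62, 70, 75, 81, 90]
mmi = [55, 64, 72, 70, 86]
r = pearson(cmsens, mmi)
```

With only five invented pairs the value is illustrative; the study's point about sample size applies equally here, since small-sample correlations carry wide confidence intervals.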


Objectives are to:

  1. Assess the impact of varied test time and proctoring on the reliability and validity of the CMSENS.
  2. Determine the predictive validity of the CMSENS.
  3. Replicate the pilot reliability results with a larger, more diverse sample.

Methods: To achieve the first objective, applicants to McMaster’s medical school will be invited to participate in the CMSENS in winter 2008. These applicants will complete a CMSENS in which the length of response time and proctoring are manipulated. In addition to reliability and validity analyses to determine the optimal testing format, participants invited to interview will have their scores compared to MMI performance. To assess objective 2, several methods will be followed. In-program performance will be assessed for the approximately 25 members of the medical class of 2009 who participated in the CMSENS pilot project and gained admission. In addition, construct validity will be examined by recruiting students in their final year of medical school and second-year residents to participate in a mock CMSENS, thereby allowing comparison between (a) scores assigned to medical school applicants and to more senior trainees, and (b) for the final-year students and residents, scores assigned on the CMSENS and those received on the Canadian qualifying examinations, specifically Part II of the MCCQE (an OSCE), which evaluates both cognitive and non-cognitive characteristics of medical trainees. To satisfy the third objective, a sample of applicants applying in winter 2009 will be administered the CMSENS. A larger sample in this year will permit accurate assessment of reliability and facilitate comparison of CMSENS performance to MMI and in-program results.

Significance for Medical Education and Practice: This innovative assessment tool, if proven reliable and valid, has the potential to allow educators in the health professions to efficiently assess the non-cognitive qualities of the thousands of applicants to training programs, for whom reliable and valid assessment was previously impossible.

Southern Illinois University School of Medicine, Springfield, Illinois

Principal Investigator: Dr. Richard Rosher
Grant Amount / Duration: $150,000 / 2 years
Project Title: An Objective Measure to Assess Resident Competency in Systems-Based Practice

Learn More...

Project objectives: The ACGME has directed that all residents must meet six competencies. The sixth competency, Systems-Based Practice, has presented a challenge for assessment.

The objective of this project is to develop a standardized, objective, innovative method to measure the sixth competency, Systems-Based Practice.

An OSSIE (Objective Structured System Interaction Examination) will evaluate the resident’s ability to interact with the health care team, deal with aspects of the health care system, coordinate effective care across settings, and provide cost-effective care.

Rationale and primary methods to be employed: In today’s health care system, physicians must not only be competent in their knowledge and practice of medical care, but must also lead teams composed of other health care providers. They must be able to help their patients navigate the health care system and ensure that continuity of care is promoted across health care settings. They must be cognizant of the costs of various treatments.

Four required skills are identified by the ACGME that must be attained in order to be judged competent in Systems-Based Practice. The resident must:

  1. be able to understand the interaction of physician practices with the larger system, resources, and providers.
  2. have knowledge of practice and delivery systems.
  3. practice cost-effective care.
  4. advocate for patients within the health care system.

Three scenarios involving patients, families, and members of the health care team will be developed to test each of these four required skills for a total of twelve scenarios. The scenarios will be presented in the format of an OSCE. The new examination for residents, the OSSIE, will be given to PGY2 residents in the middle of their second year. It will be a formative examination that will evaluate the residents’ competence and enable tailoring of their third year to improve these competencies. Generalizability analysis will be used to determine inter-case reliability. Correlations between exam scenarios and ratings by observers in practice situations will determine the validity of the OSSIE.

How the proposed research will advance assessment in resident education: Currently, there are few measures of Systems-Based Practice to use in assessing residents. The proposed research will advance assessment in resident education by investigating an innovative, objective method of assessing the ACGME competency of Systems Based Practice. This new method of assessment, the OSSIE, will be a modification of the traditional OSCE. A simulation of interaction with other members of the health care team will capture abilities needed for physicians in today’s complex health care system. By using the OSSIE, faculty will be able to provide constructive feedback to each resident to enable improvement in Systems-Based Practice.

University of Pennsylvania, Philadelphia, PA

Principal Investigator: David Asch, MD
Grant Amount / Duration: $149,820.55 / 1 year
Project Title: Clinical Outcome-Based Assessment of Medical Education: Concept and Evaluation

Learn More...

The overall goal of this research project is to demonstrate the feasibility and examine the usefulness of evaluating the quality of clinical training programs by assessing the clinical outcomes of the patients later cared for by the graduates of those training programs. The concept is premised on the view that although medical education serves a collection of intermediate goals, in the end the most important clinical goal is to improve the health of individuals and populations. We may mean many different things when we say that a medical school or a residency program is good, or that one medical school or residency program is better than another. However, stakeholders, including prospective trainees, health systems, and patients, could be justified in expecting at least one specific meaning: that graduates of good training programs in general take care of patients well, and that graduates of better training programs in general take care of patients better.

This project represents a study of “proof of concept,” using as our test case the analysis of maternal birth treatment and outcomes to inform the assessment of residency training in obstetrics and gynecology. We will use data from all hospital-based deliveries in New York and Florida between 1992 and 2006 to test the relationship between residency program, physician characteristics, and maternal outcomes. Our measures of performance will be:

(a) use of Caesarean section;
(b) whether a woman who delivered vaginally had a 4th degree perineal laceration;
(c) whether a woman experienced any adverse outcome, as defined by HealthGrades (2004); and
(d) a complication measure developed by Epstein, Ketcham and Nicholson (2006) that assigns larger positive values to complications that result in long hospital stays for the mother.

Using these data and measures, we will address the following specific questions:

1) How much variation in inter-physician performance is explained by residency program and year of residency graduation?

2) Can residency programs be categorized reliably according to the treatment patterns and patient outcomes of their graduates?

3) Is there systematic variation in residency program effects?

We believe these notions are consistent with the goals of the Stemmler Fund and the National Board of Medical Examiners more generally, because they incorporate new methods of assessment in which the influence of training programs on clinical outcomes is assessed independently of patient and physician characteristics. At the conclusion of this project, we expect to have a deeper and more specific understanding of the promise and limitations of evaluating medical training programs using clinical outcomes; we expect to have a series of manuscripts describing our conceptual view, analytic approach, and results; and we expect to have identified the next steps toward further development and evaluation of this assessment concept.

2005–2006 Grantees

University of California, San Francisco

Principal Investigator: Karen E. Hauer, MD
Grant Amount / Duration: $149,167 / 2 years
Project Title: Cultural Competence Using Shared Decision Making

Learn More...

Objectives: We propose to assess the reliability and validity of a shared decision making checklist as a tool for evaluating medical student cultural competence in standardized patient encounters. We will validate shared decision making checklist ratings by correlating scores with global assessments of cultural competence made by clinician experts in cultural competence. Additionally, we will compare ratings of cultural competence to ratings of general communication skills as measured by the Common Ground instrument in these standardized patient encounters to determine the extent to which cultural competence overlaps with communication skills proficiency.

Background: Failure to develop care plans that incorporate information about patients’ cultural backgrounds and values contributes to important disparities in health care. Medical schools and residency training programs are now required to teach and assess trainees’ cultural competence skills. However, a review of the literature indicates that measures of cultural competency suffer from significant deficits. Most studies rely on measurements of attitudes or skill self-assessments, and the minority of studies that do have skills-based outcomes assess only a subset of relevant competencies. Thus, while medical schools nationally are emphasizing the importance of patient-centered care and cultural competence, they lack the ability to measure the degree to which students are mastering these core concepts.

Methods: We will determine the reliability and validity of a shared decision making checklist as a measure of cultural competence. Using purposeful sampling to identify students of different genders, clinical skills competence, and races, we will select 200 videotaped encounters from 50 third-year medical students’ interactions with four standardized patients. Trained coders will score the student-standardized patient encounters using a shared decision making checklist. Two faculty cultural competence experts, blinded to the study hypothesis, checklist content, and scores, will perform global assessments of cultural competence based on review of the same videotaped encounters. The standardized patients’ ratings of general communication skills using the Common Ground instrument will be used to assess trainees’ communication skills. Reliability of the three rating instruments will be calculated. We will assess validity by correlating the shared decision making scores with the global assessments of cultural competence. We will assess concurrent validity by correlating the shared decision making results with the communication skills scores. We will explore reliability by conducting several generalizability studies. This analysis will determine the number of raters and cases needed to obtain reliable cultural competence scores.

Implications for assessment: Our results will inform the assessment literature by evaluating the use of a shared decision making checklist for assessing cultural competence and determining the degree to which cultural competence correlates with communication skills. These results will facilitate evaluation of the efficacy of cultural competence curricula.

University of Missouri-Columbia School of Medicine

Principal Investigator: Kimberly G. Hoffman, PhD
Grant Amount / Duration: $150,000 / 2 years
Project Title: Use of Portfolios to Assess Medical Student Outcomes

Learn More...

The public in general and professional organizations in particular are increasingly demanding evidence of competence in medical practice and of a physician’s ability to meet the demands of today’s society (IOM, 2001; 2003). Medical education has responded with a focus on educational outcomes (Whitcomb, 2004), case-based, authentic curricula (Friedman, 2001; Kincade, 2005), and experiences that support the development of physicians within a complex health care system (ACGME, 2005; AAMC Report V, 2001). The emerging definition of professional competence is difficult to evaluate using traditional assessment. The portfolio addresses the current limitations of assessment by integrating professional outcomes and placing them within an authentic learning context. Challenges in portfolio assessment include insufficient inter-rater reliabilities, questions of generalizability, a substantial faculty and learner time commitment, and balancing a prescriptive, standardized approach with individualization (Friedman et al., 2001; Case, 1994; Des Marchais et al., 1995; Challis, 1999; LeMahieu et al., 1993; Herman et al., 1995).

The University of Missouri has developed a set of key competencies for our graduates (MU2020 key characteristics) that are consistent with national and international discussions of professional competence. To our knowledge few medical schools have successfully engaged faculty in developing an approach for assessment of professional competencies. This proposed research draws on the prior work at MU to address two global questions: 1) How does the development of a set of descriptive anchors for each of the key characteristics influence the validity, reliability, reproducibility and trustworthiness of portfolio assessment? 2) How do student contributions to the portfolio influence faculty assessment of portfolios?

Descriptive anchors of exemplary performance for each of the professional outcomes will be derived from the literature, clinical faculty, medical students, and patients. These anchors will be used to develop a portfolio assessment tool. A two-step judgmental review process will establish the content validity of the descriptive anchors. Inter- and intra-rater reproducibility will be established by using the assessment tool to evaluate the portfolios of third-year medical students. Predictive validity will be determined by correlating portfolio assessment with traditional measures of student success. The influence of student contributions on portfolio assessment will be evaluated by determining the differences between individual faculty ratings of students’ portfolios rated first with only required documentation and rated a second time with student contributions included. An external advisory board will provide guidance to the research team and will review the appropriateness of the intermediate research projects.

The outcome of this project will be a portfolio assessment tool to evaluate student outcomes. It will be a useful addition in the assessment of learners and promote an enhanced understanding of professional competence.

Columbia University - College of Physicians and Surgeons

Principal Investigator: Peter C. Wyer, MD
Grant Amount / Duration: $149,975 / 2 years
Project Title: Designing Cognitive Measures of Practice-Based Learning and Improvement as an Iterative Process Combining Rasch and Classical Measurement Methods

Learn More...

Currently, no psychometrically rigorous and developmentally informative instrument exists for assessing resident competencies in the cognitive domains encompassed by Practice-Based Learning and Improvement (PBLI), as defined by the ACGME. Using an iterative process model, we propose to develop and empirically validate four cognitive measures tapped by a comprehensive PBLI instrument that permits periodic formative and summative assessments of residents’ competency levels as they progress through their programs in different specialties. Based upon preliminary experience with a relevant pilot project, we will develop an item pool addressing the following PBLI sub-competency domains: 1) analyzing practice experience, 2) using information technology to manage information and locate evidence from scientific studies related to patients’ health problems, 3) applying knowledge of study designs and statistical methods to the appraisal of clinical studies and other information on diagnostic and therapeutic effectiveness, and 4) assimilating evidence from scientific studies related to patients’ health problems. We propose that these domains, which conform to the standard cognitive domains of evidence-based medicine (EBM) and are frequently summarized as ‘ask, acquire, appraise, and apply’, are best suited for measuring the cognitive aspects of PBLI.

Initially, we will generate a pool of 100-150 written structured-response items tied to sub-competency domain specifications, using different item formats (such as multiple-choice and true-false items). The item pool will represent the relevant, observable facets of each of the sub-competency domains. Parallel forms of the PBLI instrument will be generated next, aligned with common assessment specifications that stipulate a weighting distribution for items tied to different competency domains and cognitive levels (Phases 1-2, Year 1). We will implement a rigorous content- and empirical-validation plan by testing each parallel form of the PBLI instrument on samples of resident volunteers in medicine, pediatrics, and emergency medicine at New York Presbyterian Hospital, as well as on residents in accredited programs in these specialties outside of our institution. We will use Rasch modeling techniques combined with methods from classical measurement theory to examine the validity and reliability of the PBLI measures through these empirical trials. We will supplement the PBLI data with a structured survey to identify programs conforming to ‘best practices’ criteria in the target domains of EBM. Convergent validity evidence can thus be gathered and evaluated, along with evidence of resident group differences on PBLI measures in programs that are more or less compliant with EBM practices and teaching.
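
As a rough sketch of the Rasch side of that plan, the example below estimates item difficulties with the simple PROX-style normal approximation rather than full joint maximum likelihood; the response matrix is invented, and real calibration would use dedicated Rasch software.

```python
from math import log

def prox_item_difficulties(responses):
    """PROX-style Rasch item difficulties (in logits, centered at 0) from a
    0/1 response matrix: responses[person][item]."""
    n_persons = len(responses)
    n_items = len(responses[0])
    raw = []
    for j in range(n_items):
        correct = sum(responses[i][j] for i in range(n_persons))
        # Guard against 0% / 100% items, whose logits are undefined.
        p = min(max(correct / n_persons, 1e-6), 1 - 1e-6)
        raw.append(log((1 - p) / p))   # harder items -> larger logit
    mean = sum(raw) / n_items
    return [b - mean for b in raw]     # center difficulties at zero

# Invented responses: 4 residents x 3 items; item 0 is answered correctly
# most often (easiest) and item 2 least often (hardest).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
difficulties = prox_item_difficulties(responses)
```

Under the Rasch model, items calibrated this way can be compared across the parallel forms described above, since person ability and item difficulty sit on a common logit scale.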

We believe that the resulting PBLI instrument(s) will provide a unique and critically important vehicle that, combined with existing performance-based assessment modalities, will make possible a comprehensive approach to evaluating residents’ competencies across a broad range of specialties. We believe that the PBLI instruments thus produced will fill a gap in outcome assessment in residency programs, a gap that currently limits the quality of resident evaluation.

2004–2005 Grantees - Regular Program

Eastern Virginia Medical School

Principal Investigator: Thomas Hubbard, MD, JD, MPH
Grant Amount / Duration: $70,000 / 2 years
Project Title: The Augmented Standardized Patient: Using Augmented Reality for Assessment

Learn More...

Standardized patients (SPs) are widely used to teach and assess clinical skills. Normal SPs, however, are limited in their ability to display abnormal physical findings. Non-SP simulations could be used (e.g., listening to pre-recorded abnormal heart sounds on a computer), but that method excludes interaction with a live person, and thus is less realistic and probably a less accurate representation of students' skills in real settings. Augmented reality (AR) can expand what an SP can do. AR is a methodology that overlays artificial or virtual components (visual, aural, etc.) over the natural environment to provide the user with helpful information. The proposed augmented SPs (ASPs) will combine the assessment technologies of SPs and computer-driven simulations, allowing each to offset limitations of the other. This project augments the SP by permitting the learner to hear abnormal heart and lung sounds from an SP whose own sounds are actually normal.

We have developed a functioning prototype of the technology for this augmentation. The prototype allows the listener to hear pre-recorded heart and lung sounds when auscultating any of 26 locations on a mannequin. Prior to the award date, we will have moved the system from the mannequin to a variety of SPs of different body morphologies, with a learner hearing the selected sounds rather than those of the SP through a modified stethoscope.

The primary objective of the proposed project is to continue to make this system more realistic by minimizing cues that AR is being used. We will improve the stethoscope's appearance and performance, make the sounds audible over a wider variety of locations on the ASP, and create a database of abnormal heart and lung sounds. These changes will move this new assessment technology from a laboratory prototype to a functional system for routine assessment of students' auscultation skills in any SP-based examination.

We will test the improved system with students in a required annual M4 OSCE. Product development needs will drive formative evaluation studies of students using the system in OSCE-like assessments throughout the project period. Through surveys and interviews we will gather students' views on certain aspects of the ASP. We will also examine the validity of using the ASP through analyses of students' performance in several situations (e.g., ASP with normal findings versus traditional SP with normal findings; ASP with normal findings versus "placebo" ASP, which provides the SP's own sounds through a system similar in appearance to that of the ASP; and diagnosis of pathologies indicated by abnormal ASP findings).

This innovative approach to assessment using augmented standardized patients to assess heart/lung auscultation skills will expand the range of physical abnormalities that can be tested in SP-based assessments.

University of British Columbia

Principal Investigator: Rose Hatala, MD, MSc
Grant Amount / Duration: $34,880 / 1 year
Project Title: Integrating Simulation Technology into a National Specialty Examination in Internal Medicine

Learn More...

As part of the assessment of clinical performance during the Canadian national specialty examination in internal medicine, candidates' physical examination skills are tested in a series of bedside stations. At each station, a candidate performs a focused physical examination on a standardized patient. Since 2003, we have integrated simulation technology into the physical examination stations in order to test candidates' ability to recognize common internal medicine physical abnormalities.


Project Objectives

  1. To establish the relationship between competence in physical examination as assessed using simulation technology and as assessed on real patients.
  2. To assess whether physical examination technique is a competency separate from the recognition of abnormalities on physical examination.

Internists' physical examination skills and diagnostic accuracy on real patients and simulations will be assessed during a 10 station OSCE. The OSCE will consist of 5 stations using patients with real cardiac abnormalities and 5 stations using standardized patients lacking physical abnormalities combined with audio-video simulation of cardiac auscultatory abnormalities.

Contribution to Assessment
Our integration of simulation technology into a high-stakes assessment of clinical performance is a novel contribution to the field of assessment. In addition, we will examine the transfer of physical examination skills between simulations and real clinical performance, a relationship that has not been previously established. Our approach to integrating simulation technology into an examinee's patient assessment may be generalized to other testing formats and settings.

University of Illinois at Chicago College of Medicine

Principal Investigator: Rachel Yudkowsky, MD, MHPE
Grant Amount / Duration: $69,290 / 1.5 years
Project Title: Validation of a Hypothesis-Driven Physical Exam Assessment Procedure

Learn More...

In contrast to current checklist-based SP assessment procedures, which focus primarily on assessing physical exam maneuvers or history taking, the proposed hypothesis-driven assessment procedure brings together all key elements of physical diagnosis: generating a limited set of diagnostic hypotheses, anticipating discriminating findings, performing maneuvers and appreciating the findings, and interpreting the findings by proposing a working diagnosis. The assessment task requires students to think in action while gathering the data.

The findings from the scientific literature that were used to build this assessment procedure, namely co-selection, prototypes, discriminating features, and transfer, provide a strong conceptual framework for the proposed procedure. Implementing this approach as an assessment procedure also automatically guides learning (knowing that students learn what they are assessed on). It promotes contextualized, integrated, and meaningful learning, and provides, as advocated by medical educators, a more parsimonious, selective approach to physical diagnosis, focusing on key, discriminating findings as well as an array of structural patterns (diagnostic sets) that can facilitate transfer when students go from pre-clinical to clinical settings and from patient to patient. The procedure is based on 18 complaints, 145 physical exam maneuvers, and 59 diagnostic alternatives, a sound foundation upon which students can build their physical diagnosis.

The student and class profiles generated from this procedure provide a well-organized and detailed framework for giving feedback to students and educators, in which various sources of strengths and weaknesses in physical diagnosis can be parceled out, such as distinguishing anticipation errors from execution or interpretation errors (an important asset in an era of reducing medical errors).
An example of a student profile following a case would include: "Good anticipation of clinical findings, some faulty physical exam maneuvers, and incorrect diagnosis." Finally, the assessment procedure and the various scores derived from the observations, such as anticipation scores, diagnostic interpretation scores, and overall physical exam scores (8 profiles), offer the possibility of better distinguishing among levels of expertise. The purpose of this proposed project is to begin to validate this hypothesis-driven assessment procedure for physical diagnosis of medical students and residents. Both a three-step and a four-step procedure will be studied, where the four-step procedure includes generating hypotheses while the three-step procedure does not. Six pilot testing and validation studies are proposed, each testing various aspects of construct validity and reliability:

C-I Pilot testing the materials and 3-step procedure with M3 students
C-II Content validation with a blue ribbon panel of clinical educators
C-III Estimating reliability, feasibility, and consequential validity of the 3-step procedure with M3 students
C-IV Estimating reliability and learning effects with early M4 students
C-V Pilot testing 4-step procedure and estimating reliability with PGY-1 & -2 residents
C-VI Estimating expert-novice differences.

Reliability will be assessed with G and D generalizability studies; feasibility using time-on-task and reliability data; consequential validity using a questionnaire; and instructional feedback from the assessment profiles generated using observational data. (A group of Japanese educators is testing the three-step procedure with pre-clinical students.) The main strengths of the proposed hypothesis-driven assessment procedure are its sound theoretical foundation, its relative procedural simplicity (in both its 3- and 4-step forms), and its potential for providing informative and structured feedback to students and educators and for distinguishing levels of expertise.
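
The G- and D-study logic referenced above can be sketched for the simplest crossed design, persons by raters. The rating matrix below is hypothetical and the design is deliberately minimal (a real analysis would handle additional facets, nesting, and missing data); it only shows how variance components from a two-way ANOVA feed a D-study generalizability coefficient:

```python
import numpy as np

# Hypothetical ratings: 4 students (rows) each scored by 3 raters (columns).
X = np.array([[7, 8, 7],
              [5, 6, 5],
              [9, 9, 8],
              [4, 5, 4]], dtype=float)
n_p, n_r = X.shape

# Two-way ANOVA (persons x raters, no replication): mean squares.
ms_p = n_r * X.mean(axis=1).var(ddof=1)   # persons (objects of measurement)
ms_r = n_p * X.mean(axis=0).var(ddof=1)   # raters (measurement facet)
ss_total = ((X - X.mean()) ** 2).sum()
ms_res = (ss_total - (n_p - 1) * ms_p - (n_r - 1) * ms_r) / ((n_p - 1) * (n_r - 1))

# G study: estimated variance components.
var_pr = ms_res                            # person x rater interaction + error
var_p = max((ms_p - ms_res) / n_r, 0.0)    # universe-score (person) variance

def g_coefficient(n_raters: int) -> float:
    """D study: generalizability coefficient for a score averaged
    over n_raters raters (relative decisions)."""
    return var_p / (var_p + var_pr / n_raters)
```

Running the D study for different panel sizes (e.g., `g_coefficient(1)` versus `g_coefficient(3)`) shows how averaging over more raters improves the dependability of the scores.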

2004–2005 Grantees - Invitational Program

Jefferson Medical College of Thomas Jefferson University

Principal Investigator: Mohammadreza Hojat, PhD
Grant Amount / Duration: $99,957 / 2 years
Project Title: General and Specific Subscales of the Jefferson Scale of Physician Lifelong Learning: Predictors and Outcomes

Learn More...

Lifelong learning is an essential element of professionalism. In response to the demand for an operational measure of lifelong learning, we developed the Jefferson Scale of Physician Lifelong Learning (JSPLL, 19 Likert-type items). By surveying 444 physicians from the Greater Philadelphia region, we provided evidence supporting the psychometric properties of the JSPLL in a previous research study supported by the NBME Stemmler Fund. Nonetheless, the following three questions remain to be addressed using a nationwide sample of physicians:

I. Is it feasible to generate a general (G) and a specific (S) component (subscale) of the JSPLL, each applicable to a different group of physicians? The G and S components of the JSPLL will be identified based on the results of factor analysis and content analysis, so that the G component will be more applicable to physicians in patient care who are not involved in teaching and research activities (Group 1), whereas the S component will be applicable to academic physicians who are involved in teaching or research in addition to clinical responsibilities (Group 2). The feasibility of generating the G and S components will be addressed by comparing their psychometric properties and differential validity for physicians in Group 1 and Group 2.

II. What are the predictors of physician lifelong learning? We will examine the contribution of the following measures in predicting the JSPLL (components and total) scores: academic performance prior to medical school (MCAT, undergraduate GPAs), performance during medical school (in the basic and clinical sciences, and ratings of clinical competence in core clerkships), scores on the medical licensing examinations (Steps 1, 2, and 3 of the USMLE, formerly Parts I, II, and III of the NBME), and ratings of postgraduate clinical competence in the three areas of "data gathering," "interpersonal skills," and "socioeconomic aspects of patient care."

III. What are the professional outcomes of physician lifelong learning? We will examine the associations between the JSPLL (components and total) scores and professional outcomes such as board certification, employment status, satisfaction with career, work setting, patient load, teaching, research, publications, and other practice variables. A survey will be mailed to a nationwide sample of 5,412 physicians who graduated from Jefferson Medical College between 1975 and 2000. Multivariate statistical analyses (MANOVA and regression) will be employed. The study will lead to a better understanding of the predictors and professional outcomes of lifelong learning, and a refined assessment instrument useful for the evaluation of lifelong learning among different groups of physicians.

2003–2004 Grantees - Call for Proposals

Duke University Medical Center

Principal Investigator: Melanie C. Wright, PhD
Grant Amount / Duration: $69,718.00 / 2 years
Project Title: Assessment and Prediction of Teamwork

Learn More...

Problems with communication and team coordination are frequently linked to adverse events in medicine. Researchers in the health care industry are increasingly aware of the importance of teamwork skills and advocate a wide variety of training programs related to team coordination. These efforts are prevalent in dynamic environments such as the emergency department and operating room and tend to be focused toward specialty and continuing education. While the assessment of medical students has covered areas such as interpersonal and communication skills, these assessment measures generally focus on the student's interaction with the patient and do not assess team skills in relation to working with other health care providers. Efforts to understand and assess team performance in other work environments have resulted in the identification of specific skills that are important to good teamwork and methods for assessing these skills.

We propose to evaluate assessment tools used in other team performance contexts for the measurement of medical student teamwork skills within a small group cooperative learning environment and in a simulated patient care environment. Specifically, we hope to answer the following questions: (1) Will assessment tools used in other team performance contexts adequately assess individual medical student team skills? (2) Can these skills be assessed in naturally occurring team learning environments? (3) Do the results of the teamwork skills assessments reflect actual team performance or outcome in scenarios using a human patient simulator?

Assessment measures to be evaluated include self-ratings of team skills, peer ratings of team skills, observer ratings of team skills, and analysis of communication content. We will first refine these assessment methods for application in the medical education environment. We will design and conduct a seminar covering team coordination principles for first-year medical students. Approximately 30 medical students will be video- and audio-taped over four small-group problem-based learning sessions. Raters will use a tool designed to count specific types of communications and behaviors to code and then rate each student's performance. The same students will also be assessed in two patient care scenarios using a human patient simulator. The scenarios will be designed to require the coordination of three students with defined roles. We will compare results of measures in the small-group and simulated team exercises to determine their degree of relationship. In addition, we will investigate the relationship between individual team skill assessment measures and objective measures of team performance in the simulated scenario to determine whether the skill assessment measures are predictive of actual care performance.

We suggest that early training in team coordination would allow for more time and experience in practicing these skills and may help influence more positive habits and attitudes toward working in a team environment. If such training is incorporated in medical schools, practical assessment methods will be required to determine the efficacy of the training. This project provides an initial assessment of several measures to determine both convergent and predictive validity for the assessment of team skills in medical students.

McMaster University

Principal Investigator: Kevin W. Eva, PhD
Grant Amount / Duration: $67,720.00 / 2 years
Project Title: Development and testing of an innovative admissions OSCE (The Multiple Mini-Interview) for assessing non-cognitive key competencies in medical school candidates

Learn More...

While the medical profession continues to value non-cognitive variables such as interpersonal skills and professionalism, it is not clear that current evaluation tools, particularly those used during admissions protocols, are capable of reliably assessing ability in these domains. Hypothesizing that many of the problems with tools like the personal interview might be explained, at least in part, by context specificity afflicting the accuracy of assessments of non-cognitive abilities, we have developed a multiple sample approach to the measurement of these competencies and propose further study of this innovation.

The Multiple Mini-Interview (MMI) consists of short OSCE-style stations in which examinees are presented with scenarios that require them to discuss a health related issue (e.g., the use of placebos) with an interviewer, interact with a standardized confederate while an examiner observes the interpersonal skills displayed, or engage in a problem-solving exercise with another examinee. The tool has proven reliable on three separate administrations in which both graduate students and applicants to the undergraduate medical program at McMaster University participated.


Project Objectives

  1. To determine the predictive validity of the MMI
  2. To examine the impact of rater training and background on ratings assigned during the MMI
  3. To assess the potential outcome of a security breach after implementation of the MMI

The class of 2005 at McMaster University includes 48 students who participated in the first pilot study of the MMI prior to entry into medical school. To satisfy the first objective, we propose to collect data pertaining to their performance within the medical program (prior to graduation these students will sit 8 multiple-choice examinations of medical knowledge, 4 clinical reasoning exercises, 3 OSCEs, and a series of tutorial/clinical skills evaluations) and on licensure (a month after graduation these students will write Part I of the Medical Council of Canada's licensing examination, the LMCC Part I). Regression analyses will be performed. In addition, we propose to mount a mock MMI for current medical residents to allow for a comparison between (a) scores assigned to medical school applicants and those assigned to more senior individuals nearing completion of their training, and (b) scores assigned within the MMI and those received on Part II of the LMCC; Part I is intended to assess primarily medical knowledge, while Part II consists of an OSCE that allows a greater opportunity to assess non-cognitive characteristics of medical trainees. Finally, we propose to utilize a pair of experimental designs to determine the impact of both interviewer training and test security breaches on the psychometric properties of the MMI and the resulting assessments.

Significance for Medical Education and Practice
This innovative assessment tool, if it proves valid, is expected to improve the ability of medical programs and licensing bodies to assess the non-cognitive characteristics of both new applicants and medical professionals. In addition, we anticipate using this line of research to highlight the importance of multiple sampling approaches to assessment for overcoming the limitations context specificity places on evaluation exercises in general.

Columbia University - College of Physicians and Surgeons

Principal Investigator: Mark J. Graham, PhD
Grant Amount / Duration: $70,000.00 / 2 years
Project Title: Systems-based Practice: Development of a Measure to Assess Competency

Learn More...

This study proposes to use rigorous methodology to achieve the following: 1) to develop a well-elaborated taxonomy of the specific knowledge, skills, attitudes, practices, behaviors, and measurable outcomes associated with the ACGME competency Systems-based Practice (Year 1); and 2) to develop and pilot test a global rating scale for assessing Systems-based Practice based on the taxonomy. This will address one of the major challenges in assessment facing residency training programs throughout the country. During the first year we will use a well-researched methodology, Nominal Group Process (Hall, 1983; Delbecq et al., 1975; Van de Ven et al., 1972), to arrive at a consensus opinion that characterizes all of the key elements and outcomes associated with Systems-based Practice. Specifically, we will obtain three key perspectives by running separate nominal groups with physicians, members of the healthcare team (e.g., nurses, translators, social workers), and key administrators. Results of nominal groups will be validated by running 2-4 separate groups until adequate consensus is reached. This will result in a comprehensive taxonomy of the sub-competencies comprising Systems-based Practice that can guide development of curriculum and valid assessments.

From these aggregated responses (blueprint), a global rating scale will be developed to measure Systems-based Practice competency across different medical domains. Scale items will be based on the taxonomy and developed by a team of experts. The instrument will be piloted in two major residency programs in the New York Presbyterian Hospital system. Reliability and inter-rater agreement will be assessed. The outcome of this study will be a multidimensional global rating scale for Systems-based Practice that can reliably measure the key aspects of this competency. The scale should be a cost-effective and feasible template, with items designed to span medical domains, for evaluating residents' core capabilities.
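
Inter-rater agreement of the kind to be assessed here is often quantified with a chance-corrected statistic such as Cohen's kappa. The sketch below uses made-up categorical ratings and illustrates only one candidate statistic, not the study's chosen method:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who assign
    categorical ratings to the same set of residents."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of exact agreements.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical global-rating-scale categories for 8 residents.
a = ["high", "high", "mid", "low", "mid", "high", "low", "mid"]
b = ["high", "mid",  "mid", "low", "mid", "high", "low", "high"]
kappa = cohens_kappa(a, b)
```

Kappa of 1.0 indicates perfect agreement and 0 indicates agreement no better than chance; for the ordinal scale proposed here, a weighted kappa or an intraclass correlation might be preferred in practice.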

University of Michigan

Principal Investigator: Larry Gruppen, PhD
Grant Amount / Duration: $69,765.00 / 2 years
Project Title: Assessing clinical teaching with standardized students: a feasibility and validity study

Learn More...

Because excellent clinical teaching is important to the development of knowledge and skills in learners, the evaluation of clinical teaching is a central assessment activity at most medical schools. It has also become a component of many promotion and tenure decisions. Although teaching evaluations are necessary for these activities, there is considerable dissatisfaction with the utility of traditional student ratings of teaching. While innovative evaluation methods have been developed, few have gained wide use and many fail to incorporate the learner's perspective on teaching performance.

In an effort to augment the evaluation tools available for assessing clinical teaching, we propose to extend the Standardized Student methodology from its typical educational application to an evaluation application. Derived from the widely utilized Standardized Patient methodology, Standardized Students (SSs) are medical students trained to portray teaching problems for faculty. In educational applications, these problems are used to stimulate faculty development in teaching skills. We will train 30 SSs not only to portray teaching problems, but also to critically evaluate teaching performance in order to transform them into a pool of trained evaluators of clinical teaching.

The psychometric characteristics of Standardized Students as evaluators of clinical teaching will be examined in three studies. The first will assess the inter-rater reliability of SSs as they review and evaluate the teaching performance of videotaped faculty-student interactions. The results of this study will enable us to refine the technology and clarify the sources of variance in SS evaluations.

The second study examines the validity of SS evaluations of clinical teaching in the context of a proven faculty development intervention. Using a pre-post intervention design, we will use a set of six SSs for each faculty member to measure teaching performance before and after the intervention. Validity of the SS technology will be demonstrated by its ability to measure changes in teaching performance resulting from this intervention.

The third study also examines validity, but this time in the context of routine clinical teaching. The 30 SSs and their classmates will take their third-year, required clinical rotations and provide teaching evaluations on a specified set of target faculty who have the greatest responsibilities for teaching in these clerkships. The SSs will evaluate teaching according to the dimensions and criteria for which they have been trained, while their classmates will use the traditional student ratings of global teaching skills. These two sources of data will be compared to identify common dimensions of assessment and novel dimensions or characteristics measured by the SSs.

Through these studies, we will obtain both valuable experience in the logistics of using SSs as evaluators and critical information on the psychometric properties of these evaluations. Standardized Students may hold promise for providing more specific and useful information on the quality of clinical teaching in medical schools.

2003–2004 Grantees - Invitational Grants

University of Missouri-Kansas City School of Medicine

Principal Investigator: Louise Arnold, PhD
Grant Amount / Duration: $70,264.00 / 1 year
Project Title: Towards Assessing Professional Behaviors of Medical Students Through Peer Observation: A Multi-institutional Study

Learn More...

The assessment of professional behavior in medical students is one of the most challenging tasks facing medical educators today. Among the array of assessment methods being investigated, peer evaluation appears to be one of the most promising. Unfortunately, the social climate surrounding peer evaluation may affect its acceptability in the eyes of students and thereby depress the reliability and validity of their assessments. In addition, the typical approach of conceptualizing professional and unprofessional behavior as an expression of stable characteristics of learners -- such as honesty or dishonesty -- may heighten the reluctance of medical students to report unprofessional behavior. Thus, for medical students to find peer assessment acceptable, the concept and indicators of professionalism on which the assessment rests must be grounded in the peers' ideas about professionalism, the value conflicts they experience, and the situations in which they live as medical students.

Project Objective(s)
The primary objective of this project is to determine the context in which peer assessment can occur across medical schools and year levels. Specifically, do students see similar kinds of professional actions in their peers, do they take similar actions in response to these observations, and would they agree to participate in peer assessments of the same kind across schools and year levels? If not, what are the characteristics of schools or systems in which peer assessment is possible?

Using the form adapted from our initial grant, we will survey all students at eight medical schools across a range of geographic and institutional characteristics. In addition to using items from our initial survey, we will add items from a survey of institutional professionalism climate to identify characteristics of schools that might promote or prevent effective peer assessment. Surveys will be administered to each class at each school by electronic or paper means. Data will be collated and analyzed at a central location. The data will be analyzed first by using descriptive statistics and then inferential statistics to detect potential differences between institutions and among students from each year level. Factor analysis will be used to determine how various aspects of peer assessment systems across schools might be related.

Contribution to Assessment
From our preliminary work, we understand that students are willing to engage in peer assessments of professionalism provided the appropriate institutional support, anonymity, faculty oversight, timely evaluation, appropriate counseling or commendation of peers, and protection for the student evaluator are present. An important next step will be to explore the extent to which the results based on students' responses in two schools generalize to students in other institutions and thereby to deepen our understanding of the interactions between school climate and the use of peer assessments.

Wake Forest University School of Medicine

Principal Investigator: George Nowacek, PhD
Grant Amount / Duration: $99,550.00 / 2 years
Project Title: Expanding a model and assessment of professionalism in medical students.

Learn More...

Professionalism in medicine continues to be threatened by changes in the organizational and financial structure of medical practice brought about by managed care, outcomes-based medicine, and pressure for unionization. In the past several years, three new definitions of medical professionalism have appeared that reflect this continuing concern: the ABIM Foundation, ACP-ASIM Foundation, and European Federation of Internal Medicine: Physician Charter; the NBME Center for Innovation: Behaviors of Professionalism; and the ACGME General Competency: Professionalism. While these definitions have not changed the boundaries of medical professionalism by incorporating new attributes or expectations, the considerable professional time devoted to their preparation reflects the depth of that concern.

In medical education, there also continues to be great interest in professionalism in medical students, particularly in assessment issues. However, a comparison of a 2002 repeat survey of medical schools with results from a 1997 survey documents that not much progress has been made in teaching and evaluating professionalism in medical schools. A comprehensive review of the state of professionalism assessment also underscored the need for continued efforts to expand and validate existing measures.

The Wake Forest University School of Medicine received a 2-year NBME Stemmler Medical Education Fund award in June 2001. The goals for this award were to develop a model and the measures to assess professionalism in medical students. The model provides a structure for behavioral assessment of professionalism. The definition of professionalism by Swick, detailing a taxonomy of behaviors of professionalism appropriate for medical students, was used as the basis for developing the behavioral measures for the model. The construct of professionalism development has been the foundation of the assessment model and was refined during the initial project. Professionalism development is conceptualized as a single dimension scaled from very low to very high, on which any student can be placed. The metric is calculated by combining performance values across the behavioral assessments and over the multiple attributes of professionalism.

Progress on our initial project included modifications of the model, which revised its four latent constructs: knowledge of professionalism, attitudes toward professionalism, observations of professionalism in the preclinical curriculum, and professionalism behaviors in the clinical setting. The report also detailed the development and pilot testing of 13 measures for the latent constructs using the methodologies of knowledge testing, attitude assessment, faculty observations, peer assessment, and standardized patient ratings. Two collaborating medical schools participated in the development and pilot testing of the measures.

The goals for the present study are, first, to complete the validation of the model with structural equation modeling. The analysis of longitudinal data from two student cohorts will be completed and the transportability of the measures will be established by participation of two collaborating medical schools. The second goal is to establish the validity of those few measures that provide reliable assessments of students' professionalism development that are grounded in student behaviors. A supplemental study will investigate the possibility of assessing an understanding and commitment to medical professionalism in the admissions interview.

2002–2003 Grantees - Call for Proposals

Jefferson Medical College of Thomas Jefferson University

Principal Investigator: Mohammadreza Hojat, PhD
Grant Amount/ Duration: $34,865.00 / 1 year
Project Title: An Operational Tool for Assessing Physician's Lifelong Learning

Learn More...

Lifelong learning is required in medicine to stay abreast of scientific advances and rapid developments in the medical sciences and biomedical technology. Despite the importance of physicians' lifelong learning, no psychometrically sound instrument has been developed to assess it. The purpose of this project is to develop an operational tool for assessing physicians' lifelong learning habits, activities, and professional outcomes. In particular, we plan to address several psychometric aspects of a lifelong learning scale, such as face and content validity, construct validity (underlying components of the lifelong learning scale), criterion-related validity (convergent and discriminant validity), internal consistency (Cronbach's coefficient alpha), stability of the scores over time (test-retest reliability), and the relationship between scores on the lifelong learning scale and associated outcomes.
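
As an illustration of the internal-consistency statistic named above, Cronbach's coefficient alpha can be computed directly from its definition; the item responses below are invented for illustration only, not data from the Jefferson scale:

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).

    items: one inner list per item, aligned by respondent.
    """
    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # each respondent's total score
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# hypothetical responses: 3 items x 5 respondents on a 1-5 scale
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 4, 2, 3, 4],
]
alpha = cronbach_alpha(items)  # about 0.86 for these invented data
```

High alpha here simply reflects that the invented items rise and fall together; the actual scale's alpha would come from the n > 400 physician administration described above.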

Based on an extensive review of the relevant literature and on the results of three pilot studies, we have developed a 19-item scale (the Jefferson Scale of Physician Lifelong Learning) intended to measure physicians' lifelong learning. In the present project, we plan to expand our previous studies by further investigating the psychometric properties and measurement characteristics of this scale administered to a large number of physicians (n > 400). Once the psychometrics of this scale are established, a percentile score distribution table will also be provided for comparative purposes. This research tool can be used to assess physicians' lifelong learning habits, activities, and outcomes, and to assess group differences among physicians (e.g., by demographic characteristics, practice specialty, or type of degree [MD compared with MD-PhD, etc.]) on underlying factor scores or total scores of the lifelong learning scale. The scale will allow us to assess the outcomes of different educational programs on physicians' lifelong learning (e.g., problem-based learning versus a conventional medical school curriculum). The scale can also serve to measure an important aspect of "professionalism" in medicine defined as lifelong learning.

At the completion of this project, we will have developed a multidimensional scale of lifelong learning supported by extensive psychometric evidence of validity and reliability. This is consistent with the stated goal of the Stemmler Medical Education Research Fund of the National Board of Medical Examiners: developing an assessment tool that will serve researchers in evaluating those preparing to practice or continuing to practice medicine, as well as in assessing medical school curricula and residency training programs designed to improve lifelong learning skills and habits among medical students and residents.

University of Toronto

Principal Investigator: Shiphra Ginsburg, MD, MEd, FRCPC
Grant Amount/ Duration: $69,805.74 / 2 years
Project Title: Translating Theory into Practice: Towards an Authentic Assessment of Professional Behavior and Reasoning

Learn More...


Teaching and evaluating professionalism has become a major focus in health professional education. Previous attempts at evaluation have failed to consider professionalism as a set of behaviors in context, and there is often insufficient exploration of the reasons why students enact certain behaviors over others. We have conducted and published a series of qualitative studies to address these issues and to build a theory explaining students' perceptions of, and reasoning strategies in response to, professional dilemmas. Our most recent work used standardized, videotaped scenarios of professional dilemmas to assess students' reasoning strategies in "real time," and found that students were frequently motivated to act by reference to Principles or Implications, some of which (e.g., implications for self) are disavowed in the formal curriculum. The study proposed here will answer several key questions relevant to translating this theoretical framework into an innovative method of assessment.

The objectives are: (1) to create authentic, text-based scenarios that describe professional dilemmas from the student point of view; (2) to determine whether students' reasoning strategies can be accurately revealed in an examination (vs. a research) context, and to assess any effects of scenario format (video vs. text); and (3) to determine what attending staff physicians (ASPs) perceive as professional/unprofessional (or pass/fail) responses from students in a written exam setting, specifically focusing on the factors they weigh in assigning their grades.

This study uses a combination of qualitative and quantitative methods to address each of the above objectives. Step one will create the text-based scenarios from the videotapes. In step two, 60 medical students will be recruited and randomized into two groups: one group will view the videos, and one will receive text. Each student will answer, in writing, a series of questions related to the scenarios (e.g., describe in detail what the student should do next, and why the student should do that). Responses will be analyzed qualitatively and quantitatively and will be compared with responses obtained in a previous study (a "non-exam" setting). This will determine whether a written exam (as compared to a research) setting can allow insight into key, authentic aspects of student reasoning, and whether text-based scenarios provoke different responses than videos. In step three, 20 ASPs will be asked to grade students' responses from step two, and each will be interviewed regarding the factors they weighed in assigning their scores. Results will be analyzed qualitatively and quantitatively to determine the relative importance placed on the actions proposed by students, the reasoning strategies they described, or other factors.

Implications for Assessment
At the completion of this study, we will have determined whether an innovative written exam setting using standardized scenarios can allow insight into key, authentic aspects of students' reasoning. If responses appear crafted rather than authentic, we may be missing the most dominant influences on students' reasoning and therefore will be unable to provide appropriate feedback. It is also anticipated that we will have enough information from ASPs' grading decisions to develop a scoring template for future use. These preliminary data will also serve as a basis for designing future studies to address issues of reliability, validity, feasibility, and the educational value of such an exam.

The University of Iowa

Principal Investigator: Geb Thomas, PhD
Grant Amount/ Duration: $69,540.00 / 2 years
Project Title: Evaluating the Breast Examination Simulator as a Tool for Clinical Breast Examination Skill Assessment

Learn More...

Clinical Breast Exams (CBEs) are an important tool for breast cancer screening. However, most health care professionals lack confidence in their clinical breast exam skills, and many report that their training in this technique is inadequate. This project will refine an existing prototype dynamic silicone breast examination simulator and test its effectiveness in assessing clinical breast examination skill with a group of clinical breast examination specialists in the Ontario Breast Examination Program.

The project will evaluate two hypotheses. The first is that performance with the Dynamic Breast Examination Simulator correlates with performance on clinical breast exams. The second is that retesting with the Dynamic Breast Examination Simulator accurately measures performance improvement over time. Five project objectives will be achieved to test these hypotheses: 1) Refine the existing dynamic breast model. 2) Test half of the expert and novice clinical breast examiners with the dynamic breast model. 3) Correlate clinical data regarding the skill level of the breast examiners with their performance on the dynamic breast model. 4) Validate the predictive results of the assessment by testing the second half of the expert and novice clinical breast examiner group. 5) Retest the novice examiners from the first year's protocol and confirm that the retest accurately measures performance improvement over time.
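
Hypothesis 1 is a straightforward correlation claim, which could be examined with the Pearson product-moment coefficient; the paired scores below are hypothetical placeholders, not study data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# invented paired scores per examiner: simulator score vs. clinical-exam rating
sim = [62, 70, 75, 81, 90]     # hypothetical simulator performance scores
cbe = [3.1, 3.4, 3.3, 4.0, 4.4]  # hypothetical clinical breast exam ratings
r = pearson_r(sim, cbe)  # close to 1 for these invented data
```

A high r with real examiner data would support using the simulator score as a proxy for clinical examination skill; the retest hypothesis would additionally require r between score change and known skill improvement.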

The project will advance assessment in medical education by providing a useful tool for training and assessing the dexterity and pattern recognition skills required for effective clinical breast exams. The device will also introduce a novel electromechanical design that may be adapted to other physical examination palpation skills.

University of Massachusetts Medical School

Principal Investigator: Michele P. Pugnaire, MD
Grant Amount/ Duration: $70,000.00 / 2 years
Project Title: Using Standardized Patients to Assess Professionalism: A Comparative Analysis of Two Approaches

Learn More...


2002–2003 Grantees - Invitational Grants

University of California San Francisco

Principal Investigator: Maxine A. Papadakis, MD
Grant Amount/ Duration: $100,000.00 / 2 years
Project Title: A Collaborative Study to Determine the Generalizability of Professionalism Deficiency during Medical School as a Predictor for Subsequent Disciplinary Action by a State Medical Board

Learn More...

UCSF received an NBME Stemmler Medical Education Research Fund award in June 2002. Our objectives in this pilot study were to determine whether there were variables in medical school performance that were predictive of subsequent disciplinary action by a state medical board. We also wished to determine what happened to medical students who displayed unprofessional behavior in medical school. Lastly, we hoped to validate our existing professionalism evaluation system by providing outcomes on those students identified as having deficiencies in professionalism under this system. To this end, we conducted a case-control study of all UCSF physician graduates disciplined by the Medical Board of California from 1990-2000 (n = 68). Control graduates (n = 196) were matched by medical school graduation year and specialty choice. We concluded that problematic behavior in medical school, but not the more traditional measures of medical school performance, is associated with subsequent disciplinary action by the state medical board. This finding adds validity to the assessment of professionalism in medical school as well as to UCSF's professionalism evaluation system. We now wish to determine the generalizability of our findings.

The hypothesis of this study is that physicians disciplined by a state medical board demonstrated unprofessional behavior while in medical school.

The objectives of this study are:

  1. To determine whether unprofessional behavior in medical school predicts disciplinary action by a state medical board in a national sample.
  2. To test model fit of the model derived from the UCSF pilot study in two other institutions.

We propose to perform a case-control study (n = 500), similar in design to the UCSF pilot study, in graduates of the University of Michigan Medical School and Jefferson Medical College. An exceptional group of collaborators, including Dr. David Stern and Dr. Susan Rattner and her colleagues who work with the Jefferson Longitudinal Tracking System, has committed to this research effort. Data will be abstracted at the participating institutions as well as at UCSF, where the data analyses will occur.

The results of this generalizability study will add validity to the assessment of professionalism and support the requirement that students demonstrate professionalism in order to graduate from medical school. Medical school promotions committees will have outcome data on which to base decisions related to student promotion and unprofessional behavior. Lastly, our results will provide data to medical school admissions committees about specific personal and professional characteristics that must be balanced against traditional markers of achievement in applicants.

2001–2002 Grantees - Call for Proposals

University of California, San Francisco

Principal Investigator: Maxine A. Papadakis
Grant Amount/ Duration: $69,889.00/ 1 year
Project Title: Case-Control Study of Professionalism Problems in Medical School as a Risk Factor for Physician Discipline

Learn More...

Our previous work has focused on the evaluation of professionalism deficiencies in medical students. We have developed criteria for the evaluation of professionalism and an approach to this domain. We now hypothesize that lack of professionalism in medical school is associated with related deficiencies in medical practice. To our knowledge, there are no studies that examine performance characteristics in medical students that predict discipline by medical boards, or how physicians disciplined by a state medical board performed while in medical school.

We have developed an exciting working relationship with the Medical Board of California that will permit us to test our hypothesis. Specifically, we propose to study medical student performance related to professionalism, as well as grades and test scores, and to compare those measures with disciplinary actions by the state medical board. If we find that problems with professionalism in medical school correlate with subsequent disciplinary action by the medical board, it would support the work of educators to concentrate remediation efforts on those students and to understand the consequences of unsuccessful remediation. Medical school promotions committees would also have outcome data on which to base decisions related to student promotion and unprofessional behavior.

The hypothesis of this study is that physicians disciplined by a state medical board demonstrated unprofessional behavior while in medical school.

The objectives are to determine:

  1. Whether unprofessional behavior in medical school predicts disciplinary action by the Medical Board of California
  2. The variables in medical school performance predictive of disciplinary action by the Medical Board of California

We propose to perform a blinded, case-control study of all UCSF School of Medicine graduates who have been disciplined by the Medical Board of California since 1990 (n = 70). Controls will be UCSF School of Medicine graduates matched to cases within one year of graduation and by specialty (n = 210). For objective #1, we will abstract all negative excerpts about students' professional attributes from course evaluations and the Dean's letter of application to residency. The negative excerpts (e.g., "resistant to constructive feedback," "needs reminders to fulfill ward responsibilities") will be assigned to one of five categories: 1) Good (no negative comments); 2) Trace (occasional minor negative comment); 3) Concern (problematic comments from a course); 4) Problem (problematic comments from two or more courses); and 5) Extreme ("society must be protected from this student"). For objective #2, we will examine the medical student variables of year and age at graduation, gender, undergraduate grade point average, MCAT scores, grades in required medical school courses, NBME Part 1 scores, and presence of academic probation.

The association between each predictor and the outcome variable (board disciplinary action) will be analyzed with the Wilcoxon rank-sum test. The predictors of disciplinary action will be determined by logistic regression analyses, with results described as odds ratios for each independent variable. Once this study is completed, we plan to validate the predictor variables derived at UCSF at additional California medical schools.
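
The odds ratios reported by such an analysis reduce, in the single-predictor case, to simple 2x2 table arithmetic; the sketch below uses invented counts, not study data, with a Woolf-style 95% confidence interval on the log scale:

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for a 2x2 case-control table, with a 95% CI on the log-odds scale."""
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Woolf standard error of log(OR)
    se = math.sqrt(1/exposed_cases + 1/unexposed_cases +
                   1/exposed_controls + 1/unexposed_controls)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)

# invented counts: professionalism concerns noted in school records ("exposure")
# among 70 disciplined graduates (cases) and 210 matched controls
or_, ci = odds_ratio(exposed_cases=25, unexposed_cases=45,
                     exposed_controls=30, unexposed_controls=180)
```

The full study's logistic regression generalizes this to multiple predictors at once, adjusting each odds ratio for the others.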

University of Montreal

Principal Investigator: Bernard Charlin, MD
Grant Amount/ Duration: $31,482.00/ 1 year
Project Title: The effect of variability of answers among criterion experts to detect expertise with a SC test

Learn More...

The Script Concordance Test (SCT) is a new tool for assessing clinical reasoning. It has a rich context in a case-based format: items are built from the questions physicians actually ask and the actions they actually take in clinical practice. The test probes organization of knowledge by asking examinees to interpret data presented in the context of authentic clinical tasks. Inferences are made from examinee scores about the degree of knowledge elaboration required to successfully address problems in the assessed domain. The tool uses an aggregate scoring method that reflects the response variability experts demonstrate when they reason in clinical situations. Scores on each item are derived from the answers given by a criterion group of experts. The meaning of the variation in criterion experts' answers as a way to detect clinical expertise is an important research issue: no research has been conducted to formally study what amount of variability optimizes the discriminative power of items and of the test as a whole.

The project will be conducted in the family medicine domain. It comprises four phases: (1) test construction and validation of items; (2) item selection and answer key generation; (3) production of a final test according to item variability; and (4) administration of the final test to two contrasting groups, clerkship students and experienced physicians.

We will study whether the variation in experts' answers used to establish the aggregate scoring key influences the ability of items (and of the test) to differentiate expertise. A coefficient of expert variability will be computed for each item. The discriminative power of an item will be operationalized as the item's effect size. A scattergram of effect size (y-axis) against variability coefficient (x-axis) will expose the relationship between the two variables. We expect a curvilinear (inverse-U) relationship between item indexes of discrimination and of expert answer variability, with maximum discrimination for items in the middle range of expert answer variability.
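
The two item statistics described here might be computed along the following lines; the choice of Cohen's d as the effect size and of 1 minus the modal proportion as the variability coefficient is an illustrative assumption, and the answer data and scores are invented:

```python
import math
from collections import Counter

def cohens_d(group_a, group_b):
    """Effect size (Cohen's d) between two groups of item scores."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

def variability_coefficient(expert_answers):
    """0 when all experts agree; approaches 1 as answers spread out."""
    modal = Counter(expert_answers).most_common(1)[0][1]
    return 1 - modal / len(expert_answers)

# invented data for one item
experts = [0, 0, 1, 0, -1, 0, 0, 1, 0, 0]        # Likert-style expert answers
expert_scores = [0.9, 1.0, 0.8, 1.0, 0.7, 0.9]   # item scores, experienced physicians
student_scores = [0.5, 0.6, 0.4, 0.7, 0.3, 0.5]  # item scores, clerkship students
v = variability_coefficient(experts)
d = cohens_d(expert_scores, student_scores)
```

Plotting d against v for each of the 90 items would produce the scattergram described, with the inverse-U hypothesis predicting the largest d values at intermediate v.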

The scores on the global test (the sum of the 90 items) for both groups will be compared with a simple t-test procedure. To further study the effect of item variability on the discriminative power of the test, a two-way analysis of variance will be used, with group as a between-subject factor and item variability (across the four variability categories) as a within-subject factor. A group effect is expected across all variability categories, and an interaction effect is expected: discrimination between groups will depend on the variability category, with medium variability leading to higher discrimination than the extreme categories.

Expected outcomes of the project are:

  • To provide evidence about the adequacy of the theories that underpin the test, notably the relationship between variability of answers in professional tasks and its importance for detecting expertise in assessment situations.
  • To provide guidelines for item selection in script concordance test construction.

University of Kentucky Research Foundation

Principal Investigator: Charles H. Griffith III, PhD
Grant Amount/ Duration: $68,844.00/ 1.5 years
Project Title: Understanding the Association of Teaching and Learning: A National Study of Moderating Variables

Learn More...


Project Objectives
The purpose of this project is to extend the breadth and depth of our previous work documenting an association between better clinical teaching and enhanced student learning. Our specific project objectives are: 1) to extend the generalizability of our findings that better clinical teaching is associated with enhanced student learning; 2) to extend the generalizability of our approach to identifying those high-quality teachers who are associated with better student outcomes; and 3) to understand what contextual variables (e.g., clerkship structure, nature of teacher-student interaction, setting) are most important for teaching to be associated with better student outcomes.

For the most part, the fundamental outcome of teaching has been left unstudied: does the quality of teaching actually influence student learning? In our recent NBME-supported project, we reported the first documentation that medical students in their internal medicine clinical clerkship who work with one of the "best" clinical teachers score significantly higher on post-clerkship examinations and on USMLE II, controlling for prior student academic achievement. The major limitation of our previous work is that it represents teaching and learning from a single site and our local clerkship structure. Can learner outcomes be linked to individual teachers across institutions? And is our approach to classifying the "best" teachers generalizable to other sites? If we are to suggest applications and implications of our findings, we need to document the exportability and generalizability of our method of identifying "best" teachers. Further, our studies have identified a measurable link between teaching and learning, but our methodology has not allowed us to identify the crucial contextual and environmental factors which must be in place for teaching to prove influential. For example, will teaching prove to be associated with better learning outcomes in other clerkship structures, with other attending-student interactions?

The first phase of our study will extend the generalizability of our approach to identifying "best" clinical teachers across our 19 collaborating institutions. Our approach involves convening a consensus panel of residents who had formerly been students at these institutions, with these residents classifying clinical faculty at their institution as "best," "medium," or "low" in their clinical teaching ability. Our outcomes of interest will be students' scores on post-clerkship NBME examinations and on USMLE II. A large data set will be assembled on approximately 2,600 students and 1,300 faculty from these institutions, noting student characteristics, characteristics of their clinical teachers, the teaching category of the attendings they worked with, and institution-specific contextual variables (structure of clerkships, settings, etc.). Regression approaches will allow us to document the association of attending physician teaching "category" with the outcomes of their students, controlling for prior academic performance (USMLE I score) and other variables. In addition, we will be able to identify what contextual characteristics (e.g., clerkship structure) are associated with enhanced student performance.
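
One simple way to illustrate "controlling for prior performance" is to residualize the outcome on the prior-achievement covariate and then compare residual means across teacher categories; the study itself would use full regression models, and the student records below are invented:

```python
def residualize(outcome, covariate):
    """Remove the linear effect of a covariate via simple least squares."""
    n = len(outcome)
    mx = sum(covariate) / n
    my = sum(outcome) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(covariate, outcome))
             / sum((x - mx) ** 2 for x in covariate))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, outcome)]

# invented students: post-clerkship exam score, USMLE I score, teacher category
post  = [72, 75, 80, 78, 68, 70, 74, 69]
step1 = [200, 210, 225, 220, 205, 215, 222, 208]
best_teacher = [True, True, True, True, False, False, False, False]

resid = residualize(post, step1)
best = [r for r, b in zip(resid, best_teacher) if b]
low = [r for r, b in zip(resid, best_teacher) if not b]
# gap in post-clerkship performance after adjusting for USMLE I
adjusted_gap = sum(best) / len(best) - sum(low) / len(low)
```

A positive adjusted gap in the real data set would mirror the reported finding that students of "best" teachers outperform peers even after accounting for prior achievement.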

Our study will provide support for using learner outcomes as a measure of teaching ability. For example, learner outcomes may be important additions to teaching portfolios and to promotion and tenure dossiers for clinician-educators. Our methodology could also be extended into the faculty development literature, with learner outcomes serving as a marker for the effectiveness of faculty development programs. Moreover, our findings will establish the generalizability of our previous work and document quantitatively the critical importance of the educational mission: in an era of increased cost accountability, student learning would be jeopardized if the educational mission were compromised by the current changes in academic medical centers.

University of Missouri-Kansas City School of Medicine

Principal Investigator: Louise E. Arnold, PhD, and David T. Stern, MD, PhD
Grant Amount/ Duration: $69,979.00/ 1 year
Project Title: Towards Assessing Professional Behaviors of Medical Students through Peer Observation

Learn More...


Medical educators have made significant progress in the reliable and valid assessment of medical students' knowledge and clinical skills. However, knowledge and clinical skills alone do not a physician make: professional behavior is the essence of physicianhood. Although there is agreement on the definition of professional values, the assessment of professional behaviors remains problematic. Peers, however, offer a unique perspective on fellow students' professional behaviors. They are routinely in positions to make observations that are often not accessible to faculty and resident supervisors. Preliminary studies of peer evaluation have shown its potential for reliable and valid assessment. Yet some peers are reluctant to participate, and their reticence may compromise the reliability, validity, and utility of their assessments. On the other hand, if we can understand what behaviors peers observe in each other, how they judge these observations, and what actions they take in light of their judgments, then we can devise a system in which peer observations contribute to more reliable and valid evaluation of professionalism.

This year-long research project will lay the foundation for the incorporation of peer observations into the larger context of the assessment of professionalism. At the completion of this research, we will have designed one or more empirically-based systems for peer observation of medical students' professional behaviors. Elements of the system(s) will include: 1) examples of medical students' professional and non-professional behaviors that peers commonly observe, 2) students' perceived responsibilities for reporting the behaviors they observe, and 3) students' perspectives on the acceptable and unacceptable conditions for reporting their observations, including the uses that peer observations would serve.

Contribution to Assessment
Although there are empirical data showing that peer evaluations have some degree of reliability and validity, medical students' perspectives on peer assessment have been ignored in the development of peer evaluation instruments and of the conditions under which the instruments are administered. Without student input into the process of peer evaluation, efforts to increase the reliability and validity of peer assessment of medical students' professional behavior will be compromised. Thus, on the way to producing psychometrically acceptable methods and protocols for evaluating professional behavior, we pause to learn about medical students' views on peer observation and peer evaluation of professionalism.

Clerkship students at two medical schools will be invited to participate in 12 in-depth focus groups. The discussions will elicit (as preliminary focus groups have shown they can) students' observations of peer professional behavior, their sense of responsibility for providing feedback, and the conditions under which they would feel such observations could be incorporated into acceptable peer feedback. Transcriptions of the focus groups will be subjected to qualitative analysis using the principles of grounded theory. The reliability of coding will be assessed, and the opinions of student participants will be sought to verify the investigators' interpretations of the focus group material. Based on the analysis of these initial focus groups, one or more systems for observing, reporting, and using peer observations will be constructed in detail. Perceptions and opinions of these systems will be sought via a survey administered to all clerkship students at the two schools. Responses to the survey will be used to generate a final set of recommendations characterizing the most acceptable system(s) for peer observation of professional behaviors among students. A key next step, ascertaining the reliability and external validity of the system(s), would await future funding.

Stanford University

Principal Investigator: Parvati Dev, PhD
Grant Amount/ Duration: $70,000.00/ 2 years
Project Title: Objective Assessment of Physical Examination Skills Using Simulators: Biocomputational Methods in Analysis of Electronic Performance Data

Learn More...

We have developed a method of instrumenting teaching-mannequins such that physical examination performance can be captured and measured during clinical simulations. In our initial evaluation of the E-Pelvis, a pelvic examination simulator, we found that meaningful measures of performance could be extracted from large volumes of electronic performance data generated during simulated clinical examinations. In addition, we have also found that capture and analysis of performance data collected from experienced clinicians may provide an understanding of the essential qualities comprising proper physical examination techniques. Evaluation of clinician and student data will facilitate development of specific, objective performance measures for evaluating health care practitioners in training.

The purpose of this study is to develop an automated method of analyzing large volumes of electronic performance data. By using pattern recognition and signal processing software, we will be able to produce efficient, reliable performance assessments. Because the data generated from the simulation tools we have created represent information that has never before been available during performance assessments, several steps must be taken to ensure our data analysis methods are reliable and reproducible. The use of simulators to evaluate technical skills has the potential to greatly improve on the current subjective methods of technical skills assessment and may provide a standardized means of conducting objective performance assessments.

2001–2002 Grantees - Invitational Grants

University of Illinois at Chicago College of Medicine

Principal Investigator: Alan Schwartz, PhD
Grant Amount/ Duration: $98,183.00/ 2 years
Project Title: Generalization of Assessment of Evidence-Based Medicine Skills

Learn More...

This project seeks to generalize and expand upon our previously NBME-funded project #16-9899, entitled "Assessment of Evidence-Based Medicine Skills". Now, as then, although a number of studies have shown that residents enjoy Evidence-Based Medicine (EBM) and believe their decision making skills improve as a result, few efforts to assess acquisition of EBM skills and their impact on decision making have been reported.

In our previous project, we successfully developed and validated 12 items designed to measure the ability of Pediatrics residents and clerks to appropriately incorporate new evidence into clinical decision making in hypothetical vignettes. In this project, our objectives are (1) to use our established framework to develop additional items spanning Pediatrics and Internal Medicine while disentangling evidence validity from the strength of the results presented, (2) to establish the construct validity of these items in Pediatrics and Internal Medicine residents, and (3) to determine the degree of domain specificity in responding to these items.

We hypothesize that our framework will be equally amenable to the construction of items in areas outside of Pediatrics, and specifically in Internal Medicine, and that respondents with EBM training will be sensitive both to differences in the methodological validity and to differences in the size of reported effects in clinical research evidence. We further hypothesize that the skills measured by these items are not domain-specific - that Pediatrics and Internal Medicine residents will perform equally well on items outside of and within their own specialty.

Sixteen new items will be constructed from a 2 x 2 x 2 x 2 (case domain x decision type x methodological validity x importance of results) factorial design. The case domain (Pediatrics or Internal Medicine) and decision type (diagnosis or therapy) factors define four basic cases. For each of these four basic cases, four variant items are created by manipulating the methodological validity of the evidence (strong or weak) and the importance of the results (large effect size or small effect size).
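
The arithmetic of the factorial design above can be sketched programmatically; this is only an illustration, and the factor names are paraphrased from the text, not the investigators' actual labels:

```python
from itertools import product

# Four two-level factors from the proposal's 2 x 2 x 2 x 2 design.
# Factor names are paraphrased for illustration.
factors = {
    "case_domain": ["Pediatrics", "Internal Medicine"],
    "decision_type": ["diagnosis", "therapy"],
    "methodological_validity": ["strong", "weak"],
    "importance_of_results": ["large effect", "small effect"],
}

# Cross all factor levels: 2 x 2 x 2 x 2 = 16 item variants.
items = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(items))  # 16

# The four "basic cases" are the case-domain x decision-type combinations;
# each basic case has four variants (validity x importance).
basic_cases = {(i["case_domain"], i["decision_type"]) for i in items}
print(len(basic_cases))  # 4
```

Crossing the two case-defining factors first and then the two evidence-manipulation factors reproduces the structure described: four basic cases, each with four variant items.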

Residents (PGY1-3) in Pediatrics (n=60) and Internal Medicine (n=99) will be stratified by year of residency and randomly assigned to one of four groups. Each group will receive a counterbalanced subset of four items; groups will be tested four times, at six month intervals, until each subject in each group has completed all 16 items over eighteen months of the project.

Primary data analysis is hypothesis-driven and focuses on the three specific project objectives. The key statistical procedure is mixed-model loglinear analysis predicting whether a subject shifts their decision or decision confidence in the direction suggested by the evidence or against the direction suggested by the evidence. Most of the project hypotheses involve tests of odds ratios. For example, construct validity is established in part by testing the hypothesis that the odds ratios associated with strength of methodological validity and size of reported effect in the evidence are each significantly greater than 1. Domain specificity is tested using a 2x2 (case domain x residency program) mixed model ANOVA on the overall number of correct decisions by each subject.
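
As a toy illustration of the kind of odds-ratio test described above, an odds ratio and an approximate 95% confidence interval can be computed from a 2 x 2 table of decision shifts. The counts below are invented for illustration and are not project data:

```python
import math

# Hypothetical 2x2 table: rows = methodological validity of the evidence
# (strong vs. weak), columns = subject shifted with vs. against the
# evidence. All counts are invented.
shift_strong, no_shift_strong = 40, 10  # strong-validity evidence
shift_weak, no_shift_weak = 25, 25      # weak-validity evidence

# Odds ratio: odds of shifting with the evidence under strong vs. weak validity.
odds_ratio = (shift_strong / no_shift_strong) / (shift_weak / no_shift_weak)

# Woolf's approximate 95% CI on the log-odds scale.
se = math.sqrt(1/shift_strong + 1/no_shift_strong + 1/shift_weak + 1/no_shift_weak)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)

print(odds_ratio)     # 4.0
print(ci_low > 1.0)   # True: the CI excludes 1, consistent with construct validity
```

An odds ratio whose confidence interval lies entirely above 1 is the pattern the investigators hypothesize for both the validity and effect-size manipulations; the project's actual analysis uses mixed-model loglinear methods rather than this simple two-group comparison.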

The proposed research promises to make key contributions to evaluation of the increasingly popular introduction of evidence-based medicine into graduate (as well as undergraduate) medical education by demonstrating the generalizability of a useful and unique assessment method for measuring the impact of new evidence on an examinee's clinical decision making. Such tools are critical both for assessment of individual physicians within existing EBM curricula and for the overall evaluation of new EBM curricula and training programs.

2000–2001 Grantees

George Washington University School of Medicine

Principal Investigator: Benjamin Blatt, MD
Grant Amount/ Duration: $62,907 for 15 months
Project Title: Peer Assessment on the Web: A New Way to Grade Essay Examinations

Learn More...

New curricula featuring interdisciplinary courses on doctor, patient and society are evolving in medical schools across the country. These courses, with names such as Medicine in Society and Foundations of Medicine, aim to prepare students for their future with a deep, reflective understanding of the theory and principles underlying the practice of medicine. An ideal way to assess medical students' understanding of the complex ethical, social, and communication issues in these courses is the essay examination. Essay examinations, however, demand a great deal of time to grade when faculty time, due to unprecedented pressure on academic medical centers, is at more of a premium than ever. If a better method of administering and grading the essay examination could be devised, it could be of great benefit.

In quest of a better essay examination, [these investigators] have created an approach in which peers assess each other online within a web-based peer-review environment. Peer assessment and web-based informatics can each contribute significant efficiencies to essay examination assessment. The study described in this proposal is designed to evaluate this novel approach.

In addition to efficiency, peer assessment using the web offers two other theoretic advantages over traditional faculty hand grading. First, evaluating the work of others after answering the questions themselves should reinforce learners' mastery of the material. Second, the web-based interactive evaluation process should reinforce learners' computer competency. To evaluate web-based peer assessment of essay examinations the proposed study will test the following hypotheses and study questions:

  1. Peers and faculty will demonstrate close agreement when rating student essays on a multiple essay examination.
  2. Students who serve as peer assessors will demonstrate better mastery of material on a repeat examination than those who do not serve as peer assessors.
  3. The peer assessment method will prove more cost-effective than the traditional method of essay evaluation.
  4. The web-based peer assessment method will be rated as effective and fair by participating students.

Because of the upsurge of practice-of-medicine courses around the country, web-based peer assessment of essay examinations has potential for widespread use. Demonstration of its effectiveness in this proposed study will support far-reaching implementation of this novel method of evaluation.

University of Colorado School of Medicine

Principal Investigator: John D. Carroll, MD & John C. Messenger, MD
Grant Amount/ Duration: $70,000 for 2 years
Project Title: Clinical Skills Assessment Using Medical Simulation of Invasive Hemodynamic Procedures

Learn More...

Skill development and assessment in the performance of medical procedures continues to be an area of slow progress in medical education. Unfortunately, the model of "see one, do one, teach one" still persists in many training environments. To date, it has been difficult to develop effective tools to objectively assess procedural skills and data interpretation during the performance of invasive procedures. With the advent of medical simulation, we now have a tool for procedural skills training and objective assessment.

One of the most common invasive procedures involves the use of the pulmonary artery catheter (PAC). Currently, more than 1 million PACs are used annually in the U.S. in a variety of clinical and research settings to perform right heart catheterization (RHC). The skills of PAC insertion and data interpretation are developed under the observation of senior level physicians through repetition in a clinical setting, with patients exposed to procedural complications in training settings. Using a new "force-feel" simulator, we are now able to simulate RHC procedures in a realistic environment, complete with real-time hemodynamic monitoring, "virtual" fluoroscopy and realistic catheter manipulation. For the first time, this allows medical simulation to be used in training and evaluating the procedural skills and knowledge base necessary for the performance and accurate interpretation of invasive cardiac procedures.

The purpose of this pilot study is to develop a testing module for an existing cardiovascular simulator for the performance of RHC to objectively assess procedural skills, knowledge and clinical decision making of medical trainees. The project objectives are (1) to develop simulations of five commonly encountered patient scenarios for RHC; (2) to calibrate the simulations using the input of cardiovascular specialists; (3) to test a scoring methodology for the objective assessment of both procedural and cognitive skills in performance, interpretation and diagnosis utilizing RHC; (4) to assess the ability of the simulations to stratify physicians across a range of training levels and experience; (5) to measure operator satisfaction and face validity of the simulations across a spectrum of physicians.

This project will allow for the development of a novel reproducible and standardized testing method for evaluation of procedural skills and clinical knowledge that does not exist in current clinical training. It combines the ability to test procedural skills without the need for patient interaction while assessing clinical decision making in "real-time." This evaluation and training tool has broad applications to medical students through practicing physicians, and may provide a benchmark for certification in procedural skills.

The University of Texas Medical Branch, Galveston

Principal Investigator: Steven A. Lieberman, MD
Grant Amount/ Duration: $69,954 for 2 years
Project Title: High Fidelity Patient Video Vignettes in Computer-Based Testing: Psychometric Properties Relative to Text-Based Vignettes

Learn More...

The advent of computer-based testing in medical licensure examinations allows the use of new techniques to evaluate skills not assessable with paper-and-pencil exams. An example that holds considerable promise is the use of video clips of patients to evaluate examinees' skills in interpreting abnormal physical exam findings. Video clips present a higher-fidelity task relative to text descriptions: verbal descriptions bypass the essential skills of observing and interpreting visible findings. While such skills are essential in virtually all medical fields, neurology is well-suited for initial investigation due to the wide variety of abnormalities of movement that occur in patients with neurologic disorders. Beyond simply interpreting the physical findings, patient video can be used to assess examinees' ability to apply neuroanatomic, pathophysiologic, and diagnostic principles in clinical scenarios that more closely mimic actual patient encounters.

This two-year project encompasses the development and comparison of the psychometric properties of computer-based items incorporating either video clips or text descriptions of patients with abnormal neurologic findings. The specific research questions to be addressed are: (1) Do test items presented with patient videos (high fidelity items) provide a more valid assessment of fourth year medical students' competence in evaluating patients with neurologic disorders than do traditional text-only items covering identical content? and (2) What is the reliability of test items presented with patient videos relative to text-only items covering identical content?

In Year One, video clips of appropriate patients will be acquired with a digital camcorder. Following editing on the computer, these clips will be reviewed by clinical neurologists who will provide information for developing analogous text-based vignettes and multiple choice items as well as judgments about the clips' clarity and realism. Their responses will be synthesized into consensus versions of text descriptions and item response options.

Two test forms will be generated, each containing 40 items covering 20 physical findings. Two types of tasks will be presented for each finding: identification of the neuroanatomic site of dysfunction producing the abnormality, and the provision of pathophysiologic or etiologic information about the underlying disorder. Half of the findings on each test form will be presented as video vignettes and half as text vignettes, with the other test form containing the alternate (text/video) version of each finding. A brief clinical introduction (e.g., the patient's age, gender, and chief complaint) will precede either the video or the text vignette.
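
The counterbalancing described above, in which each finding appears as a video vignette on one form and as a text vignette on the other, could be sketched as follows. Finding names and task labels here are placeholders, not the study's actual content:

```python
# Build two counterbalanced test forms from 20 physical findings.
# Each finding appears as a video vignette on one form and as a text
# vignette on the other. Finding names are placeholders.
findings = [f"finding_{i:02d}" for i in range(1, 21)]

form_a, form_b = [], []
for idx, finding in enumerate(findings):
    # Alternate which form carries the video version of each finding.
    if idx % 2 == 0:
        form_a.append((finding, "video"))
        form_b.append((finding, "text"))
    else:
        form_a.append((finding, "text"))
        form_b.append((finding, "video"))

# Two tasks per finding (neuroanatomic localization and pathophysiologic
# explanation) give 20 x 2 = 40 items per form.
tasks = ["identify_neuroanatomic_site", "explain_pathophysiology"]
items_a = [(f, fmt, task) for (f, fmt) in form_a for task in tasks]

print(len(items_a))                                   # 40
print(sum(1 for _, fmt in form_a if fmt == "video"))  # 10
```

Because every finding appears in both formats across the two forms, format effects can be separated from finding difficulty when the forms are compared.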

In Year Two, the validity and reliability of video-based and text-based items will be assessed in fourth year medical students (n=150). Evidence for validity will be gathered in the following ways: (1) experts' judgment of the realism of the video clips, (2) analysis and comparison of the thought processes elicited by video and text items among individuals at different levels of training and expertise, (3) examination of the relationship of level of training with exam performance for video and text items, (4) item analyses of text and video items, and (5) comparison of performance on video and text items with other measures of knowledge and clinical performance. To accomplish the second and third approaches, the examination will be administered to two additional groups: first year medical students (n ~ 200) and internal medicine residents (n ~ 35). Generalizability theory (G theory) analyses will be conducted on the scores from fourth year medical students to determine the reliability and efficiency of video-based and text-based items.

This study is intended to gather initial evidence of the validity and reliability of using video vignettes to assess examinees' skills in interpreting and reasoning about patients with common physical findings. Documentation of validity and reliability in a highly relevant field (neurology) could subsequently be expanded to other portions of the physical exam and to other aspects of patient-physician interactions.

University of Washington School of Medicine

Principal Investigator: Barbara A. Goff, MD
Grant Amount/ Duration: $68,680 for 2 years
Project Title: Development of Objective Structured Assessments of Surgical Skills for Obstetrics and Gynecology Residents

Learn More...

Assessment of technical skills is very important for physicians, but historically, assessment of technical or surgical skills has relied on subjective faculty assessment, a technique which has poor reliability and validity. In a previous pilot project, [these investigators] were able to develop surgical skills evaluation instruments, which when tested in an unblinded fashion, were both reliable and valid.

The current proposal will establish the feasibility, validity, and reliability of the surgical skills evaluation instrument with a larger number of residents and in a blinded fashion. In addition, [the investigators] plan to develop and evaluate an intraoperative skills assessment tool which can be used to provide immediate resident feedback.

To complete the project successfully, [they] will use [their] skills evaluation tool to test OB-GYN residents at Madigan Army Medical Center, Oregon Health Sciences University, Harvard Medical School, and Pennsylvania State College of Medicine. Examiners from the University of Washington will be blinded as to resident level and prior performance. Examiners from the host institution will participate as well. Reliability and validity of the instrument will be established, and comparisons will be made between blinded and unblinded evaluators to establish objectivity. The intraoperative skills assessment tool will also be subjected to analysis of reliability and validity. In addition, faculty and residents will be surveyed about the usefulness of this instrument in providing prompt and constructive feedback.

Wake Forest University School of Medicine

Principal Investigator: George A. Nowacek, PhD
Grant Amount/ Duration: $69,840 for 2 years
Project Title: Develop A Model and Measures of Professionalism in Medical Students

Learn More...


1999–2000 Grantees - Call for Proposals

University of Cincinnati College of Medicine

Principal Investigator: Michael A. Sostok, MD
Grant Amount/ Duration: $59,600 for a 2-year project
Project Title: Student Self-Assessment of Clinical Competency Using Encounter- Based Matrix Evaluation

Learn More...

Modern day medical education emphasizes learner-centered curricula. If medical students are to gain the maximum benefits from such educational strategies, students need to be able to accurately assess their own strengths and weaknesses. Further, accurate self-assessment is a crucial skill in a field requiring continuous professional development. However, a review of the literature reveals that learning the skill of valid and accurate medical student self-assessment is as elusive as it is vital. Previous studies indicate that students' self-assessments of their knowledge and skills correlate poorly with traditional external measures.

This project aims to improve students' skills in self-assessment and to develop a more accurate self-evaluation method. The investigators plan to conduct a randomized study to demonstrate that an innovative self-assessment instrument, the encounter-based matrix evaluation, will provide more valid and reliable student self-assessment of competency when compared to traditional self-assessment methods.

The researchers believe that the encounter-based matrix self-evaluation will give students a more accurate and realistic view of their clinical achievements. The goals and objectives of the proposal are: (1) to develop and validate the encounter-based matrix self-evaluation tool, including (1a) to develop and employ the encounter-based matrix evaluation for their Internal Medicine Clerkship, and (1b) to validate the matrix evaluation results; (2) to improve self-assessment skills of students, including (2a) to provide students with rationale for accurate self-assessment, and (2b) to develop the students' skills of self-assessment; (3) to apply the encounter-based matrix self-assessment method to other medical specialties, including (3a) to develop an encounter-based matrix evaluation for the Pediatric Clerkship, and (3b) to implement an encounter-based matrix self-evaluation instrument for the Pediatric Clerkship. In short, the investigators will develop, implement, and assess the encounter-based matrix evaluation, a medical student self-evaluation tool that they believe will assist students in their pursuit of careers enhanced by lifelong learning.

University of Rochester School of Medicine and Dentistry

Principal Investigator: Robert G. Holloway, MD
Grant Amount/ Duration: $60,000 for a 2-year project
Project Title: Early Introduction and Evaluation of Medical Student Skills to Practice Evidence-Based Medicine

Learn More...

Evidence-based medicine (EBM) is the integration of individual clinical expertise with the best available evidence from systematic research. EBM includes a relatively new set of skills to help clinicians retrieve, appraise, and apply current best evidence, and helps ensure that patients receive the correct type of care and information precisely when they need and want it. There has been burgeoning literature on the practice of EBM, but relatively little study on how to evaluate it. A unique characteristic of teaching and evaluating EBM is its multidisciplinary nature. EBM is a lifelong, self-directed learning behavior which demands learning by inquiry and the use of information technology. It calls on the expertise of the medical librarian, clinical and health service researchers, as well as health educators. The investigators propose to take advantage of a new curriculum at the University of Rochester School of Medicine and Dentistry in which the knowledge and skills of EBM are introduced to medical students early and reinforced often. With the use of a multidisciplinary committee, they will develop and refine a method to evaluate all five core steps in the EBM cycle: (1) converting information from patients into answerable and searchable questions, (2) searching for the best available evidence, (3) critically appraising that evidence for its validity and importance, (4) applying the evidence in clinical practice, and (5) self-evaluation.

The investigators propose to create EBM evaluation modules starting with an in-class exercise and ending with the completion of an out-of-class exercise. These evaluation modules will be launched with an enhanced clinical vignette. When the vignette is complete, a question-building exercise will be distributed to the students, to be completed and handed in before the end of class (Step 1). Students will then be given an evaluation packet and instructed to design and perform a Medline search as well as critically appraise a research article; the search and appraisal results will be handed in three days later (Steps 2, 3, and 4). Three EBM modules are planned during the first two years of the student curriculum. Standardized methods will be performed to improve the reliability of the three grading instruments and concurrent validity will be assessed by correlating scores obtained on the third EBM module with other measures of student performance. Students will also be asked to self-evaluate their practice of EBM on a continuous basis throughout their first two years as part of each course's evaluation procedures (Step 5). The research team will also test a subset of fourth-year medical students who have not undergone early and sustained EBM instruction and compare their scores to those obtained by students who have.

By the end of two years, this project team will have moved the EBM evaluation methodology forward by creating a valid and reliable method to assess a medical student's ability to practice EBM. They will also have created a testing and grading manual to assist others in the evaluation process.

SUNY at Buffalo School of Medicine and Biomedical Sciences

Principal Investigator: Andrea T. Manyon, MD
Grant Amount/ Duration: $59,980 for a 2-year project
Project Title: The I-OSCE: Integrating Assessment of Medical Students' Scientific Knowledge and Clinical Skill

Learn More...

Curricular revision integrating basic and clinical sciences has been the focus of medical school reform. This project will meet the challenge of assessing students' mastery of the reformed curriculum by developing an assessment protocol to measure how well the student has integrated clinical skill and scientific knowledge. There is little evidence of such assessment in the U.S.

The project will first develop a definition of "integration," its attributes and components, then develop and validate appropriate assessment measures. The primary vehicle will be the "I-OSCE," the Integrated Objective Structured Clinical Exam, combining standardized patient encounters with other techniques. The measures' adaptability will be tested in alternate educational environments at the State University of New York at Buffalo and the Medical College at Ohio State University.

The project asks the following research questions: What is integration and how can it be measured? Is the I-OSCE a valid assessment of students' integration of knowledge and skill? Can it become a critical component of assessment of undergraduate students at SUNY at Buffalo? Is it transferable to other medical schools? Is it a reliable tool for evaluating an integrated curriculum?

The project's outcome objectives are to develop, implement, and validate measures that assess a student's integration of two or more basic sciences with developing clinical skills. That is, (1) to create an easily scored instrument to assess students' integration of knowledge and skill, (2) to assess I-OSCE for validity and reliability, and (3) to assess student and faculty satisfaction with the integrated exams. Other outcomes to explore include: correlation between student performance on the integrated exam with their scores in basic science exams, an assessment of the process as it develops, and correlation between students' scores on the I-OSCE and their preceptors' scores on the MedIQ. These objectives clearly respond to the NBME mission of advancing the practice of assessment of undergraduate medical education, particularly in this era of curriculum reform.

This study will be composed of four parts and continue over two years. The components — many of which are concurrent — include: (1) Defining Integration, (2) Developing and Implementing the I-OSCE, (3) Gathering and Analyzing Data, and (4) Evaluation. An Advisory Design Group comprising faculty and students, including the director of the Problem-Based Learning Center at Ohio State, will develop the instrument and test it with first-year students at SUNY Buffalo. Researchers will follow a comprehensive validation strategy: reviewing the instrument for face and content validity; assessing it post-test for concurrent and construct validity; and, in the second year, assessing it for predictive validity. To assess the instrument's transferability, research staff will present the I-OSCE at Ohio State to identify variables between medical school environments.

The University of Texas Health Science Center at San Antonio

Principal Investigator: John H. Littlefield, PhD
Grant Amount/ Duration: $60,000 for a 2-year project
Project Title: Improving the Quality of Resident Performance Appraisals

Learn More...

Ratings by faculty of a resident's on-the-job performance are used universally for evaluation. Typically these ratings are not very useful for providing constructive feedback to the resident or for substantiating administrative actions regarding a marginally performing resident. A major problem with performance ratings is that faculty, even when they recognize a marginal performer, are hesitant to write candid evaluations. Organizational change at the department level can best overcome faculty hesitance to write candid resident performance appraisals. This project will conduct a field-trial test of guidelines for diagnosing problems and improving a resident performance evaluation system using an organizational change framework.

Organizational change to improve a resident performance evaluation system can be achieved through a four-stage process: (1) diagnose the problem, (2) plan an intervention, (3) intervene, and (4) evaluate the impact. If the diagnosis of a significantly flawed evaluation system fails to arouse faculty concern (eg, the organizational climate is too clouded by other problems), it is unlikely that the evaluation system can be improved. Plans to change a resident evaluation system require an in-depth understanding of four components that comprise a performance evaluation system and how they function: (1) organizational context is the departmental environment surrounding a performance evaluation system, (2) performance judgment is what a rater privately thinks about a given resident's performance, (3) performance appraisal is the written statement about a resident's performance, (4) decision making entails all of the issues that affect a rater's willingness to provide feedback or the department's willingness to take administrative action in response to performance ratings. These four components function as interdependent parts of a unified whole, analogous to biologic systems (eg, endocrine glands). Like biologic systems, the four components influence each other. In a previous small-scale study, interventions in the organizational context produced measurable improvements in the performance appraisal and decision-making components.

This project expands the small-scale study by conducting a field-trial test of guidelines to improve resident performance evaluation systems. Four residency programs (two internal medicine and two surgery) at two academic medical centers will participate in the study. More than 300 residents will be involved. Each residency program will use the guidelines to diagnose its own evaluation system, plan an intervention, intervene, and evaluate the impact. Diagnosis will be based on resident performance evaluation data for two years prior to the project and will involve analysis of four variables: (1) number of completed rating forms per resident, (2) percent of forms with behavioral-specific comments, (3) generalizability of the rating scores, and (4) number of administrative actions taken by the program. Plans to intervene will be based on each department's organizational climate and will entail identification of leverage points for promoting change (eg, chronic failure to take action) as well as barriers to change (eg, how to convince skeptical faculty). Interventions will include actions such as revising and pilot-testing the rating form and training raters and providing feedback to them. Evaluating the impact of the intervention will be based on repeating the data analyses from the diagnosis stage and comparing the results to the baseline data. This project is innovative because it combines a sophisticated understanding of how performance evaluation systems function with an emphasis on how to change an organization.

If the guidelines for diagnosing problems and improving a resident performance evaluation system are validated in a field-trial test, residency directors throughout the U.S. will be better able to manage a vexing problem related to resident education and evaluation.

University of Virginia School of Medicine

Principal Investigator: William A. Knaus, MD
Grant Amount/ Duration: $60,000 for a 1-year project
Project Title: Resident Assessment Performance System (RAPS)

Learn More...


(I) The first aim is to develop a user-specified and user-friendly design structure (content and technical specifications) for a new web-based tool, the Resident Assessment Performance System (RAPS). Using readily available and accurate electronic data, RAPS would directly record and evaluate a resident's patient care experience and outcome performance. It would encourage the transition within graduate medical education from experience-based to competency-based training. RAPS would also help clinicians develop independent quality reporting and accountability skills.

(II) The investigators will also test the hypothesis that a resident's clinical performance and the content of his or her residency experience will improve when the resident is provided with explicit and timely feedback about his or her own patient care practices and comparative peer information.

Because of the greater variety of efficacious medical treatments available, more emphasis is being placed on objective evidence-based content for medical decisions and more value is accorded to documenting the patient outcomes achieved from treatment. These changes are increasing the need for appropriate accountability measures that will assure the profession and the public that the medical education system is both monitoring the content of training and documenting how well it prepares graduates for future practice.

This project aims to address these needs through the design of a web-based tool that would provide residents, residency program directors, and eventually accrediting organizations with a comprehensive record of a resident's patient experience, the relative efficiency of his or her practice style, and the outcomes of his or her patients.

(I) The project will use routinely collected electronic data contained in the University of Virginia Health System's Clinical Data Repository (CDR). The CDR is an electronic relational database that contains detailed clinical (diagnoses, laboratory tests, etc.) and financial (billing, pharmacy charges, etc.) data from all University of Virginia treatment locations. Using the CDR as a foundation, the investigators will: (1) recruit a cohort of 12 Internal Medicine residents from the University of Virginia and introduce them to the objectives and intent of this project, (2) solicit from these residents specific content and format recommendations (a "needs analysis") for RAPS, (3) provide the residents with palmtop devices with which to maintain a prospective record of all of their inpatient and outpatient encounters over a 5-month period, (4) develop specific content recommendations for RAPS from this patient experience, using the infrastructure and technical capabilities of the CDR, (5) develop sample report formats using the actual patient experiences of the 12 residents and the data available from the CDR, (6) present draft formats for RAPS to the residents and to a representative group of residency program directors and accrediting bodies for feedback on the usefulness of the data and the usability of the format, and (7) make revisions to the content and presentation of the tool.

(II) The project's research design includes two sources of data collection. The participating residents will maintain a list of names and medical record numbers of all patients for whom they have direct responsibility (both inpatient hospitalizations and outpatient clinic visits) and collect and record specific patient encounter information using hand-held computing devices. These data will be linked to routinely collected electronic clinical and administrative data contained in the University of Virginia Health System's Clinical Data Repository (CDR), a data warehouse that contains data elements from a number of disparate computer systems. Although the hospital billing and internal costing systems used on this project collect many useful data elements (demographics, diagnoses, procedures performed, laboratory and pharmacy information, resource utilization), other information needed to assess and improve clinical practice is not captured electronically (eg, time spent with the patient, decision factors). The proposed RAPS tool provides the interface between the CDR and the data collected by the residents, both for collection and merging and for reporting. An important component of the research methodology to be employed is that the RAPS tool design will be guided, to a large degree, by the participating residents, based on their input collected from structured interviews. The RAPS reporting features will include resident summary and detailed information related to patients seen, clinical procedures, resource utilization, and patient outcomes. Residents will be able to view their own information and compare their performance and practice to both their resident peers and internal and external benchmarks. For example, a resident will be able to determine whether his or her laboratory test ordering behavior is consistent with that of comparative groups.
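
The linkage step amounts to a merge of resident-entered encounter logs with repository records on a shared key such as the medical record number. A minimal sketch, in which every field name is an invented stand-in for the actual CDR schema:

```python
# Hypothetical linkage of hand-held encounter logs to repository data by
# medical record number (MRN). Field names are illustrative only.

encounter_log = [  # collected by the resident on the hand-held device
    {"mrn": "100234", "resident": "R1", "minutes_with_patient": 25},
    {"mrn": "100877", "resident": "R1", "minutes_with_patient": 10},
]

cdr = {  # routinely collected clinical/administrative data, keyed by MRN
    "100234": {"diagnosis": "pneumonia", "lab_charges": 412.50},
    "100877": {"diagnosis": "diabetes", "lab_charges": 88.00},
}

# Each merged record combines the resident-entered fields (not captured
# electronically) with the repository fields for the same patient.
raps_records = [{**enc, **cdr[enc["mrn"]]} for enc in encounter_log]
print(raps_records[0]["diagnosis"])  # pneumonia
```

The merged records are what a reporting layer would then summarize per resident, per diagnosis, or against peer benchmarks.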

There are many intriguing questions that might be investigated once the specific data elements are decided and the tool designed. Assessing and reducing variation in the practice patterns of residents is a primary outcome objective. The investigators will use sound statistical methods to compare many aspects of practice patterns, costs, consumption of resources, and patient outcomes within the pilot group of 12 Internal Medicine residents to control groups and test the hypothesis that reduced practice variation will improve quality of care and cost outcome measures. Another objective of RAPS is that directors of the residency training programs will be able to determine whether residents are seeing an appropriate mix of patients. The project team will assess whether each resident's patient profile is representative of conditions and illnesses encountered in a general medicine practice and whether the resident has had sufficient exposure to important illnesses (eg, AIDS, diabetes).

The final products of this one-year effort will be: (I) A user-specified and user-friendly design structure (content and technical specifications) for a web-based tool that would provide residents with direct and continuous feedback regarding the content, practice efficiency, and patient outcomes experienced during their residency training. This product could then be used for the development of a working prototype for RAPS for use in actual residency evaluations. As such, RAPS would provide the foundation for a competency-based evaluation of a clinician's postgraduate medical education that would both improve the precision of individual ratings and, over time, the structure and design of training programs. (II) The specific outcome objective of the research question is to reduce variation in practice patterns in a pilot group of 12 Internal Medicine residents and to ensure that each resident sees a diverse population that includes an appropriate mix of important patient conditions.

1999–2000 Grantees - Invitational Grants

SUNY Buffalo School of Medicine and Biomedical Sciences

Principal Investigator: Paul A. James, MD
Grant Amount/ Duration: $99,969 for a 2-year project
Project Title: Developing Standards to Evaluate Ambulatory Medical Instruction: The MedEd IQ®

Learn More...

This project team's previous Fund-sponsored research developed a theoretical model for understanding instruction in ambulatory settings and a valid instrument to measure these processes. The MedEd IQ is unique in its development and in its framework for providing timely feedback within a quality improvement model. The instrument offers the opportunity to standardize measurement of instructional quality, which has previously been limited by variability in sites, courses, and teachers.

To demonstrate its generalizability and effectiveness as a measurement and quality improvement tool, the instrument will be studied in multiple medical schools and training settings across the United States. The investigators will examine the generalizability of the MedEd IQ and its properties as an instrument for evaluating instructional quality in ambulatory settings. They will determine how MedEd IQ scores compare across medical schools in different regions of the country, different courses in the same school, and different practice environments. Also, they will examine differences in the four MedEd IQ subscales (Preceptor, Site, Involvement, and Learning Opportunities) across these variables. A second objective is to develop and assess a reporting mechanism to provide feedback to course directors and preceptors about instructional performance in a quality improvement model. The research team will also further study the validity of the MedEd IQ instrument.

A purposeful sample of nine medical schools nationally will participate in this study. This project addresses a national need for a valid and reliable tool to measure instructional quality in community-based practices while providing benchmarks to develop standards and improve instruction. The MedEd IQ instrument differs from other measurement approaches: it is based on the relevant complexities of office practice and attempts to measure the broad range of instructional processes rather than programmatic evaluation or isolated teaching effectiveness alone. Through comparisons of MedEd IQ scores across schools and practice sites, this project will enable the establishment of benchmarks for instruction, faculty development, and possibly site accreditation.

UCLA School of Medicine

Co-Principal Investigators: Ronald H. Stevens, PhD, and Adrian Casillas, MD
Grant Amount/ Duration: $99,605 for a 2-year project
Project Title: Parallel Neural Network Procedures for Identification and Assessment of Patient Management Strategies and Styles

Learn More...

The investigators continue to believe that the examinee's sequence of actions on NBME computer-based case simulations (CCS) contains useful strategic and behavioral information that, while not critical for pass/fail decisions, can enrich the model of student performance, especially when integrated with the CCS ratings. These behaviors relate to the metacognitive activities associated with problem solving (planning, self-monitoring, the use of rule-chaining strategies), and may also include factors associated with the motivational nature of high-stakes assessments that, when combined, reflect the student's underlying mental model of the case.

The research team's goal is to use artificial neural networks and high-performance computing, in conjunction with current NBME ratings, to derive comprehensive and possibly predictive performance models from the CCS student data while continually evaluating the completeness of these models.

The investigators will first identify groups of similar performances based on the sequence and latencies of the examinee's actions during the CCS. They will refine and enhance their method of coding sequence information to detect more subtle patterns of performance that may extend over a sequence of many decisions. To accomplish this computationally intensive process realistically, they will modularize the neural network architecture and conduct parallel training with the performance data, in collaboration with the Center for Advanced Computing Research at Caltech. After the research team documents the sequence features responsible for the clustering by the search-path mapping and neural network tracking software they previously developed, they will construct an integrated performance model by combining the NBME rating classifications with the sequence-data-derived clustering. "Committees" of ANNs operating in parallel will conduct this integration, with each network providing a different perspective on the data. The training and classification errors generated during training and testing will provide a measure of how complete this integration is and suggest variables to include in or exclude from the model. These studies will not only provide new information on patient diagnosis/management strategies, but will also suggest a model for extracting and combining clustering and classification information from other complex assessment tasks. Procedures for identification and assessment of patient management strategies and styles, as identified by this research, will be disseminated to schools at the close of the project.
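
The "committee" idea can be caricatured without any actual neural networks: several classifiers, each attending to a different slice of the sequence data, vote on a performance category, and their level of agreement hints at how complete the integrated model is. Everything below (feature names, thresholds, labels) is invented for illustration; the study's real committee members are trained ANNs, not fixed rules:

```python
from collections import Counter

def committee_vote(feature_vec, networks):
    """Majority vote over the committee; low agreement flags an incomplete model."""
    votes = [net(feature_vec) for net in networks]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

# Each hypothetical "network" looks at a different aspect of the action sequence.
nets = [
    lambda f: "efficient" if f["n_actions"] < 20 else "thorough",
    lambda f: "efficient" if f["mean_latency"] < 5.0 else "thorough",
    lambda f: "efficient" if f["rule_chains"] > 2 else "thorough",
]

features = {"n_actions": 12, "mean_latency": 9.0, "rule_chains": 4}
label, agreement = committee_vote(features, nets)
print(label, round(agreement, 2))  # efficient 0.67
```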

1998–1999 Grantees

University of Arkansas for Medical Sciences

Principal Investigator: Patricia S. O'Sullivan, EdD
Grant Amount/ Duration: $47,550 for an 18-month project
Project Title: Demonstration of Portfolio Assessment in Residency Education

Learn More...

According to Dr. O’Sullivan’s proposal,

portfolios have an appeal in residency education as an assessment that reflects what the resident is actually doing during the educational program. The resident can provide evidence about his/her performance. However, portfolios have had no reported use in residency education. The psychometric issues concerning portfolios must be addressed before encouraging their use.

The purpose of this study is to provide a model of how to construct, score, and assess portfolios using one selected residency program. The study’s objectives are: (1) to develop the guidelines for a well-structured portfolio that can be used in any residency program; (2) to develop the assessment procedures to allow for reliable assessment of the portfolios; and (3) to evaluate the portfolio as an authentic assessment for performance in a psychiatry residency. The hypotheses of the research are that there is a significant difference in portfolio scores among first, second, third, and fourth year psychiatry residents, and that there is a significant correlation between portfolio score and average faculty clinical evaluations.
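
The two hypotheses amount to a group comparison across training years and a correlation with faculty ratings. A minimal sketch with fabricated numbers (all scores and names below are invented, not study data):

```python
from statistics import mean

portfolio = {  # hypothetical portfolio scores by residency year
    1: [62, 58, 65],
    2: [70, 68, 71],
    3: [78, 74, 80],
    4: [85, 83, 88],
}
year_means = {yr: mean(s) for yr, s in portfolio.items()}  # should rise with year

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

scores = [62, 70, 78, 85, 58, 68]           # portfolio scores
faculty = [3.1, 3.6, 3.8, 4.3, 3.0, 3.3]    # average faculty clinical evaluations
r = pearson_r(scores, faculty)
print(year_means[1], year_means[4], round(r, 3))
```

A high r and rising year means would be the pattern consistent with both hypotheses; the study would of course test these with formal significance tests on the real resident data.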

An expert consultant will aid in developing well-structured portfolios. A psychiatry faculty subcommittee will work with the consultant to develop work sample descriptions that will be used to devise the criteria of what must go into the portfolio. The subcommittee will also work with the consultant to develop the criteria that indicate how the evidence in the portfolio will be evaluated. Raters will be trained so that they can score portfolios reliably. From these scores, the reliability of the composite score obtained from the various elements in the portfolio will be determined, and the procedures recommended by the consultant will be implemented. Twenty-four residents will be instructed on how to develop a portfolio. Preliminary portfolios will be used as training materials for the raters and to provide guidance to residents and faculty. The final portfolio scores will be correlated with faculty evaluation scores, Psychiatry Residency in Training Examination (PRITE) scores, and, for first year residents, the USMLE™ Step 3. The residency program coordinator will provide these data to the researchers by ID number. Additionally, in June of 2000 a sample of residents and faculty involved in the study will be interviewed to determine the effects of assembling the portfolios.

This demonstration project will provide guidance to residency programs on how to construct valid portfolios that can be reliably scored and serve as an enhancement to residency evaluation. Other goals of portfolio development are to provide evidence of how the resident is thinking and of the quality of work resulting from participation in the program. The portfolios will be evaluated by outside raters, giving residents an objective assessment of their performance.

Dr. O’Sullivan, associate professor in the Office of Educational Research, University of Arkansas College of Medicine (a part of the University of Arkansas for Medical Sciences), was a participant in a General Internal Medicine Faculty Development Grant and Primary Care Training Grant and has served as statistician on several other grants. She received her EdD from the University of Houston, and has published on research in medical education with colleagues in several specialties, including pharmacy, emergency medicine, psychiatry, physical medicine and rehabilitation, and nursing.

James A. Clardy, MD, associate professor and residency program director, Department of Psychiatry at University of Arkansas for Medical Sciences, will help in the development of the portfolio content and criteria. He will assist with the psychiatry faculty subcommittee and work with the residents. Dr. Clardy will facilitate rater training.

Mark D. Reckase, PhD, will serve as consultant to the project. He was previously assistant vice president of Assessment Innovations Area, American College Testing Program, wherein he supervised the support, technological applications, and research departments, and the Performance Assessment Center. In 1998, Dr. Reckase joined Michigan State University as professor of education in the Measurement and Quantitative Methods Program. Aside from articles on other measurement research topics and issues, Dr. Reckase has published several articles on the development and evaluation of portfolios as a method of assessment.

Mildred Savage, PhD, assistant professor of educational evaluation for University of Arkansas for Medical Sciences’ Office of Educational Development, will also serve as a member of the research team. She will be responsible for qualitative data component for the project.

University of Illinois at Chicago College of Medicine

Principal Investigator: Alan J. Schwartz, PhD
Grant Amount/ Duration: $49,525 for an 18-month project
Project Title: "Assessment of Evidence-Based Medicine Skills"

Learn More...

According to Dr. Schwartz’s proposal,

the last 20 years have seen the emergence of a new approach to the practice of medicine: evidence-based medicine (EBM). The premise of EBM is that clinicians must learn to evaluate and apply the published literature more effectively. This will require them to learn new skills, including: formulating questions about their patients that can be answered in the medical literature, searching that literature for potentially relevant research reports, and critically appraising the research design and analysis to determine the validity and applicability of results to their patients. Though these skills are increasingly being taught to medical students and residents through journal clubs, conferences, and evidence-based morning reports, very few efforts to assess acquisition of EBM skills and their impact on decision-making have been reported.

This project’s objectives are to develop and validate an assessment tool for evidence-based decision making among clerks and residents in pediatrics. The proposed tool is founded on a Bayesian approach to evidence, and assesses whether, and to what degree, beliefs about a patient’s diagnosis or a management strategy are appropriately revised when new information is available. The primary research hypothesis is that the tool will be a valid measure of EBM skills; that is, that clerks and residents who receive EBM instruction should have higher post-rotation scores than pre-rotation scores, and higher post-rotation scores than clerks and residents who do not receive EBM instruction. A repeat assessment will be used to determine the reliability of the tool. Assessments of clerks and residents in other fields (eg, general internal medicine, family medicine) will be undertaken to evaluate the generalizability of the approach.
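
The Bayesian premise, that a prior belief should be revised by the likelihood ratio of new evidence, can be sketched with the odds form of Bayes' theorem. The test characteristics and probabilities below are invented for illustration, not taken from the proposal:

```python
def posterior(prior, sensitivity, specificity):
    """Posterior probability of disease after a positive test result,
    via the odds form of Bayes' theorem."""
    prior_odds = prior / (1 - prior)
    lr_positive = sensitivity / (1 - specificity)  # likelihood ratio (+)
    post_odds = prior_odds * lr_positive
    return post_odds / (1 + post_odds)

# A clinician who starts at 20% probability and sees a positive result
# from a test with 90% sensitivity and 80% specificity should end up
# at roughly 53%; an assessment tool of this kind scores how closely a
# trainee's revised belief tracks this normative update.
p = posterior(prior=0.20, sensitivity=0.90, specificity=0.80)
print(round(p, 3))  # 0.529
```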

Dr. Schwartz is assistant professor of clinical decision making in the Department of Medical Education at the University of Illinois at Chicago College of Medicine. He received his PhD in psychology from the University of California at Berkeley. He has published widely on medical decision making, especially on strategies used in making risky decisions and "judgmental psychology," including articles currently in press. Dr. Schwartz will be assisted by two co-investigators from his institution: Jordan Hupert, MD, assistant professor of clinical pediatrics, Department of Pediatrics, Division of General and Emergency Pediatrics; and Arthur Elstein, PhD, professor, Department of Medical Education, and professor of health resources management, School of Public Health.

University of Miami School of Medicine

Principal Investigator: Jonathan J. Braunstein, MD
Grant Amount/ Duration: $49,499 for a two-year project
Project Title: "A Multidimensional Assessment Program to Evaluate Medical Students' Non-Cognitive Attributes"

Learn More...

According to Dr. Braunstein’s proposal,

society expects physicians not only to have mastered a fund of medical knowledge and a set of clinical skills, but also to possess the professional attributes (eg, compassion, honesty, respect for patients, ethical standards) necessary to provide excellent patient care. Yet, despite the importance of these non-cognitive attributes, there is a paucity of suitable outcome measures for their assessment in medical students.

In its January 1998 report, Learning Objectives for Medical Student Education: Guidelines for Medical Schools (MSOP Report), the Association of American Medical Colleges (AAMC) challenged medical school faculties to undertake efforts to develop assessment methods for these attributes. This proposal, in response to that challenge, is being submitted to develop and pilot test a multidimensional program (which will be exportable to other medical schools) for the assessment of medical students’ non-cognitive attributes.

This project’s goal is to construct a multidimensional program of outcome measures that can serve as a valid and reliable assessment of medical students’ non-cognitive attributes. To achieve this goal, the program will employ a systems approach in which faculty, students, and administrators utilize a variety of tools and methods to evaluate the non-cognitive aspects of student performance, as recommended by the AAMC in its clinical evaluation program. The non-cognitive attributes listed in the MSOP Report will serve as a basis for the assessment. These attributes will be operationally defined in behavioral terms by conducting structured interviews with full-time faculty physicians, community physicians who serve as voluntary faculty, students, patients, and medical school administrators. In the project, the new tool will be designed by formulating objectives for the non-cognitive attributes students should possess prior to graduation and developing the instruments and methods of assessment, including evaluation forms, paper cases and video "trigger" tapes, objective structured clinical examination (OSCE) stations, and a summary feedback form for the students. In the second year of the project, a pilot study of the new program will be done, and the data collected will be analyzed to determine the program’s validity and reliability.

Dr. Braunstein, associate dean of curriculum at the University of Miami School of Medicine (UMSM), received his medical degree at that institution and has held his current post for the past 20 years. He has also been coordinator of the school’s behavioral science course, and has had extensive experience with the evaluation of both the cognitive and non-cognitive aspects of medical student performance. He has published on the medical applications of the behavioral sciences.

Dr. Braunstein will be supported by two co-investigators: Robert C. Duncan, PhD, and Ann Randolph Flipse, MD. Dr. Duncan, professor in the division of biostatistics, Department of Epidemiology and Public Health, is a specialist in biostatistics and educational design and measurement. Dr. Flipse is an adjunct associate professor of medicine and the director of the freshman clinical skills (CS) course, which includes cognitive skills and non-cognitive attributes of physicians.

University of Toronto Faculty of Medicine

Co-Principal Investigators: Jodi M. Herold, BHSc(PT), and Brian Hodges, MD, FRCPC
Grant Amount/ Duration: $24,005 for a one-year project
Project Title: "The Effect of Candidate Instructions on Reliability of Checklist and Global Rating Scores on an Objective Structured Clinical Exam"

Learn More...

The assessment of clinical competence in medical and health professional education has progressed rapidly in the past two decades. According to the proposal for this project,

one of the most dramatic and successful innovations in this area is the Objective Structured Clinical Examination (OSCE). Candidate performance in the OSCE is typically rated using two rating forms, a binary checklist form and a process-oriented or global rating form. In the vast majority of OSCEs, checklists account for the bulk of marks awarded to candidates, with global ratings accounting for 25% or less in most cases. Recently, however, several arguments have been advanced to support an increased use of global ratings over the traditional reliance on checklists. Global ratings appear to have psychometric properties that are at least as good as, and often better than, checklists. Further, there is a growing literature that suggests that clinicians with higher levels of expertise do not solve problems in clinical settings using approaches reflected in a checklist approach. Finally, checklists are binary ratings that tend to neglect higher-order components of clinical competence, such as empathy, rapport, and ethics.

This project will investigate optimal weighting of checklist and global rating scores to include the effect of candidate behavior, or "test-wiseness." Recent studies by these investigators of the psychometric properties of OSCEs found that student behavior changed depending on the students’ perception of how they would be evaluated. Specifically, students who anticipated the use of checklists conducted highly focused interviews, asking mostly closed-ended questions. Students anticipating global ratings asked more open-ended questions and appeared to give more attention to their relationship with the patient. The purpose of this study is to carefully examine the relationship between student perceptions of the type of instrument used to evaluate their performance in an OSCE and their behavior in that OSCE. This study will evaluate whether candidates in an OSCE alter their behavior according to their perception of how they are to be scored. If so, it will explore the magnitude of change of scores on standard measures (eg, checklists, global ratings) that result from those behavioral changes. Further, this study will investigate whether this change in candidate behavior affects the psychometric properties of the test, specifically, the internal consistency or reliability of the test using the two rating forms.
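
Internal consistency, one of the reliability properties the study will examine, is commonly summarized with Cronbach's alpha across stations. A minimal pure-Python sketch with fabricated checklist scores (three candidates, three stations; none of this is study data):

```python
from statistics import pvariance

def cronbach_alpha(scores_by_station):
    """Cronbach's alpha: rows are stations (items), columns are candidates."""
    k = len(scores_by_station)
    item_vars = sum(pvariance(s) for s in scores_by_station)
    totals = [sum(candidate) for candidate in zip(*scores_by_station)]
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Hypothetical checklist scores (out of 10) per station per candidate.
stations = [
    [7, 5, 9],
    [6, 4, 8],
    [8, 5, 9],
]
alpha = cronbach_alpha(stations)
print(round(alpha, 3))  # 0.991
```

Computing alpha separately for checklist scores and global ratings, before and after candidates learn how they will be scored, is one way the behavioral effect on reliability could be quantified.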

Jodi Herold, currently a master of arts degree candidate in educational measurement and evaluation, was licensed by the College of Physiotherapists of Ontario in 1991. Since June 1996, she has served as lecturer for the Department of Physical Therapy and academy associate for interprofessional education with the University of Toronto’s Faculty of Medicine. In 1998, she also became a research assistant there, responsible for the Objective Structured Assessment of Technical Skills (OSATS) examination used to evaluate the surgical skills curriculum in the postgraduate surgery program for the Center for Research in Education, Faculty of Medicine, and served as an evaluator and statistician on other special projects within the school. She has also worked as a standardized patient trainer for the Canadian Alliance of Physiotherapy Regulatory Boards.

Brian Hodges, MD, MEd, FRCPC, vice chair (education) of the Department of Psychiatry in the University of Toronto’s Faculty of Medicine, received his MD in 1989 from Queen’s University, Kingston, Ontario, Canada, and an MEd in health professional education from the Ontario Institute for Studies in Education in 1995. He is the recipient of several awards from Canadian medical association bodies for the best presentation by a newly established investigator and for distinguished service in medical education. Dr. Hodges and his colleagues have previously received funding from the Medical Council of Canada for related research, and he was a co-investigator on a 1996-97 NBME Fund project (A Systematic Evaluation of the Use of Adolescent Standardized Patients in the OSCE Format; Mark Hanson, MD, principal investigator). He has published widely on the use of OSCEs in psychiatry training.

Other co-investigators who are part of this project’s research team include: Glenn Regehr, PhD, associate professor in the Departments of Psychiatry and Surgery at University of Toronto and associate faculty in the Department of Higher Education at the Ontario Institute for Studies in Education; and Nancy McNaughton, associate director of the standardized patient program in the Department of Family and Community Medicine, and research assistant and examination coordinator for the Department of Psychiatry.

Washington University School of Medicine

Principal Investigator: David J. Murray, MD
Grant Amount/ Duration: $50,000 for a two-year project
Project Title: "Clinical Simulation: Developing an Acute Care Skills Evaluation for Medical School Graduates"

Learn More...

According to Dr. Murray’s proposal, "medical school graduates are expected to recognize and treat a variety of life-threatening conditions, and the knowledge and clinical skills required to manage these crisis situations are difficult to teach and evaluate in a classroom or clinical setting." The purpose of this study is to evaluate students’ knowledge, judgment, and technical skills in managing a simulated acute clinical situation. A commercially available life-size electromechanical mannequin will be used to simulate crises and provide a standardized, reproducible acute care environment to evaluate senior medical students.

The project’s objectives are: (1) to develop ten different acute care simulations that reflect clinical situations that a graduating medical student should have the requisite clinical skills to recognize and treat; (2) to test various scoring methodologies that could be used to quantify the results from a simulation exercise; and (3) to gather performance data to assess the psychometric adequacy of the scores obtained from the simulation scenarios and to define performance expectations for these simulations using the input of clinical faculty who evaluate medical student performance.

In the second year of the grant, entry-level residents from three different disciplines (pediatrics, emergency medicine, and anesthesia) will be tested to determine how these scenarios can be used with medical school graduates. Additional validity testing will be undertaken through the comparison of scores from "undifferentiated" medical students and residents who have already established a career path.

According to Dr. Murray,

the long term goals of the project are to develop an acute care skills set that could be applied to evaluate the competence of graduating physicians either upon completion of medical school or prior to starting clinical training in graduate medical education. Simulation technology offers the potential for developing identical, reproducible scenarios to evaluate competence in various clinical simulated patient care settings that could be applied to evaluate medical school graduates across the nation.

Dr. Murray is director of medical student education, and professor and director of the Clinical Simulation Center within the Department of Anesthesiology at Washington University School of Medicine. He has published his clinical research in Anesthesia Today and Anesthesiology, including the 1998 article "Clinical Simulation: Technical Novelty or Innovation in Education" in the latter journal. Dr. Murray received his medical degree from the University of Saskatchewan College of Medicine, Saskatoon, Saskatchewan, Canada. Julie Woodhouse, RN, administrator of the Clinical Simulation Center, will serve as study coordinator. Ms. Woodhouse has experience in developing, teaching, and evaluating courses currently conducted at the center for paramedics, nurse anesthetists, anesthesia residents, emergency room residents, and faculty from anesthesia and emergency medicine.

Three consultants will assist Dr. Murray on the project: Joseph Kras, MD, DDS, associate director of the Clinical Simulation Center, who will coordinate evaluation activities in the project; John Boulet, PhD, acting director of test development and research at the Educational Commission for Foreign Medical Graduates (ECFMG), who will provide input on measurement issues; and Amitai Ziv, MD, who has worked as a consultant to the ECFMG in the area of clinical skills assessment since 1992 and has extensive experience with the simulation platform to be used in this project.

*Note: As of March 24, 2000 the Medical Education Research Fund is now known as the Edward J. Stemmler Medical Education Research Fund.

1997–1998 Grantees

Baylor College of Medicine, in Houston, Texas

Principal Investigator: Marianna M. Sockrider, M.D., Dr. P.H., Assistant Professor, Department of Pediatrics
Grant Amount/ Duration: $25,000 for a one-year project
Project Title: "Exploring the Costs and Benefits of Qualitative Analysis of Student-Generated Learning Issues in a Problem Based Learning Course"

Learn More...

This project will refine the analysis system developed in a pilot study that examined learning issue content in problem-based learning (PBL) coursework. A defined coding system and qualitative analysis software will be applied to sort and summarize data. The project will also establish the system’s inter-rater reliability, explore its validity and utility in evaluating course content and self-directed learning, and examine the resources required for, and benefits of, conducting qualitative analysis of student-generated learning issues in PBL.

East Tennessee State University (ETSU) College of Medicine

Principal Investigator: F. Forrest Lang, MD
Grant Amount/Duration: $49,900 for a two-year project
Project Title: "Evaluation of a Competency Rating System for Assessing High Impact Communication Skills"

Learn More...

This project will produce a system for assessing operationalized measures of the communication skills recently identified by the Toronto Consensus Statement. The Consensus Statement was written by the world’s authorities on patient-doctor relationships and identified those "high-impact" communication skills known to positively affect patient satisfaction and disease outcomes. The study will include students at two medical schools, ETSU and Tulane University Medical School, which differ significantly in their approach to and content of communication skills training.

The Johns Hopkins University School of Medicine

Principal Investigator: Harold P. Lehmann, MD
Grant Amount/Duration: $49,720 for a one-year project
Project Title: "The Use of Decision Analysis to Evaluate Progress Through a Case Simulation"

Learn More...

In this project, an evaluation "engine" for clinical cases will be built using decision analysis and two measures, posterior probabilities (PP) and expected value of information (EVI). The study will investigate the engine’s ability to evaluate students’ progress through clinical case simulations using those measures, and its ability to assess students’ problem-solving strategies. The study will ultimately evaluate the face and construct validities of this assessment approach.

University of California, Irvine, College of Medicine (UCI-COM)

Principal Investigator: Michael D. Prislin, MD
Grant Amount/Duration: $49,998 for a two-year project
Project Title: "Development of Web-based Evaluative Methods to Assess Medical Students' Acquisition and Use of Informatics Skills in Clinical Problem Solving"

Learn More...

This project will develop assessment methodologies that allow rapid, standardized evaluation of student information retrieval skills and provide immediate feedback on related student activities; provide a tool to assess student medical informatics performance in a standardized patient-based performance assessment; and provide an opportunity to evaluate the potential impact of applying medical informatics skills to patient care outcomes.

University of Kentucky College of Medicine

Principal Investigator: Charles H. Griffith, III, MD
Grant Amount/Duration: $48,626 for a two-year project
Project Title: "Teaching Quality, Student Performance, and Residency Choice"

Learn More...

This project will expand on the results of a pilot study that used qualitative analysis to examine the influence of outstanding teachers on ultimate residency choice. More specifically, it will investigate the influence of teaching quality on long-term outcomes (e.g., USMLE™ Step 2 scores and end-of-third-year practical examinations) and establish the feasibility and validity of a new method for assessing faculty and resident teaching ability based on the performance of their students.

University of Washington School of Medicine

Principal Investigator: Barbara A. Goff, M.D.
Grant Amount/Duration: $48,669 for a two-year project
Project Title: "Development of an Objective Structured Assessment of Surgical Skills"

Learn More...

The primary objectives of this project are to evaluate the effectiveness of a core surgical curriculum for teaching surgical skills, and to develop tools to assess those skills objectively.

1996–1997 Grantees

University of Michigan Medical School

Principal Investigator: Larry D. Gruppen, PhD
Grant Amount/Duration: $50,000 for a two-year project
Project Title: "Measuring medical student self-assessment: A methodological and educational investigation"

Learn More...

Although many medical schools include among their goals that of helping students to become self-directed learners, the lack of consensus on a precise definition of self-directed learning has hindered the development of a solid research base that could guide educational and assessment efforts. In this project, we seek to clarify the concept of self-directed learning by identifying self-assessment as a necessary component skill for self-directed learning. We propose to refine the measurement of self-assessment and clarify its psychological dynamics through three studies.

In Study 1, we will develop a longitudinal database that will allow us to evaluate the impact of personal and task characteristics on self-assessment abilities. In Study 2, we will examine the impact of students’ self-assessed strengths and weaknesses on how they spend subsequent educational time, and we will attempt to clarify the impact of educational experiences on changes in clinical performance and corresponding self-assessment of that performance. In Study 3, we will concentrate on the psychological meaning and dynamics of self-assessment as a metacognitive process, seeking to determine the links this skill might have with other metacognitive phenomena and personality characteristics.

The results of this project will provide a clearer basis for understanding: (1) how accurate self-assessment might be used as a guide for students’ self-directed learning decisions; (2) how self-assessment might best be measured; (3) the psychological dynamics of the self-assessment process and its relation to other important cognitive processes; and (4) the means by which self-assessment, and by extension self-directed learning, might be influenced through educational interventions.
