Stemmler Fund Prior Grant Information

This is the listing of prior grant information for the last three years.

2007–2008 Grantees

University of Illinois at Chicago

Principal Investigator: Dr. Alan Schwartz
Grant Amount / Duration: $149,310 / 2 years
Project Title: Measuring Quality of Medical Student Performance at Contextualizing Care

Clinical decision making requires two distinct skills: the ability to classify patients' conditions into diagnostic and management categories that permit the application of "best evidence" guidelines, and the ability to individualize or - more precisely - to contextualize care for patients whose circumstances and needs require variation from the standard approach to care. Most assessment in medical education places heavy emphasis on biomedical decision-making with little emphasis on how to incorporate contextual factors that may be essential to planning patients' care.

The goal of this project is to demonstrate and provide validity evidence for an innovative standardized patient (SP) method of assessing medical students in the clinical years on their ability to detect and respond to individual contextual factors in a patient encounter that overcomes the aforementioned challenges. As such, the project is designed to directly address the Stemmler Fund goal of research and development of innovative assessment approaches to enhance the evaluation of those preparing to practice medicine.

During the project, 144 fourth-year medical students participating in a Medicine sub-internship will be randomized to an intervention group or a control group; the intervention group will receive additional training in the application of qualitative methodology to elicit and incorporate contextual factors in the clinical encounter. All students will participate in an SP assessment consisting of four SPs, blinded to trial arm, presenting cases with and without important biomedical and contextual factors in a counterbalanced factorial design. Performance will be compared between trial arms. In addition, performance will be compared with USMLE Step 2 clinical knowledge scores to determine whether contextualizing ability is independent of clinical knowledge, and consistency of performance across individual SP cases will be studied to determine the number of cases necessary to achieve sufficient reliability for the assessment to be used.

The outcomes of this project, which will be widely disseminated to permit replication at other medical schools, will include: (1) a well-documented method for developing SP assessments designed to test the ability of a trainee to contextualize care, (2) evidence for the ability of the assessments to distinguish between trainees with differing levels of skill in contextualization using a randomized controlled educational trial, (3) evidence that assessment scores are not predicted by clinical knowledge, and (4) evidence for internal consistency in scoring of the assessments.

University of New Mexico Health Sciences Center

Principal Investigator: Dr. Teresita McCarty, MD
Grant Amount / Duration: $150,000 / 2 years
Project Title: A Web-based Program for the Deliberate Practice and Formative Assessment of Writing Patient Notes

Rationale: Research into effective approaches to significant learning has generated two powerful descriptive models: formative assessment and deliberate practice. Although the two models approach student learning from somewhat different perspectives, the congruence between them is striking. Unfortunately, realizing the full power of either model requires significant manpower, time, and expertise - resources that are difficult to achieve and to sustain. Technological approaches can reduce the human resource requirements and make the learning benefits of these models more accessible for medical educators and their students.

Objectives: In support of the overarching goal of improving medical students' clinical reasoning skills, this proposal specifically aims to evaluate the core, shared strengths of the formative assessment and deliberate practice models as implemented through a resource-sparing, web-based technology - Calibrated Peer Review™. The objectives are to answer the following questions. While using Calibrated Peer Review, do learners:

  1. Focus on improving performance in the well-defined task of patient note-writing?
  2. Recognize and give informative feedback?
  3. Incorporate feedback into iterative practice to improve performance?
  4. Report that the new learning is integrated successfully into new performances?

Methods: This proposal begins with archival data from patient notes written in Calibrated Peer Review™ by four medical student classes (2005-2008) and continues gathering note-writing data, along with survey and focus group information, from two additional classes (2010 & 2011). Four studies are proposed. 1) The archival study evaluates individual factor correlations using the scores from 3,468 completed "assignments," as well as trends in score deviation of the peer review and self-assessment over time. 2) The quality of feedback study codes students' narrative feedback to analyze quality trends over assignment iterations. 3) The perceptions of feedback study asks students' opinions about sample feedback of varying quality. 4) The student survey and focus group study constructs questions to assess students' habits in preparation for note-writing during the performance examinations, their perceptions of the CPR system, and how it aids their learning. Descriptive statistics, inferential statistics, qualitative analyses, and analysis of change via Hierarchical Linear Modeling, as relevant to each research question, will be conducted.

Significance for Assessment in Medical Education: This web-based program applies the principles of formative assessment and deliberate practice to provide a powerful learning experience for students. It emphasizes the importance of the learner's focus on improving performance, accurate observation and informative feedback, repetition, and the formation of cognitions that reflect the true complexity of the clinical task and thus bring the learner to integrate that improvement into his or her everyday work. The judicious use of this technological approach reduces the intensive resources required for effective learning - a true innovation in 'assessment as learning' in medical education.

Duke University

Principal Investigator: Dr. Jeffrey Taekman, MD
Grant Amount / Duration: $149,297 / 2 years
Project Title: Standardized Teamwork Skills Assessment: Feasibility, Reliability, and Validity.

Background and Rationale: Successful delivery of health care relies on effective team coordination and communication. A major shortcoming of the recent push toward greater education of teamwork skills in medicine is our limited ability to assess the effectiveness of different forms of team skill training with respect to behavior change. There is a critical need to develop valid, reliable, and feasible methods of assessing health care team coordination skills. A number of observer-based rating tools have been implemented, with limited success. While some successes have been noted in the case of specialty-based tools for specific work environments, general team skill assessment tools intended to apply to a broader audience, or for the purpose of medical and nursing education, have not been strongly supported. Limitations of current assessment methods include: (1) a failure to attain high levels of inter-rater agreement with observer-based team skill rating scales, and (2) a failure to adequately assess teamwork skills with respect to managing difficult situations or difficult team members.

Objective: Our long-term goal is to develop, validate, and share a standardized team skill assessment (STSA) tool focused on evaluating critical health care teamwork behaviors. We have developed an STSA tool that embeds trainees with standardized team members (actors playing the roles of health care team members, similar to the accepted practice of using standardized patients) in difficult teamwork scenarios. Scoring of performance is observer-based rating of scenario-specific ideal team skill behaviors. This tool overcomes limitations of current assessment methods by (1) assessing trainees in a standardized scenario that is the same for each trainee (not dependent on the participation of other trainees), (2) using scenarios that stress critical teamwork skills in difficult situations, and (3) basing scoring on specific observable behaviors that are easily identified, within the context of each scenario. The primary objective of this project is to assess the feasibility, reliability, and validity of this STSA.

Methods: We will use the STSA tool to assess 30 medical and nursing students before and after teamwork training content is delivered in their capstone courses. We will assess reliability, validity, and generalizability through multivariate generalizability analysis of behavioral ratings by standardized team members (immediately following interaction) and trained observers (from videotape). Generalizability analysis is a statistical technique useful for assessing the reliability of a measure when there are multiple sources of variability in the measure. We will assess the proportion of variance in team skill ratings attributable to: (1) trainee, (2) pre- vs. post-training, (3) rater type (actor/live vs. observer/tape), (4) medical vs. nursing student, (5) rater, and (6) scenario. We hypothesize that a large proportion of the variability in student scores will be attributable to trainee and pre- vs. post-training. We expect a low proportion of the variability to be attributable to rater and rater type. We will predict the number of repetitions (e.g., number of scenarios) required to achieve a reliable team skill score. We will also assess feasibility and validity of the STSA through subjective surveys of clinician-teachers, trained observers, standardized team members, and students.
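
The step of predicting how many scenarios are needed for a reliable score can be illustrated with a minimal sketch. It assumes a simplified one-facet (trainee x scenario) design rather than the multivariate design described in the abstract, and the variance components, the 0.85 target, and the function names are hypothetical values chosen only for illustration, not figures from the study.

```python
def g_coefficient(var_person, var_error, n_scenarios):
    """Relative G coefficient when a score is averaged over n_scenarios.

    var_person: variance attributable to trainees (the signal).
    var_error:  residual variance per single scenario (the noise).
    Both values are assumed to come from a prior variance-component analysis.
    """
    return var_person / (var_person + var_error / n_scenarios)


def scenarios_needed(var_person, var_error, target=0.80, max_n=100):
    """Smallest number of scenarios whose averaged score reaches `target`.

    Returns None if the target is not reached within max_n scenarios.
    """
    for n in range(1, max_n + 1):
        if g_coefficient(var_person, var_error, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical variance components: error variance twice the trainee variance.
    for n in (1, 4, 12):
        print(n, round(g_coefficient(1.0, 2.0, n), 3))
    print("scenarios needed for 0.85:", scenarios_needed(1.0, 2.0, target=0.85))
```

Averaging over more scenarios shrinks the error term, so the coefficient rises with each added scenario but with diminishing returns; a full analysis would include the rater and rater-type facets as well.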

Advancing assessment in medical education and practice: This research is innovative because it presents a novel approach to assessing team skills and a rigorous method of analyzing the feasibility, reliability, and validity of that approach. We expect the results of this effort to advance knowledge in medical education assessment as it relates specifically to measuring health care teamwork skills. This includes advancement in knowledge related to details such as scenario presentation, actor involvement, and rater training for the purposes of low fidelity simulation-based assessment. We also expect to advance knowledge related to methods of analyzing the validity, reliability, and generalizability of these types of assessment tools.

2006–2007 Grantees

McMaster University, Hamilton, Ontario

Principal Investigator: Kelly Dore Banks, PhD (ABD)
Grant Amount / Duration: $145,870 / 2 years
Project Title: The evaluation of the reliability, validity, feasibility, and acceptability of a web-based instrument to measure professional qualities in medical school applicants

Rationale: Health professions’ admission committees across the world are faced with the difficult task of selecting, from among many eligible applicants, the select few who will be admitted to their training programs. The determinants of this admissions process are often a combination of cognitive measures, such as Grade Point Average (GPA) or standardized tests such as the Medical College Admission Test (MCAT), and non-cognitive measures, including interviews and essays. However, there has been limited success in the development of evaluation tools that will provide reliable and valid measures of an applicant’s non-cognitive qualities. The exception to this is the Multiple Mini-Interview (MMI), in essence an admissions OSCE. The MMI has been shown to predict intramural and licensing examination performance. However, like any OSCE, the MMI has practical limitations; the sheer volume of candidates for many institutions makes it necessary to develop a reliable and valid strategy for screening candidates’ non-cognitive attributes in a more efficient fashion. To this end, a new measure using video scenarios and written or audio responses was developed, and a pilot study was completed. In 2006, 110 applicants to McMaster’s medical school completed this Computer-based Multiple Sample Evaluation of Non-cognitive Skills (CMSENS). Of those applicants, 78 completed the CMSENS by verbally recording their responses in an audio file while 32 typed their responses. The overall test generalizability was .86 for the audio CMSENS, and .72 for the written. The written CMSENS also demonstrated predictive validity, correlating with the MMI at .51. However, conclusions from this study are limited because of the small sample and the one-time nature of the findings.

Objectives:

  1. Assess the impact of varied test time and proctoring on the reliability and validity
  2. Determine the predictive validity of CMSENS
  3. Determine the reliability of the pilot results with a larger, more diverse sample

Methods: To achieve the first objective, applicants to McMaster’s medical school will be invited to participate in the CMSENS in winter 2008. These applicants will complete a CMSENS in which the length of response time and proctoring will be manipulated. In addition to reliability and validity analysis to determine the optimal testing format, participants invited to interview will have their scores compared to MMI performance. To assess objective 2, several methods will be followed. In-program performance results will be assessed for the approximately 25 members of the medical class of 2009 who participated in the CMSENS pilot project and gained admission. In addition, construct validity will be examined by recruiting students in their final year of medical school and second year residents to participate in a mock-CMSENS, thereby allowing a comparison between (a) scores assigned to the medical school applicants and more senior trainees, and (b) for the final year students and residents, scores assigned on the CMSENS and those received on the Canadian qualifying examinations, specifically Part II of the MCCQE (an OSCE), which evaluates both cognitive and non-cognitive characteristics of medical trainees. To satisfy the third objective, a sample of applicants applying in winter 2009 will be administered the CMSENS. A larger sample in this year will permit accurate assessment of reliability and facilitate comparison of CMSENS performance to MMI and in-program results.

Significance for Medical Education and Practice: This innovative assessment tool, if it is proven reliable and valid, has the potential to allow educators in the health professions to efficiently assess the non-cognitive qualities of the thousands of applicants to training programs, for whom reliable and valid assessment was previously impossible.

Southern Illinois University School of Medicine, Springfield, Illinois

Principal Investigator: Dr. Richard Rosher
Grant Amount / Duration: $150,000 / 2 years
Project Title: An Objective Measure to Assess Resident Competency in Systems-Based Practice

Project objectives: The ACGME has directed that all residents must meet six competencies. The sixth competency, Systems-Based Practice, has presented a challenge for assessment.

The objective of this project is to develop a standardized, objective, innovative method to measure the sixth competency: Systems-Based Practice.

An OSSIE (Objective Structured System Interaction Examination) will evaluate the resident’s ability to interact with the health care team, deal with aspects of the health care system, coordinate effective care across settings, and provide cost-effective care.

Rationale and primary methods to be employed: In today’s health care system, not only must physicians be competent in their knowledge and practice of medical care, but they must be leaders of teams composed of other health care providers. They must be able to assist their patients to navigate the health care system and ensure that continuity of care is promoted across health care settings. They must be cognizant of costs of various treatments.

Four required skills are identified by the ACGME that must be attained in order to be judged competent in Systems-Based Practice. The resident must:

  1. be able to understand the interaction of physician practices with the larger system, resources, and providers.
  2. have knowledge of practice and delivery systems.
  3. practice cost-effective care.
  4. advocate for patients within the health care system.

Three scenarios involving patients, families, and members of the health care team will be developed to test each of these four required skills for a total of twelve scenarios. The scenarios will be presented in the format of an OSCE. The new examination for residents, the OSSIE, will be given to PGY2 residents in the middle of their second year. It will be a formative examination that will evaluate the residents’ competence and enable tailoring of their third year to improve these competencies. Generalizability analysis will be used to determine inter-case reliability. Correlations between exam scenarios and ratings by observers in practice situations will determine the validity of the OSSIE.

How the proposed research will advance assessment in resident education: Currently, there are few measures of Systems-Based Practice to use in assessing residents. The proposed research will advance assessment in resident education by investigating an innovative, objective method of assessing the ACGME competency of Systems-Based Practice. This new method of assessment, the OSSIE, will be a modification of the traditional OSCE. A simulation of interaction with other members of the health care team will capture abilities needed for physicians in today’s complex health care system. By using the OSSIE, faculty will be able to provide constructive feedback to each resident to enable improvement in Systems-Based Practice.

University of Pennsylvania, Philadelphia, PA

Principal Investigator: David Asch, MD
Grant Amount / Duration: $149,820.55 / 1 year
Project Title: Clinical Outcome-Based Assessment of Medical Education: Concept and Evaluation

The overall goal of this research project is to demonstrate the feasibility and examine the usefulness of evaluating the quality of clinical training programs by assessing the clinical outcomes of the patients later cared for by the graduates of those training programs. The concept is premised on the view that although medical education serves a collection of intermediate goals, in the end the most important clinical goal is to improve the health of individuals and populations. We may mean many different things when we say that a medical school or a residency program is good, or that one medical school or residency program is better than another. However, stakeholders, including prospective trainees, health systems, and patients, could be justified in expecting at least one specific meaning: that graduates of good training programs in general take care of patients well, and that graduates of better training programs in general take care of patients better.

This project represents a study of “proof of concept,” using as our test case the analysis of maternal birth treatment and outcomes to inform the assessment of residency training in obstetrics and gynecology. We will use data from all hospital-based deliveries in New York and Florida between 1992 and 2006 to test the relationship between residency program, physician characteristics, and maternal outcomes. Our measures of performance will be:

(a) use of Caesarean section; (b) whether a woman who delivered vaginally had a 4th degree perineal laceration; (c) whether a woman experienced any adverse outcome, as defined by HealthGrades (2004); and (d) a complication measure developed by Epstein, Ketcham and Nicholson (2006) that assigns larger positive values to complications that result in long hospital stays for the mother. Using these data and measures, we will address the following specific questions:

1) How much variation in inter-physician performance is explained by residency program and year of residency graduation?

2) Can residency programs be categorized reliably according to the treatment patterns and patient outcomes of their graduates?

3) Is there systematic variation in residency program effects?

We believe these notions are consistent with the goals of the Stemmler Fund and the National Board of Medical Examiners more generally, because they incorporate new methods of assessment in which the influence of training programs on clinical outcomes is assessed independently of patient and physician characteristics. At the conclusion of this project, we expect to have a deeper and more specific understanding of the promise and limitations of evaluating medical training programs using clinical outcomes; we expect to have a series of manuscripts describing our conceptual view, analytic approach, and results; and we expect to have identified the next steps toward further development and evaluation of this assessment concept.

2005–2006 Grantees

University of California, San Francisco

Principal Investigator: Karen E. Hauer, MD
Grant Amount / Duration: $149,167 / 2 years
Project Title: Cultural Competence Using Shared Decision Making

Objectives: We propose to assess the reliability and validity of a shared decision making checklist as a tool for evaluating medical student cultural competence in standardized patient encounters. We will validate shared decision making checklist ratings by correlating scores with global assessments of cultural competence made by clinician experts in cultural competence. Additionally, we will compare ratings of cultural competence to ratings of general communication skills as measured by the Common Ground instrument in these standardized patient encounters to determine the extent to which cultural competence overlaps with communication skills proficiency.

Background: Failure to develop care plans that incorporate information about patients’ cultural backgrounds and values contributes to important disparities in health care. Medical schools and residency training programs are now required to teach and assess trainees’ cultural competence skills. However, a review of the literature indicates that measures of cultural competency suffer from significant deficits. Most studies rely on measurements of attitudes or skill self-assessments, and the minority of studies that do have skills-based outcomes assess only a subset of relevant competencies. Thus, while medical schools nationally are emphasizing the importance of patient-centered care and cultural competence, they lack the ability to measure the degree to which students are mastering these core concepts.

Methods: We will determine the reliability and validity of a shared decision making checklist as a measure of cultural competence. Using purposeful sampling to identify students of different genders, clinical skills competence, and races, we will select 200 videotaped encounters from 50 third year medical students’ interactions with four standardized patients. Trained coders will score the student-standardized patient encounters using a shared decision making checklist. Two faculty cultural competence experts who will be blinded to the study hypothesis, checklist content, and scores will perform global assessments of cultural competence based on review of the same videotaped encounters. The standardized patients’ ratings of general communication skills using the Common Ground instrument will be used to assess trainees’ communication skills. Reliability of the three ratings instruments will be calculated. We will assess validity by correlating the shared decision making scores to the global assessments of cultural competence. We will assess concurrent validity by correlating the shared decision making results with the communication skills scores. We will explore reliability by conducting several generalizability studies. This analysis will determine the number of raters and cases needed to obtain reliable cultural competence scores.

Implications for assessment: Our results will inform the assessment literature by evaluating the use of a shared decision making checklist for assessing cultural competence and determining the degree to which cultural competence correlates with communication skills. These results will facilitate evaluation of the efficacy of cultural competence curricula.

University of Missouri-Columbia School of Medicine

Principal Investigator: Kimberly G. Hoffman, PhD
Grant Amount / Duration: $150,000 / 2 years
Project Title: Use of Portfolios to Assess Medical Student Outcomes

The public in general and professional organizations in particular are increasingly demanding evidence of competence in medical practice and a physician’s ability to meet the demands of today’s society (IOM, 2001; 2003). Medical education has responded with a focus on educational outcomes (Whitcomb, 2004), case-based, authentic curricula (Friedman, 2001; Kincade, 2005) and experiences that support the development of physicians within a complex health care system (ACGME, 2005; AAMC Report V, 2001). The emerging definition of professional competence is difficult to evaluate using traditional assessment. The portfolio addresses the current limitations of assessment by integrating professional outcomes and placing them within an authentic learning context. Challenges in portfolio assessment include insufficient inter-rater reliabilities, questions of generalizability, a substantial faculty and learner time commitment, and balancing a prescriptive, standardized approach with individualization (Friedman et al., 2001; Case, 1994; Des Marchais et al., 1995; Challis, 1999; LeMahieu et al., 1993; Herman et al., 1995).

The University of Missouri has developed a set of key competencies for our graduates (MU2020 key characteristics) that are consistent with national and international discussions of professional competence. To our knowledge few medical schools have successfully engaged faculty in developing an approach for assessment of professional competencies. This proposed research draws on the prior work at MU to address two global questions: 1) How does the development of a set of descriptive anchors for each of the key characteristics influence the validity, reliability, reproducibility and trustworthiness of portfolio assessment? 2) How do student contributions to the portfolio influence faculty assessment of portfolios?

Descriptive anchors of exemplary performance for each of the professional outcomes will be derived from the literature, clinical faculty, medical students and patients. These anchors will be used to develop a portfolio assessment tool. A two-step judgmental review process will establish the content validity of the descriptive anchors. Inter- and intra-rater reproducibility will be established by using the assessment tool to evaluate the portfolios of third year medical students. Predictive validity will be determined by correlation of portfolio assessment with traditional measures of student success. The influence of student contributions to the portfolio assessment will be evaluated by determining the differences between individual faculty ratings of students’ portfolios rated with only required documentation and rated a second time with student contributions included. An external advisory board will provide guidance to the research team and will review the appropriateness of the intermediate research projects.

The outcome of this project will be a portfolio assessment tool to evaluate student outcomes. It will be a useful addition in the assessment of learners and promote an enhanced understanding of professional competence.

Columbia University - College of Physicians and Surgeons

Principal Investigator: Peter C. Wyer, MD
Grant Amount / Duration: $149,975 / 2 years
Project Title: Designing Cognitive Measures of Practice-Based Learning and Improvement as an Iterative Process Combining Rasch and Classical Measurement Methods

Currently, no psychometrically rigorous and developmentally informative instrument exists for assessing resident competencies in the cognitive domains encompassed by Practice Based Learning and Improvement (PBLI), as defined by the ACGME. Using an iterative process model, we propose to develop and empirically validate four cognitive measures tapped by a comprehensive PBLI instrument that permits periodic formative and summative assessments of residents’ competency levels as they progress through their programs in different specialties. Based upon preliminary experience with a relevant pilot project, we will develop an item pool addressing the following PBLI sub-competency domains: 1) analyzing practice experience, 2) using information technology to manage information and locate evidence from scientific studies related to patients’ health problems, 3) applying knowledge of study designs and statistical methods to the appraisal of clinical studies and other information on diagnostic and therapeutic effectiveness, and 4) assimilating evidence from scientific studies related to patients’ health problems. We propose that these domains, which conform to the standard cognitive domains of evidence based medicine (EBM), and are frequently summarized as ‘ask, acquire, appraise, and apply’, are best-suited for measuring the cognitive aspects of PBLI.

Initially, we will generate a pool of 100-150 Written-Structured Response items tied to sub-competency domain specifications, using different item formats (such as multiple choice and true-false items). The item pool will represent the relevant, observable facets of each of the sub-competency domains. Parallel forms of the PBLI instrument will be generated next, aligned with common assessment specifications that stipulate a weighting distribution for items tied to different competency domains and cognitive levels (Phases 1-2, Year 1). We will implement a rigorous content- and empirical-validation plan by testing each parallel form of the PBLI on samples of resident volunteers in medicine, pediatrics, and emergency medicine at New York Presbyterian Hospital, as well as on residents in accredited programs from these specialties outside of our institution. We will use Rasch modeling techniques combined with methods from classical measurement theory to examine validity and reliability of the PBLI measures through these empirical trials. We will supplement the PBLI data with a structured survey to identify programs conforming to ‘best practices’ criteria in the target domains of EBM. Convergent validity evidence can thus be gathered and evaluated, along with evidence of resident group differences on PBLI measures in programs that are more or less compliant with EBM practices and teaching.

We believe that the resulting PBLI instrument(s) will provide a unique and critically important vehicle that, combined with existing performance-based assessment modalities, will make possible a comprehensive approach to evaluation of residents’ competencies across a broad range of specialties. We believe that the PBLI instruments thus produced will fill a gap in the area of outcome assessments in residency programs, the absence of which currently limits the quality.