A response to concerns regarding potential bias in the implementation of machine learning (ML) to scoring of the United States Medical Licensing Examination Step 2 Clinical Skills (CS) patient notes (PN).
Schools undergoing curricular reform are reconsidering the optimal timing of Step 1. This study provides a psychometric investigation of the impact on United States Medical Licensing Examination Step 1 scores of changing the timing of Step 1 from after completion of the basic science curricula to after core clerkships.
The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).
This article reviews the USMLE step examinations to determine whether they test the palliative care (PC) knowledge necessary for graduating medical students and residents applying for licensure.
There have been a number of important stakeholder opinions critical of the Step 2 Clinical Skills Examination (CS) in the United States Medical Licensing Examination (USMLE) licensure sequence. The Resident Program Director (RPD) Awareness survey was convened to gauge perceptions of current and potential Step 2 CS use, attitudes towards the importance of residents' clinical skills, and awareness of a medical student petition against Step 2 CS. This was a cross-sectional survey which resulted in 205 responses from a representative sampling of RPDs across various specialties, regions and program sizes.
The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination.
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred.
This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n =5,292), to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.
This review is a descriptive summary of the development of National EM M4 examinations, Version 1 (V1) and Version 2 (V2), and the NBME EM Advanced Clinical Examination (ACE) and their relevant usage and performance data. In particular, it describes how examination content was edited to affect desired changes in examination performance data and offers a model for educators seeking to develop their own examinations.