Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381
Clinical subject exam (CSE) scores for students from eight schools that moved Step 1 to after core clerkships between 2012 and 2016 were analyzed in a pre-post format. Hierarchical linear modeling was used to quantify the effect of the curriculum change on CSE performance. Additional analyses examined whether clerkship order affected CSE performance and whether the curricular change resulted in more students scoring in the lowest percentiles, comparing cohorts before and after the change.
Academic Medicine: July 2019 - Volume 94 - Issue 7 - p 926-927
A response to concerns about potential bias in applying machine learning (ML) to the scoring of United States Medical Licensing Examination Step 2 Clinical Skills (CS) patient notes (PN).
Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 371-377
Schools undergoing curricular reform are reconsidering the optimal timing of Step 1. This study provides a psychometric investigation of how moving Step 1 from after completion of the basic science curricula to after core clerkships affects United States Medical Licensing Examination Step 1 scores.
Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 314-316
The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).
Investigación en Educación Médica, Vol. 8, No. 29, 2019
Journal of Pain and Symptom Management: Volume 56, Issue 3, p371-378
This article reviews the USMLE Step examinations to determine whether they test the palliative care (PC) knowledge that graduating medical students and residents applying for licensure need.
Journal of Medical Regulation (2018) 104 (2): 51–57
Several important stakeholders have voiced criticism of the Step 2 Clinical Skills Examination (CS) in the United States Medical Licensing Examination (USMLE) licensure sequence. The Resident Program Director (RPD) Awareness survey was conducted to gauge perceptions of current and potential uses of Step 2 CS, attitudes toward the importance of residents' clinical skills, and awareness of a medical student petition against Step 2 CS. This cross-sectional survey yielded 205 responses from a representative sample of RPDs across specialties, regions, and program sizes.
Journal of Educational Measurement, 55: 308-327
The widespread move to computerized test delivery has led to new approaches for evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits affect performance. Most existing research relies on these observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of the studies that do report randomized experiments, none directly compare the experimental results with evidence from observational metrics to evaluate how sensitively these metrics identify conditions in which time constraints actually affect scores. The present study provides such evidence, based on data from a medical licensing examination.
Academic Medicine: May 2018 - Volume 93 - Issue 5 - p 781-785
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes over time in item difficulty, as measured by examinee performance. The data cover outcomes from the initial use of multimedia items in 2007 through 2012, after which an interface change occurred.
Journal of Veterinary Medical Education 2018 45:3, 381-387
This study uses item response data from the November–December 2014 and April 2015 North American Veterinary Licensing Examination (NAVLE) administrations (n = 5,292) to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that the timing conditions were sufficient for most examinees, supporting the current time limits. For the relatively few examinees who may have been affected, the results suggest that the cause is not bias in the test but rather poor pacing behavior combined with knowledge deficits.