Academic Medicine: Volume 99 - Issue 3 - p 325-330
This retrospective cohort study investigates the association between United States Medical Licensing Examination (USMLE) scores and outcomes in 196,881 hospitalizations in Pennsylvania over 3 years.
Diagnosis: Volume 10, Issue 1, Pages 54-60
This op-ed discusses the advantages of leveraging natural language processing (NLP) in the assessment of clinical reasoning. It also provides an overview of INCITE, the Intelligent Clinical Text Evaluator, a scalable NLP-based computer-assisted scoring system that was developed to measure clinical reasoning ability as assessed in the written documentation portion of the now-discontinued USMLE Step 2 Clinical Skills examination.
Academic Medicine: Volume 97 - Issue 11S - Page S176
As Step 1 begins to transition to pass/fail, it is interesting to consider the impact of score goal on wellness. This study examines the relationship between goal score, gender, and students’ self-reported anxiety, stress, and overall distress immediately following their completion of Step 1.
Academic Medicine: June 2022
This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3 multiple-choice question (MCQ) scores.
Journal of Graduate Medical Education: Volume 14, Issue 3, Pages 353-354
Letter to the editor.
Medical Science Educator: Volume 31, p 607–613 (2021)
This study extended previous research on the NBME Clinical Science Mastery Series self-assessments to investigate the utility of recently released self-assessments for students completing Family Medicine clerkships and Emergency Medicine sub-internships and preparing for summative assessments.
Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381
Clinical subject exam (CSE) scores for students from eight schools that moved Step 1 after core clerkships between 2012 and 2016 were analyzed in a pre-post format. Hierarchical linear modeling was used to quantify the effect of the curriculum on CSE performance. Additional analysis determined whether clerkship order affected CSE performance and whether the proportion of students scoring in the lowest percentiles differed before and after the curricular change.
Evaluation & the Health Professions: Volume 43, Issue 3, Pages 149-158
This study examines the innovative and practical application of the DCM framework to health professions educational assessments using retrospective large-scale assessment data from the basic and clinical sciences: National Board of Medical Examiners Subject Examinations in pathology (n = 2,006) and medicine (n = 2,351).
Journal of Educational Measurement: Volume 55, Issue 2, Pages 308-327
The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination.
Journal of Veterinary Medical Education 2018 45:3, 381-387
This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n = 5,292) to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.