RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Posted: | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 97, Issue 4, Pages 467-477

Letter to the editor; response to D'Eon and Kleinheksel.

Posted: | Ian Micir, Kimberly Swygert, Jean D'Angelo

Journal of Applied Testing Technology: Volume 23, Special Issue 1, Pages 30-40

The interpretation of test scores in secure, high-stakes environments depends on several assumptions, one of which is that examinee responses to items are independent, meaning that no enemy items (items that overlap in content or cue one another's answers) appear on the same form. This paper documents the development and implementation of a C#-based application that uses Natural Language Processing (NLP) and Machine Learning (ML) techniques to produce prioritized predictions of item enemy statuses within a large item bank.
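The paper's C# application is not reproduced here, but as a rough, hypothetical sketch of the general idea, a screening pass might rank item pairs by text similarity so that the most suspicious pairs reach reviewers first. The item texts below are invented, and the paper's actual features and models may differ.

```python
# Illustrative only: rank item pairs by text similarity so that likely enemy
# pairs surface first for review. These item texts are invented; the paper's
# actual features and models are not reproduced here.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {  # hypothetical item bank: item id -> stem text
    "A1": "A 45-year-old man presents with crushing substernal chest pain.",
    "A2": "A 47-year-old man has substernal chest pain radiating to the jaw.",
    "B1": "A 6-month-old infant is brought in with fever and a diffuse rash.",
}

ids = list(items)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(items[i] for i in ids)
sim = cosine_similarity(tfidf)

# Sort candidate pairs from most to least similar; the top of the list is the
# prioritized queue a content expert would review for enemy status.
pairs = sorted(
    ((sim[i, j], ids[i], ids[j]) for i, j in combinations(range(len(ids)), 2)),
    reverse=True,
)
for score, a, b in pairs:
    print(f"{a} vs {b}: similarity = {score:.2f}")
```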

Posted: | Victoria Yaneva, Brian E. Clauser, Amy Morales, Miguel Paniagua

Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537

In this paper, the NBME team reports the results of an eye-tracking study designed to evaluate how the presence of the options in multiple-choice questions affects the way medical students respond to questions designed to assess clinical reasoning. Examples of the types of data that can be extracted are presented. The authors then discuss the implications of these results for the validity of inferences based on the type of items used in this study.
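To make the "types of data" concrete, here is a minimal, hypothetical sketch of one common eye-tracking summary: total fixation time per area of interest, such as the question stem versus the answer options. The column names and values are invented, not taken from the study.

```python
# Hypothetical illustration of the kind of data such a study yields: total
# fixation time per area of interest (question stem vs. answer options).
import pandas as pd

fixations = pd.DataFrame({
    "participant": ["p1", "p1", "p1", "p2", "p2"],
    "aoi": ["stem", "options", "stem", "stem", "options"],  # area of interest
    "duration_ms": [420, 310, 180, 530, 260],
})

# Total dwell time per participant and region, plus share of time on options.
dwell = fixations.pivot_table(index="participant", columns="aoi",
                              values="duration_ms", aggfunc="sum", fill_value=0)
dwell["options_share"] = dwell["options"] / (dwell["stem"] + dwell["options"])
print(dwell)
```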

Posted: | Stanley J. Hamstra, Monica M. Cuddy, Daniel Jurich, Kenji Yamazaki, John Burkhardt, Eric S. Holmboe, Michael A. Barone, Sally A. Santen

Academic Medicine: Volume 96, Issue 9, Pages 1324-1331

This study examines associations between USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores and ACGME emergency medicine (EM) milestone ratings.

Posted: | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 96, Issue 9, Pages 1319-1323

This study examined the relationship between USMLE attempts and the likelihood of receiving disciplinary actions from state medical boards.

Posted: | Daniel Jurich, Michelle Daniel, Karen E. Hauer, Christine Seibert, Latha Chandran, Arnyce R. Pock, Sara B. Fazio, Amy Fleming, Sally A. Santen

Teaching and Learning in Medicine: Volume 33, Issue 4, Pages 366-381

Clinical subject examination (CSE) scores for students from eight schools that moved Step 1 after core clerkships between 2012 and 2016 were analyzed in a pre-post design. Hierarchical linear modeling was used to quantify the effect of the curriculum change on CSE performance. Additional analyses determined whether clerkship order affected CSE performance and whether more students scored in the lowest percentiles after the curricular change than before.
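As a minimal sketch of what a pre-post hierarchical linear model looks like in practice, the snippet below fits a mixed-effects model with students nested within schools and a fixed effect for the post-change period. The data file and variable names are hypothetical, and the paper's actual models are richer than this.

```python
# A minimal sketch, assuming hypothetical data, of a pre-post hierarchical
# (mixed-effects) analysis: students nested within schools, with a fixed
# effect for the post-change period.
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file with columns: school, post_change (0/1), cse_score
df = pd.read_csv("cse_scores.csv")

# Random intercept for each school; the post_change coefficient estimates the
# average shift in CSE performance after the curriculum change.
model = smf.mixedlm("cse_score ~ post_change", df, groups=df["school"])
result = model.fit()
print(result.summary())
```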

Posted: | Le An Ha, Victoria Yaneva, Polina Harik, Ravi Pandian, Amy Morales, Brian Clauser

Proceedings of the 28th International Conference on Computational Linguistics

This paper brings together approaches from the fields of NLP and psychometric measurement to address the problem of predicting examinee proficiency from responses to short-answer questions (SAQs).
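The paper's models are not reproduced here; as a simple, hypothetical illustration of the basic setup, one can map SAQ response text to a numeric proficiency estimate with NLP features feeding a regression model. The responses and proficiency values below are invented.

```python
# Illustrative sketch only: one simple way to predict a numeric proficiency
# estimate from SAQ response text using NLP features and regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

responses = [  # hypothetical free-text answers to a clinical SAQ
    "community-acquired pneumonia, start empiric antibiotics",
    "likely viral upper respiratory infection, supportive care",
    "acute myocardial infarction, activate the cath lab",
]
proficiency = [0.8, 0.2, 0.9]  # hypothetical proficiency estimates

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(responses, proficiency)
print(model.predict(["pneumonia, give antibiotics"]))
```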

Posted: | J. Salt, P. Harik, M. A. Barone

Academic Medicine: March 2019, Volume 94, Issue 3, Pages 314-316

The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).
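As a minimal sketch, under invented assumptions, of what a computer-assisted scoring scheme can look like: blend a physician rater's judgment with an NLP-generated score on a common scale, and flag large disagreements for human re-review. The weights and thresholds here are hypothetical and are not taken from the commentary.

```python
# Hypothetical computer-assisted scoring: weighted blend of a rater score and
# an NLP score, with large disagreements flagged for human re-review.

def combined_score(rater: float, nlp: float, w: float = 0.5,
                   disagreement_cutoff: float = 2.0) -> tuple[float, bool]:
    """Blend two scores on a common scale; flag big rater/NLP disagreements."""
    needs_review = abs(rater - nlp) > disagreement_cutoff
    return w * rater + (1 - w) * nlp, needs_review

score, flagged = combined_score(rater=7.0, nlp=4.5)
print(score, flagged)  # 5.75 True
```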

Posted: | P. Harik, B. E. Clauser, I. Grabovsky, P. Baldwin, M. Margolis, D. Bucak, M. Jodoin, W. Walsh, S. Haist

Journal of Educational Measurement: Volume 55, Issue 2, Pages 308-327

The widespread move to computerized test delivery has led to new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits affect performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of the studies that do report randomized experiments, none directly compare the experimental results to evidence from observational metrics to determine whether those metrics are sensitive enough to identify conditions in which time constraints actually affect scores. The present study provides such evidence based on data from a medical licensing examination.
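For concreteness, here is one common observational speededness metric (not necessarily among those examined in the paper): the proportion of examinees who reach each item in a timed section. The response matrix below is invented.

```python
# Illustration of a simple observational speededness metric: per-item reach
# rates across a timed section. Data are invented for this sketch.
import numpy as np

# rows = examinees, columns = items; NaN marks an unreached (unanswered) item
responses = np.array([
    [1.0, 0.0, 1.0, 1.0, np.nan],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, np.nan, np.nan],
])

reach_rate = np.mean(~np.isnan(responses), axis=0)  # per-item completion rate
print(reach_rate)  # a falling tail suggests the time limit may affect scores
```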

Posted: | Monica M. Cuddy, Aaron Young, Andrew Gelman, David B. Swanson, David A. Johnson, Gerard F. Dillon, Brian E. Clauser

The authors examined the extent to which USMLE scores relate to the odds of receiving a disciplinary action from a U.S. state medical board.
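A hedged sketch of the kind of analysis that "odds of receiving a disciplinary action" implies is logistic regression of a binary outcome on exam scores, with the coefficient exponentiated into an odds ratio. All data below are simulated, not from the study.

```python
# Simulated illustration: logistic regression of a binary disciplinary-action
# outcome on exam scores, reported as an odds ratio per score point.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
scores = rng.normal(230, 15, size=(500, 1))  # hypothetical USMLE-like scores
# simulate lower disciplinary risk at higher scores
p = 1.0 / (1.0 + np.exp(0.05 * (scores[:, 0] - 220)))
disciplined = (rng.random(500) < p).astype(int)

model = LogisticRegression().fit(scores, disciplined)
odds_ratio = float(np.exp(model.coef_[0][0]))  # odds change per 1-point gain
print(f"odds ratio per score point: {odds_ratio:.3f}")
```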