
RESEARCH LIBRARY
RESEARCH LIBRARY
View the latest publications from members of the NBME research team
Advancing Natural Language Processing in Educational Assessment
This book examines the use of natural language technology in educational testing, measurement, and assessment. Recent developments in natural language processing (NLP) have enabled large-scale educational applications, though scholars and professionals may lack a shared understanding of the strengths and limitations of NLP in assessment as well as the challenges that testing organizations face in implementation. This first-of-its-kind book provides evidence-based practices for the use of NLP-based approaches to automated text and speech scoring, language proficiency assessment, technology-assisted item generation, gamification, learner feedback, and beyond.
Medical Teacher: Volume 45 - Issue 6, Pages 565-573
This guide aims aim to describe practical considerations involved in reading and conducting studies in medical education using Artificial Intelligence (AI), define basic terminology and identify which medical education problems and data are ideally-suited for using AI.
Journal of Applied Technology: Volume 23 - Special Issue 1 - Pages 30-40
The interpretations of test scores in secure, high-stakes environments are dependent on several assumptions, one of which is that examinee responses to items are independent and no enemy items are included on the same forms. This paper documents the development and implementation of a C#-based application that uses Natural Language Processing (NLP) and Machine Learning (ML) techniques to produce prioritized predictions of item enemy statuses within a large item bank.
Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537
In this paper, the NBME team reports the results an eye-tracking study designed to evaluate how the presence of the options in multiple-choice questions impacts the way medical students responded to questions designed to evaluate clinical reasoning. Examples of the types of data that can be extracted are presented. We then discuss the implications of these results for evaluating the validity of inferences made based on the type of items used in this study.