
RESEARCH LIBRARY
View the latest publications from members of the NBME research team
Advancing Natural Language Processing in Educational Assessment
This book examines the use of natural language technology in educational testing, measurement, and assessment. Recent developments in natural language processing (NLP) have enabled large-scale educational applications, though scholars and professionals may lack a shared understanding of the strengths and limitations of NLP in assessment as well as the challenges that testing organizations face in implementation. This first-of-its-kind book provides evidence-based practices for the use of NLP-based approaches to automated text and speech scoring, language proficiency assessment, technology-assisted item generation, gamification, learner feedback, and beyond.
Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537
In this paper, the NBME team reports the results of an eye-tracking study designed to evaluate how the presence of options in multiple-choice questions affects the way medical students respond to questions designed to assess clinical reasoning. Examples of the types of data that can be extracted are presented. We then discuss the implications of these results for evaluating the validity of inferences made from the type of items used in this study.
Educational Measurement: Issues and Practice
This short, invited manuscript focuses on the implications of the widespread disruptions caused by the COVID-19 pandemic for certification and licensure assessment organizations.
Integrating Timing Considerations to Improve Testing Practices
This book synthesizes a wealth of theory and research on time issues in assessment into actionable advice for test development, administration, and scoring.
Integrating Timing Considerations to Improve Testing Practices
This chapter presents a historical overview of the testing literature that exemplifies the theoretical and operational evolution of test speededness.
Educational Measurement: Issues and Practice, 39: 37-44
This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores.