library bookshelves


View the latest publications from members of the NBME research team

Showing 1 - 10 of 25 Research Library Publications
Posted: | Daniel Jurich, Chunyan Liu

Applied Measurement Education: Volume 36, Issue 4, Pages 326-339


This study examines strategies for detecting parameter drift in small-sample equating, crucial for maintaining score comparability in high-stakes exams. Results suggest that methods like mINFIT, mOUTFIT, and Robust-z effectively mitigate drifting anchor items' effects, while caution is advised with the Logit Difference approach. Recommendations are provided for practitioners to manage item parameter drift in small-sample settings.

Posted: | Daniel P. Jurich, Matthew J. Madison

Educational Assessment


This study proposes four indices to quantify item influence and distinguishes them from other available item and test measures. We use simulation methods to evaluate and provide guidelines for interpreting each index, followed by a real data application to illustrate their use in practice. We discuss theoretical considerations regarding when influence presents a psychometric concern and other practical concerns such as how the indices function when reducing influence imbalance.

Posted: | King Yiu Suen, Victoria Yaneva, Le An Ha, Janet Mee, Yiyun Zhou, Polina Harik

Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Pages 443-447


This paper presents the ACTA system, which performs automated short-answer grading in the domain of high-stakes medical exams. The system builds upon previous work on neural similarity-based grading approaches by applying these to the medical domain and utilizing contrastive learning as a means to optimize the similarity metric. 

Posted: | Victoria Yaneva, Peter Baldwin, Le An Ha, Christopher Runyon

Advancing Natural Language Processing in Educational Assessment: Pages 167-182


This chapter discusses the evolution of natural language processing (NLP) approaches to text representation and how different ways of representing text can be utilized for a relatively understudied task in educational assessment – that of predicting item characteristics from item text.

Posted: | Polina Harik, Janet Mee, Christopher Runyon, Brian E. Clauser

Advancing Natural Language Processing in Educational Assessment: Pages 58-73


This chapter describes INCITE, an NLP-based system for scoring free-text responses. It emphasizes the importance of context and the system’s intended use and explains how each component of the system contributed to its accuracy.

Posted: | Matthias von Davier, Brian Clauser

Essays on Contemporary Psychometrics: Pages 163-180


This paper shows that using non-linear functions for equating and score transformations leads to consequences that are not commensurable with classical test theory (CTT). More specifically, a well-known theorem from calculus shows that the expected value of a non-linearly transformed variable does not equal the transformed expected value of this variable.

Posted: | Chunyan Liu, Dan Jurich

Applied Psychological Measurement: Volume 47, issue 1, page(s) 34-47


This study used simulation to investigate the performance of the t-test method in detecting outliers and compared its performance with other outlier detection methods, including the logit difference method with 0.5 and 0.3 as the cutoff values and the robust z statistic with 2.7 as the cutoff value.

Posted: | Peter Baldwin, Brian E. Clauser

Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160


A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.

Posted: | Ian Micir, Kimberly Swygert, Jean D'Angelo

Journal of Applied Technology: Volume 23 - Special Issue 1 - Pages 30-40


The interpretations of test scores in secure, high-stakes environments are dependent on several assumptions, one of which is that examinee responses to items are independent and no enemy items are included on the same forms. This paper documents the development and implementation of a C#-based application that uses Natural Language Processing (NLP) and Machine Learning (ML) techniques to produce prioritized predictions of item enemy statuses within a large item bank.

Posted: | Le An Ha, Victoria Yaneva, Polina Harik, Ravi Pandian, Amy Morales, Brian Clauser

Proceedings of the 28th International Conference on Computational Linguistics


This paper brings together approaches from the fields of NLP and psychometric measurement to address the problem of predicting examinee proficiency from responses to short-answer questions (SAQs).