RESEARCH LIBRARY

View the latest publications from members of the NBME research team

A Comparison of Remote vs In-Person Proctored In-Training Examination Administration for Internal Medicine

Posted: January 29, 2024 | Thai Ong, Becky Krumm, Margaret Wells, Susan Read, Linda Harris, Andrea Altomare, Miguel Paniagua

Academic Medicine: Volume 99 - Issue 7 - Pages 778-783

This study examined score comparability between in-person and remote proctored administrations of the 2020 Internal Medicine In-Training Examination (IM-ITE) during the COVID-19 pandemic. Analysis of data from 27,115 IM residents revealed statistically significant but educationally nonsignificant differences in predicted scores, with slightly larger variations observed for first-year residents. Overall, performance did not substantially differ between the two testing modalities, supporting the continued use of remote proctoring for the IM-ITE amidst pandemic-related disruptions.

Category:Assessment-Oriented Research, Scoring, Links to Outcomes

Detecting Item Parameter Drift in Small Sample Rasch Equating

Posted: November 8, 2023 | Daniel Jurich, Chunyan Liu

Applied Measurement in Education: Volume 36, Issue 4, Pages 326-339

This study examines strategies for detecting parameter drift in small-sample equating, crucial for maintaining score comparability in high-stakes exams. Results suggest that methods like mINFIT, mOUTFIT, and Robust-z effectively mitigate drifting anchor items' effects, while caution is advised with the Logit Difference approach. Recommendations are provided for practitioners to manage item parameter drift in small-sample settings.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Measuring Item Influence for Diagnostic Classification Models

Posted: August 14, 2023 | Daniel P. Jurich, Matthew J. Madison

Educational Assessment

This study proposes four indices to quantify item influence and distinguishes them from other available item and test measures. We use simulation methods to evaluate and provide guidelines for interpreting each index, followed by a real data application to illustrate their use in practice. We discuss theoretical considerations regarding when influence presents a psychometric concern and other practical concerns such as how the indices function when reducing influence imbalance.

Category:Assessment-Oriented Research, Scoring

ACTA: Short-Answer Grading in High-Stakes Medical Exams

Posted: July 1, 2023 | King Yiu Suen, Victoria Yaneva, Le An Ha, Janet Mee, Yiyun Zhou, Polina Harik

Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Pages 443-447

This paper presents the ACTA system, which performs automated short-answer grading in the domain of high-stakes medical exams. The system builds upon previous work on neural similarity-based grading approaches by applying these to the medical domain and utilizing contrastive learning as a means to optimize the similarity metric.

Category:Assessment-Oriented Research, Scoring, General Measurement

Extracting Linguistic Signal From Item Text and Its Application to Modeling Item Characteristics

Posted: June 5, 2023 | Victoria Yaneva, Peter Baldwin, Le An Ha, Christopher Runyon

Advancing Natural Language Processing in Educational Assessment: Pages 167-182

This chapter discusses the evolution of natural language processing (NLP) approaches to text representation and how different ways of representing text can be utilized for a relatively understudied task in educational assessment – that of predicting item characteristics from item text.

Category:Assessment-Oriented Research, Applications of Technology, Scoring

Assessment of Clinical Skills: A Case Study in Constructing an NLP-based Scoring System for Patient Notes

Posted: June 5, 2023 | Polina Harik, Janet Mee, Christopher Runyon, Brian E. Clauser

Advancing Natural Language Processing in Educational Assessment: Pages 58-73

This chapter describes INCITE, an NLP-based system for scoring free-text responses. It emphasizes the importance of context and the system’s intended use and explains how each component of the system contributed to its accuracy.

Category:Assessment-Oriented Research, Applications of Technology, Scoring

Quantifying the Bias of Non-linear Equating and Score Transformations

Posted: March 16, 2023 | Matthias von Davier, Brian Clauser

Essays on Contemporary Psychometrics: Pages 163-180

This paper shows that using non-linear functions for equating and score transformations leads to consequences that are not commensurable with classical test theory (CTT). More specifically, a well-known theorem from calculus shows that the expected value of a non-linearly transformed variable does not equal the transformed expected value of this variable.

Category:Assessment-Oriented Research, Scoring
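The central point of the abstract above, that the expected value of a non-linearly transformed variable generally differs from the transformed expected value, can be illustrated with a small numeric sketch. The toy score distribution and the squaring transformation below are assumptions chosen only for illustration; they are not taken from the paper.

```python
from statistics import mean


def transform(score: float) -> float:
    """Hypothetical non-linear score transformation (a simple square)."""
    return score ** 2


# Toy data (an assumption): five equally likely raw scores.
scores = [0, 1, 2, 3, 4]

# Average of the transformed scores, E[g(X)].
mean_of_transformed = mean(transform(s) for s in scores)

# Transformation applied to the average score, g(E[X]).
transformed_mean = transform(mean(scores))

# For a non-linear g the two quantities generally differ.
bias = mean_of_transformed - transformed_mean
print(mean_of_transformed, transformed_mean, bias)
```

With a linear transformation in place of `transform`, the two quantities would coincide and `bias` would be zero.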

Outlier Detection Using t-test in Rasch IRT Equating under NEAT Design

Posted: September 6, 2022 | Chunyan Liu, Dan Jurich

Applied Psychological Measurement: Volume 47, Issue 1, Pages 34-47

This study used simulation to investigate the performance of the t-test method in detecting outliers and compared its performance with other outlier detection methods, including the logit difference method with 0.5 and 0.3 as the cutoff values and the robust z statistic with 2.7 as the cutoff value.

Category:Assessment-Oriented Research, Scoring
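As a rough sketch of one of the comparison methods named in the abstract above, the robust z statistic is commonly formulated as each anchor item's difficulty difference minus the median difference, divided by 0.74 times the interquartile range. That formulation, the function name `robust_z`, and the toy differences below are assumptions for illustration and may not match the study's exact implementation; only the 2.7 cutoff comes from the abstract.

```python
from statistics import median, quantiles


def robust_z(differences):
    """Robust z for each anchor item difference (assumed common formulation)."""
    q1, _, q3 = quantiles(differences, n=4)  # quartiles of the differences
    iqr = q3 - q1
    med = median(differences)
    return [(d - med) / (0.74 * iqr) for d in differences]


# Toy anchor-item difficulty differences between two forms (an assumption).
differences = [0.0, 0.1, -0.1, 0.05, -0.05, 2.0]

CUTOFF = 2.7  # cutoff value mentioned in the abstract
z_values = robust_z(differences)

# Indices of anchor items flagged as potential outliers.
flagged = [i for i, z in enumerate(z_values) if abs(z) > CUTOFF]
print(flagged)
```

Using the median and interquartile range rather than the mean and standard deviation keeps a single drifting item from masking itself by inflating the spread estimate.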

Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context

Posted: July 4, 2022 | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, Issue 2, Pages 571-588

This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Category:Assessment-Oriented Research, Reliability/Validity

An Examination of the Associations Among USMLE Step 3 Scores and Likelihood of Disciplinary Action in Practice

Posted: June 7, 2022 | Monica M. Cuddy, Chunyan Liu, Wenli Ouyang, Michael A. Barone, Aaron Young, David A. Johnson

Academic Medicine: June 2022

This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3 multiple-choice question (MCQ) scores.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes
