RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 1 - 10 of 44 Research Library Publications

Detecting Item Parameter Drift in Small Sample Rasch Equating

Posted: November 8, 2023 | Daniel Jurich, Chunyan Liu

Applied Measurement Education: Volume 36, Issue 4, Pages 326-339

This study examines strategies for detecting parameter drift in small-sample equating, crucial for maintaining score comparability in high-stakes exams. Results suggest that methods like mINFIT, mOUTFIT, and Robust-z effectively mitigate drifting anchor items' effects, while caution is advised with the Logit Difference approach. Recommendations are provided for practitioners to manage item parameter drift in small-sample settings.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

An Experimental Comparison of Multiple-Choice and Short-Answer Questions on a High-Stakes Test for Medical Students

Posted: September 4, 2023 | Janet Mee, Ravi Pandian, Justin Wolczynski, Amy Morales, Miguel Paniagua, Polina Harik, Peter Baldwin, Brian E. Clauser

Advances in Health Sciences Education

Recent advancements enable replacing MCQs with SAQs in high-stakes assessments, but prior research often used small samples under low stakes and lacked time data. This study assesses difficulty, discrimination, and time in a large-scale high-stakes context

Category:Assessment-Oriented Research, Links to Outcomes, General Measurement

ACTA: Short-Answer Grading in High-Stakes Medical Exams

Posted: July 1, 2023 | King Yiu Suen, Victoria Yaneva, Le An Ha, Janet Mee, Yiyun Zhou, Polina Harik

Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Pages 443-447

This paper presents the ACTA system, which performs automated short-answer grading in the domain of high-stakes medical exams. The system builds upon previous work on neural similarity-based grading approaches by applying these to the medical domain and utilizing contrastive learning as a means to optimize the similarity metric.

Category:Assessment-Oriented Research, Scoring, General Measurement

Advancing Natural Language Processing in Educational Assessment

Posted: June 5, 2023 | Victoria Yaneva (editor), Matthias von Davier (editor)

Advancing Natural Language Processing in Educational Assessment

This book examines the use of natural language technology in educational testing, measurement, and assessment. Recent developments in natural language processing (NLP) have enabled large-scale educational applications, though scholars and professionals may lack a shared understanding of the strengths and limitations of NLP in assessment as well as the challenges that testing organizations face in implementation. This first-of-its-kind book provides evidence-based practices for the use of NLP-based approaches to automated text and speech scoring, language proficiency assessment, technology-assisted item generation, gamification, learner feedback, and beyond.

Category:Assessment-Oriented Research, Applications of Technology, General Measurement

Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context

Posted: July 4, 2022 | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588

This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Category:Assessment-Oriented Research, Reliability/Validity

Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating

Posted: June 14, 2022 | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547

The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.

Category:Assessment-Oriented Research, General Measurement

An Examination of the Associations Among USMLE Step 3 Scores and Likelihood of Disciplinary Action in Practice

Posted: June 7, 2022 | Monica M. Cuddy, Chunyan Liu, Wenli Ouyang, Michael A. Barone, Aaron Young, David A. Johnson

Academic Medicine: June 2022

This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3multiple-choice question (MCQ) scores.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

In Reply to D'Eon and Kleinheksel

Posted: April 1, 2022 | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 97 - Issue 4 - Pages 476-477

Response to to emphasize that although findings support a relationship between multiple USMLE attempts and increased likelihood of receiving disciplinary actions, the findings in isolation are not sufficient for proposing new policy on how many attempts should be allowed.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Digital Module 28: Unusual Things That Usually Occur in a Credentialing Testing Program

Posted: March 17, 2022 | Richard A. Feinberg, Carol Morrison, Mark R. Raymond

Educational Measurement: Issues and Practices: Volume 41 - Issue 1 - Pages 95-96

Often unanticipated situations arise that can create a range of problems from threats to score validity, to unexpected financial costs, and even longer-term reputational damage. This module discusses some of these unusual challenges that usually occur in a credentialing program.

Category:Assessment-Oriented Research, General Measurement, Reliability/Validity

Exploring the Association Between USMLE Scores and ACGME Milestone Ratings: A Validity Study Using National Data From Emergency Medicine

Posted: September 1, 2021 | Stanley J. Hamstra, Monica M. Cuddy, Daniel Jurich, Kenji Yamazaki, John Burkhardt, Eric S. Holmboe, Michael A. Barone, Sally A. Santen

Academic Medicine: Volume 96 - Issue 9 - Pages 1324-1331

This study examines associations between USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores and ACGME emergency medicine (EM) milestone ratings.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Stay Up to Date

USMLE® Fee Assistance

Communication Learning Assessment

Introduction to Measurement Concepts: Validity and Reliability

NBME Academy

Latin America Grants

USMLE® Fee Assistance

RESEARCH LIBRARY

Filter:

Detecting Item Parameter Drift in Small Sample Rasch Equating

An Experimental Comparison of Multiple-Choice and Short-Answer Questions on a High-Stakes Test for Medical Students

ACTA: Short-Answer Grading in High-Stakes Medical Exams

Advancing Natural Language Processing in Educational Assessment

Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context

Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating

An Examination of the Associations Among USMLE Step 3 Scores and Likelihood of Disciplinary Action in Practice

In Reply to D'Eon and Kleinheksel

Digital Module 28: Unusual Things That Usually Occur in a Credentialing Testing Program

Exploring the Association Between USMLE Scores and ACGME Milestone Ratings: A Validity Study Using National Data From Emergency Medicine