RESEARCH LIBRARY

View the latest publications from members of the NBME research team


Detecting Item Parameter Drift in Small Sample Rasch Equating

Posted: November 8, 2023 | Daniel Jurich, Chunyan Liu

Applied Measurement in Education: Volume 36, Issue 4, Pages 326-339

This study examines strategies for detecting parameter drift in small-sample equating, crucial for maintaining score comparability in high-stakes exams. Results suggest that methods like mINFIT, mOUTFIT, and Robust-z effectively mitigate drifting anchor items' effects, while caution is advised with the Logit Difference approach. Recommendations are provided for practitioners to manage item parameter drift in small-sample settings.

Category: Assessment-Oriented Research, Reliability/Validity, Scoring

Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context

Posted: July 4, 2022 | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, Issue 2, Pages 571-588

This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Category: Assessment-Oriented Research, Reliability/Validity

An Examination of the Associations Among USMLE Step 3 Scores and Likelihood of Disciplinary Action in Practice

Posted: June 7, 2022 | Monica M. Cuddy, Chunyan Liu, Wenli Ouyang, Michael A. Barone, Aaron Young, David A. Johnson

Academic Medicine: June 2022

This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3 multiple-choice question (MCQ) scores.

Category: Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

In Reply to D'Eon and Kleinheksel

Posted: April 1, 2022 | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 97 - Issue 4 - Pages 476-477

A response emphasizing that, although the findings support a relationship between multiple USMLE attempts and an increased likelihood of receiving disciplinary actions, the findings in isolation are not sufficient for proposing new policy on how many attempts should be allowed.

Category: Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Digital Module 28: Unusual Things That Usually Occur in a Credentialing Testing Program

Posted: March 17, 2022 | Richard A. Feinberg, Carol Morrison, Mark R. Raymond

Educational Measurement: Issues and Practice: Volume 41 - Issue 1 - Pages 95-96

Unanticipated situations often arise that can create a range of problems, from threats to score validity to unexpected financial costs and even longer-term reputational damage. This module discusses some of these unusual challenges that usually occur in a credentialing program.

Category: Assessment-Oriented Research, General Measurement, Reliability/Validity

Exploring the Association Between USMLE Scores and ACGME Milestone Ratings: A Validity Study Using National Data From Emergency Medicine

Posted: September 1, 2021 | Stanley J. Hamstra, Monica M. Cuddy, Daniel Jurich, Kenji Yamazaki, John Burkhardt, Eric S. Holmboe, Michael A. Barone, Sally A. Santen

Academic Medicine: Volume 96 - Issue 9 - Pages 1324-1331

This study examines associations between USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores and ACGME emergency medicine (EM) milestone ratings.

Category: Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Multiple United States Medical Licensing Examination Attempts and the Estimated Risk of Disciplinary Actions Among Graduates of U.S. and Canadian Medical Schools

Posted: September 1, 2021 | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 96 - Issue 9 - Pages 1319-1323

This study examined the relationship between USMLE attempts and the likelihood of receiving disciplinary actions from state medical boards.

Category: Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

How Examinees Use Time

Posted: June 25, 2020 | P. Harik, R.A. Feinberg, B.E. Clauser

Integrating Timing Considerations to Improve Testing Practices

This chapter addresses a different aspect of the use of timing data: it provides a framework for understanding how an examinee's use of time interfaces with time limits to impact both test performance and the validity of inferences made based on test scores. It focuses primarily on examinations that are administered as part of the physician licensure process.

Category: Assessment-Oriented Research, General Measurement, Reliability/Validity

A History of Test Speededness: Tracing the Evolution of Theory and Practice

Posted: June 25, 2020 | D. Jurich

Integrating Timing Considerations to Improve Testing Practices

This chapter presents a historical overview of the testing literature that exemplifies the theoretical and operational evolution of test speededness.

Category: Assessment-Oriented Research, General Measurement, Reliability/Validity

Examining the Precision of Cut Scores Within a Generalizability Theory Framework: A Closer Look at the Item Effect

Posted: June 3, 2020 | B. E. Clauser, M. Kane, J. C. Clauser

Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229

This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transferred to the theta scale of an item response theory model before estimating the variance components.

Category: Assessment-Oriented Research, Reliability/Validity
