View the latest publications from members of the NBME research team

Posted: August 21, 2022 | Mark Gierl, Kimberly Swygert, Donna Matovinovic, Allison Kulesher, Hollis Lai

Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381


The purpose of this analysis is to describe these sources of evidence that can be used to evaluate the quality of generated items. The important role of medical expertise in the development and evaluation of the generated items is highlighted as a crucial requirement for producing validation evidence.

Posted: July 21, 2022 | Jonathan D. Rubright, Thai Q. Ong, Michael G. Jodoin, David A. Johnson, Michael A. Barone

Academic Medicine: Volume 97 - Issue 8 - Pages 1219-1225


Since 2012, the United States Medical Licensing Examination (USMLE) has maintained a policy of ≤ 6 attempts on any examination component. The purpose of this study was to empirically examine the appropriateness of existing USMLE retake policy.

Posted: July 4, 2022 | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588


This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Posted: July 1, 2022 | Victoria Yaneva, Janet Mee, Le Ha, Polina Harik, Michael Jodoin, Alex Mechaber

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - p 2880–2886


This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination.

Posted: June 14, 2022 | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547


The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 across a variety of evaluation criteria.

Posted: June 7, 2022 | Monica M. Cuddy, Chunyan Liu, Wenli Ouyang, Michael A. Barone, Aaron Young, David A. Johnson

Academic Medicine: June 2022


This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3 multiple-choice question (MCQ) scores.

Posted: May 31, 2022 | Daniel Jurich, Chunyan Liu, Amanda Clauser

Journal of Graduate Medical Education: Volume 14, Issue 3, Pages 353-354


Letter to the editor.

Posted: May 11, 2022 | Peter Baldwin, Brian E. Clauser

Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160


A conceptual framework for thinking about the problem of score comparability is given, followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.

Posted: May 5, 2022 | Victoria Yaneva, Brian E. Clauser, Amy Morales, Miguel Paniagua

Advances in Health Sciences Education: Volume 27, p 1401–1422


Eye-tracking data collected from 26 students responding to clinical MCQs are analyzed by providing 119 eye-tracking features as input to a machine-learning model that aims to classify correct and incorrect responses. The predictive power of various combinations of features within the model is evaluated to understand how different feature interactions contribute to the predictions.

Posted: April 29, 2022 | Andrew A. White, Ann M. King, Angelo E. D’Addario, Karen Berg Brigham, Suzanne Dintzis, Emily E. Fay, Thomas H. Gallagher, Kathleen M. Mazor

JMIR Medical Education: Volume 8 - Issue 2 - e30988


This article aims to compare the reliability of two assessment groups (crowdsourced laypeople and patient advocates) in rating physician error disclosure communication skills using the Video-Based Communication Assessment app.