Research Library

Using Eye-Tracking Data as Part of the Validity Argument for Multiple-Choice Questions

Posted: December 4, 2021 | Victoria Yaneva, Brian E. Clauser, Amy Morales, Miguel Paniagua

Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537

In this paper, the NBME team reports the results of an eye-tracking study designed to evaluate how the presence of the options in multiple-choice questions affects the way medical students respond to questions designed to evaluate clinical reasoning. Examples of the types of data that can be extracted are presented. We then discuss the implications of these results for evaluating the validity of inferences made based on the type of items used in this study.

Category:Assessment-Oriented Research, Applications of Technology

Exploring the Association Between USMLE Scores and ACGME Milestone Ratings: A Validity Study Using National Data From Emergency Medicine

Posted: September 1, 2021 | Stanley J. Hamstra, Monica M. Cuddy, Daniel Jurich, Kenji Yamazaki, John Burkhardt, Eric S. Holmboe, Michael A. Barone, Sally A. Santen

Academic Medicine: Volume 96 - Issue 9 - Pages 1324-1331

This study examines associations between USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores and ACGME emergency medicine (EM) milestone ratings.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Multiple United States Medical Licensing Examination Attempts and the Estimated Risk of Disciplinary Actions Among Graduates of U.S. and Canadian Medical Schools

Posted: September 1, 2021 | Katie L. Arnhart, Monica M. Cuddy, David Johnson, Michael A. Barone, Aaron Young

Academic Medicine: Volume 96 - Issue 9 - Pages 1319-1323

This study examined the relationship between USMLE attempts and the likelihood of receiving disciplinary actions from state medical boards.

Category:Product-Oriented Research, USMLE, Assessment-Oriented Research, Reliability/Validity, Links to Outcomes

Gender Comparison in Milestone Trajectories and Medical Knowledge Examination Scores among Internal Medicine Residents

Posted: May 25, 2021 | Karen E. Hauer, Daniel Jurich, Jonathan Vandergrift, Rebecca S. Lipner, Furman S. McDonald, Kenji Yamazaki, Davoren Chick, Kevin McAllister, Eric S. Holmboe

Academic Medicine: Volume 96 - Issue 6 - Pages 876-884

This study examines whether milestone ratings for internal medicine residents, submitted by program directors working with clinical competency committees, differ by gender, and whether women and men who are rated similarly perform comparably on subsequent in-training and certification examinations.

Category:Assessment-Oriented Research, General Measurement

Handbook of Diagnostic Classification Models

Posted: August 31, 2019 | M. von Davier, Y.S. Lee

Springer International Publishing; 2019

This handbook provides an overview of major developments around diagnostic classification models (DCMs) with regard to modeling, estimation, model checking, scoring, and applications. It brings together not only the current state of the art, but also the theoretical background and models developed for diagnostic classification.

Category:Assessment-Oriented Research, General Measurement, Scoring

Visualizing Hierarchical Score Inferences

Posted: June 6, 2019 | R.A. Feinberg, D.P. Jurich

On the Cover. Educational Measurement: Issues and Practice, 38: 5-5

This informative graphic reports between-individual information: a vertical line, with dashed lines on either side indicating an error band, spans three graphics, allowing a student to easily see their score relative to four defined performance categories and, more notably, three relevant score distributions.

Category:Assessment-Oriented Research, Scoring

One Size Doesn’t Fit All: Using Factor Analysis to Gather Validity Evidence When Using Surveys in Your Research

Posted: March 1, 2019 | E. Knetka, C. Runyon, S. Eddy

CBE—Life Sciences Education Vol. 18, No. 1

This article briefly reviews the aspects of validity that researchers should consider when using surveys. It then focuses on factor analysis, a statistical method that can be used to collect an important type of validity evidence.

Category:Assessment-Oriented Research, Reliability/Validity

Leveraging Natural Language Processing: Toward Computer-Assisted Scoring of Patient Notes in the USMLE Step 2 Clinical Skills Exam

Posted: March 1, 2019 | J. Salt, P. Harik, M. A. Barone

Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 314-316

The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).

Category:Assessment-Oriented Research, Scoring, Applications of Technology, Product-Oriented Research, USMLE

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Posted: January 16, 2019 | P. Baldwin, M.J. Margolis, B.E. Clauser, J. Mee, M. Winward

Educational Measurement: Issues and Practice, 39: 37-44

This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores.
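To make the consistency argument concrete, here is a minimal sketch of how a response probability (RP) criterion maps an item to a point on the ability scale. It assumes a Rasch (one-parameter logistic) model, which the summary above does not specify; the function names and the example cut score are hypothetical and chosen only for illustration.

```python
import math


def rp_location(difficulty, rp):
    """Ability at which a Rasch item is answered correctly with probability rp.

    Assumes P(correct) = 1 / (1 + exp(-(theta - difficulty))), so the
    location is difficulty + logit(rp). The model choice is an assumption.
    """
    return difficulty + math.log(rp / (1.0 - rp))


def consistent_bookmark_difficulty(cut_theta, rp):
    """Difficulty of the item a panelist would have to bookmark, under the
    given RP criterion, to imply the cut score cut_theta."""
    return cut_theta - math.log(rp / (1.0 - rp))


if __name__ == "__main__":
    cut = 0.5  # hypothetical intended cut score on the ability scale
    for rp in (0.67, 0.80):
        b = consistent_bookmark_difficulty(cut, rp)
        print(f"RP={rp:.2f}: bookmark item difficulty {b:.3f} "
              f"-> implied cut {rp_location(b, rp):.3f}")
```

Under this assumption, a panelist with a fixed view of the minimally competent examinee would bookmark an easier item in the .80 condition than in the .67 condition, and both bookmarks would translate to the same cut score.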

Category:Assessment-Oriented Research, General Measurement

Effects of Discontinue Rules on Psychometric Properties of Test Scores

Posted: January 3, 2019 | M. von Davier, Y. Cho, T. Pan

Psychometrika 84, 147–163 (2019)

This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation of items is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence, with subsequent (not observed) responses commonly scored as wrong.
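The discontinue rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name, the 0/1 response coding, and the default threshold of three consecutive incorrect responses are assumptions.

```python
def apply_discontinue_rule(responses, max_consecutive_wrong=3):
    """Score a test administered with a discontinue rule.

    responses: the 0/1 answers the test taker would give to each item,
        listed in order of increasing difficulty (hypothetical input).
    max_consecutive_wrong: number of incorrect responses in sequence
        that ends the session (assumed value; tests differ).

    Returns (scored, administered): the scored response vector, with all
    unobserved items scored as wrong, and the number of items actually
    presented.
    """
    scored = []
    run = 0  # current streak of consecutive incorrect responses
    for index, response in enumerate(responses):
        scored.append(response)
        run = run + 1 if response == 0 else 0
        if run >= max_consecutive_wrong:
            administered = index + 1
            # Items after the stopping point are never presented and are
            # scored as incorrect.
            scored.extend([0] * (len(responses) - administered))
            return scored, administered
    return scored, len(responses)


if __name__ == "__main__":
    print(apply_discontinue_rule([1, 1, 0, 0, 0, 1, 1]))
```

In the example call, the session stops after the fifth item, so the two later items that the test taker would have answered correctly are scored as wrong.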

Category:Assessment-Oriented Research, Scoring
