RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Three Sources of Validation Evidence Needed to Evaluate the Quality of Generated Test Items for Medical Licensure

Posted: August 21, 2022 | Mark Gierl, Kimberly Swygert, Donna Matovinovic, Allison Kulesher, Hollis Lai

Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381

This analysis describes the three sources of validation evidence that can be used to evaluate the quality of generated items. It highlights medical expertise in developing and evaluating the generated items as a crucial requirement for producing validation evidence.

Category:Assessment-Oriented Research, Other

A Problem with the Bookmark Procedure's Correction for Guessing

Posted: November 24, 2020 | Peter Baldwin

Educational Measurement: Issues and Practice

This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues?

Category:Assessment-Oriented Research, General Measurement

How Examinees Use Time

Posted: June 25, 2020 | P. Harik, R.A. Feinberg, B.E. Clauser

Integrating Timing Considerations to Improve Testing Practices

This chapter provides a framework for understanding how an examinee's use of time interacts with time limits to affect both test performance and the validity of inferences based on test scores. It focuses primarily on examinations that are administered as part of the physician licensure process.

Category:Assessment-Oriented Research, General Measurement, Reliability/Validity

Examining the Precision of Cut Scores Within a Generalizability Theory Framework: A Closer Look at the Item Effect

Posted: June 3, 2020 | B. E. Clauser, M. Kane, J. C. Clauser

Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229

This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transferred to the theta scale of an item response theory model before estimating the variance components.

Category:Assessment-Oriented Research, Reliability/Validity

Adding Objectivity to Standard Setting: Evaluating Consequence Using the Conscious and Subconscious Weight Methods

Posted: February 26, 2020 | B.C. Leventhal, I. Grabovsky

Educational Measurement: Issues and Practice, 39: 30-36

This article proposes the conscious weight method and subconscious weight method to bring more objectivity to the standard setting process. To do this, these methods quantify the relative harm of the negative consequences of false positive and false negative misclassification.

Category:Assessment-Oriented Research, General Measurement

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Posted: January 16, 2019 | P. Baldwin, M.J. Margolis, B.E. Clauser, J. Mee, M. Winward

Educational Measurement: Issues and Practice, 39: 37-44

This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores.
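As a rough illustration of why the response probability matters, the sketch below assumes a Rasch model, which the summary above does not specify. Under that assumption, the ability at which an item of difficulty b is answered correctly with probability RP is b + ln(RP / (1 - RP)). The function name `rp_location` and the difficulty value are hypothetical.

```python
import math

def rp_location(b, rp):
    """Ability at which a Rasch item of difficulty b is answered
    correctly with probability rp (assumed model, for illustration)."""
    return b + math.log(rp / (1 - rp))

# Hypothetical bookmarked item with difficulty 0.0 on the theta scale.
b = 0.0
cut_67 = rp_location(b, 0.67)
cut_80 = rp_location(b, 0.80)

# The same bookmarked item implies a higher cut score under RP = .80.
print(round(cut_67, 3), round(cut_80, 3))
```

In this sketch the same bookmark placement maps to different cut scores under the two conditions, so internally consistent judgments would require panelists in the .80 condition to place their bookmarks on easier items.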

Category:Assessment-Oriented Research, General Measurement

The Optimal Number of Options for Multiple-Choice Questions on High-Stakes Tests: Application of a Revised Index for Detecting Nonfunctional Distractors

Posted: October 25, 2018 | M.R. Raymond, C. Stevens, S.D. Bucak

Adv in Health Sci Educ 24, 141–150 (2019)

Research suggests that the three-option format is optimal for multiple choice questions (MCQs). This conclusion is supported by numerous studies showing that most distractors (i.e., incorrect answers) are selected by so few examinees that they are essentially nonfunctional. However, nearly all studies have defined a distractor as nonfunctional if it is selected by fewer than 5% of examinees.
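The conventional 5% rule mentioned above can be sketched as follows. This is only an illustration of the conventional definition, not the revised index the article applies. The function name, option labels, and response data are hypothetical.

```python
from collections import Counter

def nonfunctional_distractors(responses, key, options, threshold=0.05):
    """Return the distractors chosen by fewer than `threshold`
    of examinees (the conventional rule, not the revised index)."""
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options
            if opt != key and counts[opt] / n < threshold]

# Hypothetical responses from 100 examinees to a four-option item keyed "A".
responses = ["A"] * 60 + ["B"] * 25 + ["C"] * 12 + ["D"] * 3
flagged = nonfunctional_distractors(responses, key="A", options="ABCD")
print(flagged)  # ['D']
```

Here option D is selected by 3% of examinees and is flagged, while B and C exceed the 5% threshold.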

Category:Assessment-Oriented Research, General Measurement
