RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 1 - 8 of 8 Research Library Publications

Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context

Posted: July 4, 2022 | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588

This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Category:Assessment-Oriented Research, Reliability/Validity

Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating

Posted: June 14, 2022 | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547

The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.

Category:Assessment-Oriented Research, General Measurement

Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing

Posted: May 11, 2022 | Peter Baldwin, Brian E. Clauser

Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160

A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.

Category:Assessment-Oriented Research, Scoring

A Problem with the Bookmark Procedure's Correction for Guessing

Posted: November 24, 2020 | Peter Baldwin

Educational Measurement: Issues and Practice

This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues?

Category:Assessment-Oriented Research, General Measurement

Reporting Subscore Profiles Using Diagnostic Classification Models in Health Professions Education

Posted: September 1, 2020 | Y.S. Park, A. Morales, L. Ross, M. Paniagua

Evaluation & the Health Professions: Volume: 43 issue: 3, page(s): 149-158

This study examines the innovative and practical application of DCM framework to health professions educational assessments using retrospective large-scale assessment data from the basic and clinical sciences: National Board of Medical Examiners Subject Examinations in pathology (n = 2,006) and medicine (n = 2,351).

Category:Assessment-Oriented Research, Scoring, Product-Oriented Research, NBME

Examining the Precision of Cut Scores Within a Generalizability Theory Framework: A Closer Look at the Item Effect

Posted: June 3, 2020 | B. E. Clauser, M. Kane, J. C. Clauser

Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229

This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transferred to the theta scale of an item response theory model before estimating the variance components.

Category:Assessment-Oriented Research, Reliability/Validity

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports

Posted: June 1, 2018 | M. von Davier, J. H. Shin, L. Khorramdel, L. Stankov

Applied Psychological Measurement: Volume: 42 issue: 4, page(s): 291-306

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

The Use of Multivariate Generalizability Theory to Evaluate the Quality of Subscores

Posted: April 3, 2018 | Z. Jiang, M.R. Raymond

Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Stay Up to Date

USMLE® Fee Assistance

Communication Learning Assessment

Introduction to Measurement Concepts: Validity and Reliability

NBME Academy

Latin America Grants