Journal of Educational Measurement, 55: 564-581
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed‐format pseudo tests under the random groups design.
Academic Medicine: November 2018 - Volume 93 - Issue 11S - p S21-S29
This study investigates the impact of incorporating observer-reported workload into workplace-based assessment (WBA) scores on (1) psychometric characteristics of WBA scores and (2) measuring changes in performance over time using workload-unadjusted versus workload-adjusted scores.
Educational Measurement: Issues and Practice, 37: 5-8
This article spotlights the winners of the 2018 EM:IP Cover Graphic/Data Visualization Competition.
Journal of Educational Measurement, 55: 308-327
The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination.
Applied Psychological Measurement: Volume: 42 issue: 4, page(s): 291-306
The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups.
Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612
Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.
Educational Measurement: Issues and Practice, 37: 40-45
This simulation study demonstrates that the strength of item dependencies and the location of an examination systems’ cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.