
RESEARCH LIBRARY
RESEARCH LIBRARY
View the latest publications from members of the NBME research team
Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588
This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.
Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547
The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.
Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160
A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381
CSE scores for students from eight schools that moved Step 1 after core clerkships between 2012 and 2016 were analyzed in a pre-post format. Hierarchical linear modeling was used to quantify the effect of the curriculum on CSE performance. Additional analysis determined if clerkship order impacted clinical subject exam performance and whether the curriculum change resulted in more students scoring in the lowest percentiles before and after the curricular change.
Educational Measurement: Issues and Practice
This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues?
Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229
This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transferred to the theta scale of an item response theory model before estimating the variance components.
Educational Measurement: Issues and Practice, 39: 30-36
This article proposes the conscious weight method and subconscious weight method to bring more objectivity to the standard setting process. To do this, these methods quantify the relative harm of the negative consequences of false positive and false negative misclassification.
Journal of Pain and Symptom Management: Volume 56, Issue 3, p371-378
This article reviews the USMLE step examinations to determine whether they test the palliative care (PC) knowledge necessary for graduating medical students and residents applying for licensure.
Journal of Veterinary Medical Education 2018 45:3, 381-387
This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n =5,292), to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.