
RESEARCH LIBRARY
View the latest publications from members of the NBME research team
Teaching and Learning in Medicine: Volume 33, Issue 4, pp. 366-381
The purpose of this analysis is to describe the sources of evidence that can be used to evaluate the quality of generated items. The role of medical expertise in developing and evaluating generated items is highlighted as a crucial requirement for producing validation evidence.
Applied Psychological Measurement: Volume 46, Issue 6, pp. 529-547
This simulation study demonstrated that the sampling variance associated with item response theory (IRT) item parameter estimates can help detect outliers among the common items under the 2-PL and 3-PL IRT models. The results showed that the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 across a variety of evaluation criteria.
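To illustrate the general idea being compared, the sketch below contrasts the traditional displacement method (flagging common items whose parameter estimates shift by more than a fixed 0.3 or 0.5 cutoff) with a variance-scaled flagging rule. This is a minimal illustration with made-up numbers and a simple standard-error-based stand-in for the article's SV statistic, not the authors' actual procedure.

```python
import numpy as np

# Hypothetical difficulty (b) estimates for five common items from two 2-PL
# calibrations, with their standard errors. All values are illustrative only.
b_old = np.array([-1.2, -0.5, 0.0, 0.4, 1.1])
b_new = np.array([-1.1, -0.4, 0.9, 0.5, 1.2])   # third item drifts noticeably
se_old = np.array([0.08, 0.07, 0.09, 0.08, 0.10])
se_new = np.array([0.09, 0.08, 0.10, 0.09, 0.11])

# Traditional displacement method: flag items whose estimates differ by more
# than a fixed cutoff (0.3 or 0.5) once the forms are on a common scale.
displacement = b_new - b_old
flag_03 = np.abs(displacement) > 0.3
flag_05 = np.abs(displacement) > 0.5

# Variance-scaled rule (stand-in for an SV-type statistic): divide each
# displacement by the standard error of the difference, so noisier estimates
# require a larger shift before the item is flagged.
z = displacement / np.sqrt(se_old**2 + se_new**2)
flag_sv = np.abs(z) > 1.96

print("displacement:", np.round(displacement, 2))
print("flagged (0.3 cutoff):", flag_03)
print("flagged (0.5 cutoff):", flag_05)
print("flagged (variance-scaled):", flag_sv)
```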
Journal of Educational Measurement: Volume 59, Issue 2, pp. 140-160
A conceptual framework for thinking about the problem of score comparability is given, followed by a description of three classes of connectives. Examples from the history of innovations in testing are provided for each class.
Psychometrika: Volume 83, pp. 847-857 (2018)
Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal of providing an unlimited resource for this crucial part of assessment development.