Research Library Publications
Posted: | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, Issue 2, Pages 571-588


This study evaluated the degree to which position effects on two separate low-stakes tests, administered to two different samples, were moderated by item variables (item length, number of response options, mental taxation, and presence of a graphic) and examinee variables (effort, change in effort, and gender). Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.
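The abstract does not describe the models used, but as a rough, hypothetical sketch, a negative linear position effect can be probed by regressing item correctness on the (standardized) position at which an item was administered. All data and parameter values below are simulated for illustration and are not the study's design.

```python
# Hypothetical sketch: probing a linear item-position effect with a
# logistic regression of item correctness on administered position.
import numpy as np

rng = np.random.default_rng(0)

n_examinees, n_positions = 500, 40
ability = rng.normal(0.0, 1.0, n_examinees)

# Simulate a negative linear position effect: items administered later
# are less likely to be answered correctly (e.g., declining effort).
position = np.tile(np.arange(n_positions), (n_examinees, 1))
position_std = (position - position.mean()) / position.std()
true_slope = -0.15
logit = ability[:, None] + true_slope * position_std
correct = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Fit logit(P(correct)) = b0 + b1 * position_std by Newton-Raphson.
y = correct.ravel()
X = np.column_stack([np.ones(y.size), position_std.ravel()])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

print(f"estimated position slope: {beta[1]:.3f} "
      "(negative = items get harder when placed later)")
```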

Posted: | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement: Volume 46, Issue 6, Pages 529-547


This simulation study demonstrated that the sampling variance associated with item response theory (IRT) item parameter estimates can help detect outliers among the common items under the 2PL and 3PL IRT models. The results showed that the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 across a variety of evaluation criteria.
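The abstract does not give the formula for the SV statistic, so the sketch below only contrasts the two general ideas under stated assumptions: the displacement rule flags a common item when the absolute difference between its two calibrations exceeds a fixed cutoff (0.3 or 0.5), whereas a sampling-variance-based rule scales that difference by the estimated standard errors. The z-type criterion and all numbers are illustrative, not the paper's statistic.

```python
# Illustrative contrast between displacement flagging and a
# standard-error-scaled (sampling-variance-based) flag for common items.
import numpy as np

# Hypothetical common-item difficulty estimates from the old and new forms,
# with the sampling variances reported by the calibration software.
b_old = np.array([-1.20, -0.40, 0.10, 0.75, 1.60])
b_new = np.array([-1.10, -0.35, 0.55, 0.80, 1.55])
var_old = np.array([0.010, 0.008, 0.012, 0.015, 0.020])
var_new = np.array([0.011, 0.009, 0.013, 0.014, 0.022])

displacement = np.abs(b_new - b_old)

# Traditional rule: flag items whose displacement exceeds a fixed cutoff.
flag_03 = displacement > 0.3
flag_05 = displacement > 0.5

# Sampling-variance-scaled rule: compare the displacement to its standard
# error under the hypothesis of no drift (generic z-type criterion).
z = (b_new - b_old) / np.sqrt(var_old + var_new)
flag_sv = np.abs(z) > 1.96

for i in range(len(b_old)):
    print(f"item {i + 1}: disp={displacement[i]:.2f}  "
          f"flag@0.3={flag_03[i]}  flag@0.5={flag_05[i]}  "
          f"z={z[i]:.2f}  flag_sv={flag_sv[i]}")
```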

Posted: | Peter Baldwin, Brian E. Clauser

Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160


A conceptual framework for thinking about the problem of score comparability is presented, followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.

Posted: | Victoria Yaneva, Brian E. Clauser, Amy Morales, Miguel Paniagua

Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537


In this paper, the NBME team reports the results of an eye-tracking study designed to evaluate how the presence of the options in multiple-choice questions affects the way medical students respond to questions designed to evaluate clinical reasoning. Examples of the types of data that can be extracted are presented, followed by a discussion of the implications of these results for evaluating the validity of inferences made based on the type of items used in this study.

Posted: | Peter Baldwin

Educational Measurement: Issues and Practice


This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues? 

Posted: | B. E. Clauser, M. Kane, J. C. Clauser

Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229


This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. In one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments; in the other, the judgments are first transformed to the theta scale of an item response theory model before the variance components are estimated.
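As a minimal sketch of the probability-scale approach only (assuming a fully crossed judges-by-items design and simulated ratings, not the article's data or exact analyses), variance components can be estimated from the judge and item mean squares and combined into a standard error for the cut score:

```python
# Two-facet crossed G-study sketch on the probability scale:
# judges x items Angoff judgments, variance components by expected mean
# squares, and the resulting standard error of the cut score.
import numpy as np

rng = np.random.default_rng(1)

n_judges, n_items = 8, 30
# Hypothetical Angoff judgments: probability that a minimally competent
# examinee answers each item correctly.
judge_effect = rng.normal(0.0, 0.05, n_judges)
item_effect = rng.normal(0.0, 0.10, n_items)
ratings = np.clip(0.60 + judge_effect[:, None] + item_effect[None, :]
                  + rng.normal(0.0, 0.08, (n_judges, n_items)), 0.05, 0.95)

grand = ratings.mean()
judge_means = ratings.mean(axis=1)
item_means = ratings.mean(axis=0)

ms_judge = n_items * np.sum((judge_means - grand) ** 2) / (n_judges - 1)
ms_item = n_judges * np.sum((item_means - grand) ** 2) / (n_items - 1)
resid = ratings - judge_means[:, None] - item_means[None, :] + grand
ms_resid = np.sum(resid ** 2) / ((n_judges - 1) * (n_items - 1))

var_resid = ms_resid
var_judge = max((ms_judge - ms_resid) / n_items, 0.0)
var_item = max((ms_item - ms_resid) / n_judges, 0.0)

# Error variance of the cut score (the mean over judges and items) when
# both judges and items are treated as random facets.
se_cut = np.sqrt(var_judge / n_judges + var_item / n_items
                 + var_resid / (n_judges * n_items))
print(f"cut = {grand:.3f}, SE(cut) = {se_cut:.3f}, "
      f"item share of error = {var_item / n_items / se_cut**2:.2f}")
```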

Posted: | P. Baldwin, M.J. Margolis, B.E. Clauser, J. Mee, M. Winward

Educational Measurement: Issues and Practice, 39: 37-44


This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores.
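For readers unfamiliar with how a response probability (RP) criterion enters the bookmark procedure, the hypothetical sketch below shows how, for a single 2PL item, the implied cut score is the theta at which the item is answered correctly with probability RP. The item parameters are illustrative; internally consistent panelists would bookmark different items under RP = .67 than under RP = .80, so that the resulting cut scores converge.

```python
# Hedged illustration of how a bookmark placement maps to a cut score:
# for a 2PL item, the cut is the theta at which the response probability
# equals the chosen RP criterion. Parameter values are hypothetical.
import math

def bookmark_cut(a: float, b: float, rp: float, D: float = 1.7) -> float:
    """Theta at which a 2PL item with discrimination a and difficulty b
    is answered correctly with probability rp."""
    return b + math.log(rp / (1.0 - rp)) / (D * a)

# The same bookmarked item evaluated under the two response-probability
# conditions used in the experiment (.67 and .80) implies different cuts.
a, b = 1.0, 0.2
print(f"RP = .67 -> cut = {bookmark_cut(a, b, 0.67):+.3f}")
print(f"RP = .80 -> cut = {bookmark_cut(a, b, 0.80):+.3f}")
```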