Showing 1 - 10 of 12 Research Library Publications
Posted: | C. Liu, M. J. Kolen

Journal of Educational Measurement: Volume 55, Issue 4, Pages 564-581

 

Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed‐format pseudo tests under the random groups design.

Posted: | S. Pohl, M. von Davier

Front. Psychol. 9:1988

 

In their 2018 article, (T&B) discuss how to deal with not reached items due to low working speed in ability tests (Tijmstra and Bolsinova, 2018). An important contribution of the paper is focusing on the question of how to define the targeted ability measure. This note aims to add further aspects to this discussion and to propose alternative approaches.

Posted: | Z. Cui, C. Liu, Y. He, H. Chen

Journal of Educational Measurement: Volume 55, Issue 4, Pages 582-594

 

This article proposes and evaluates a new method that implements computerized adaptive testing (CAT) without any restriction on item review. In particular, it evaluates the new method in terms of the accuracy on ability estimates and the robustness against test‐manipulation strategies. This study shows that the newly proposed method is promising in a win‐win situation: examinees have full freedom to review and change answers, and the impacts of test‐manipulation strategies are undermined.

Posted: | E. C. Carey, M. Paniagua, L. J. Morrison, S. K. Levine, J. C. Klick, G. T. Buckholz, J. Rotella, J. Bruno, S. Liao, R. M. Arnold

Journal of Pain and Symptom Management: Volume 56, Issue 3, p371-378

 

This article reviews the USMLE step examinations to determine whether they test the palliative care (PC) knowledge necessary for graduating medical students and residents applying for licensure.

Posted: | S. Tackett, M. Raymond, R. Desai, S. A. Haist, A. Morales, S. Gaglani, S. G. Clyman

Medical Teacher: Volume 40 - Issue 8 - p 838-841

 

Adaptive learning requires frequent and valid assessments for learners to track progress against their goals. This study determined if multiple-choice questions (MCQs) “crowdsourced” from medical learners could meet the standards of many large-scale testing programs.

Posted: | P. Harik, B. E. Clauser, I. Grabovsky, P. Baldwin, M. Margolis, D. Bucak, M. Jodoin, W. Walsh, S. Haist

Journal of Educational Measurement: Volume 55, Issue 2, Pages 308-327

 

The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination.

Posted: | M. von Davier, J. H. Shin, L. Khorramdel, L. Stankov

Applied Psychological Measurement: Volume: 42 issue: 4, page(s): 291-306

 

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups.

Posted: | D. Jurich, L. M. Duhigg, T. J. Plumb, S. A. Haist, J. L. Hawley, R. S. Lipner, L. Smith, S. M. Norby

CJASN May 2018, 13 (5) 710-717

 

Medical specialty and subspecialty fellowship programs administer subject-specific in-training examinations to provide feedback about level of medical knowledge to fellows preparing for subsequent board certification. This study evaluated the association between the American Society of Nephrology In-Training Examination and the American Board of Internal Medicine Nephrology Certification Examination in terms of scores and passing status.

Posted: | Z. Jiang, M.R. Raymond

Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612

 

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.

Posted: | I. Kirsch, W. Thorn, M. von Davier

Quality Assurance in Education, Vol. 26 No. 2, pp. 150-152

 

An introduction to a special issue of Quality Assurance in Education featuring papers based on presentations at a two-day international seminar on managing the quality of data collection in large-scale assessments.