Showing 1 - 6 of 6 Research Library Publications
Posted: | Victoria Yaneva, Janet Mee, Le Ha, Polina Harik, Michael Jodoin, Alex Mechaber

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - p 2880–2886

 

This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination.

Posted: | J. Salt, P. Harik, M. A. Barone

Academic Medicine: July 2019 - Volume 94 - Issue 7 - p 926-927

 

A response to concerns regarding potential bias in the implementation of machine learning (ML) to scoring of the United States Medical Licensing Examination Step 2 Clinical Skills (CS) patient notes (PN).

Posted: | Z. Cui, C. Liu, Y. He, H. Chen

Journal of Educational Measurement: Volume 55, Issue 4, Pages 582-594

 

This article proposes and evaluates a new method that implements computerized adaptive testing (CAT) without any restriction on item review. In particular, it evaluates the new method in terms of the accuracy on ability estimates and the robustness against test‐manipulation strategies. This study shows that the newly proposed method is promising in a win‐win situation: examinees have full freedom to review and change answers, and the impacts of test‐manipulation strategies are undermined.

Posted: | S. Tackett, M. Raymond, R. Desai, S. A. Haist, A. Morales, S. Gaglani, S. G. Clyman

Medical Teacher: Volume 40 - Issue 8 - p 838-841

 

Adaptive learning requires frequent and valid assessments for learners to track progress against their goals. This study determined if multiple-choice questions (MCQs) “crowdsourced” from medical learners could meet the standards of many large-scale testing programs.

Posted: | I. Kirsch, W. Thorn, M. von Davier

Quality Assurance in Education, Vol. 26 No. 2, pp. 150-152

 

An introduction to a special issue of Quality Assurance in Education featuring papers based on presentations at a two-day international seminar on managing the quality of data collection in large-scale assessments.

Posted: | M. von Davier

Psychometrika 83, 847–857 (2018)

 

Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: Test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal to provide an unlimited resource for this crucial part of assessment development.