Journal of Educational Measurement: Volume 58, Issue 4, Pages 515-537
In this paper, the NBME team reports the results an eye-tracking study designed to evaluate how the presence of the options in multiple-choice questions impacts the way medical students responded to questions designed to evaluate clinical reasoning. Examples of the types of data that can be extracted are presented. We then discuss the implications of these results for evaluating the validity of inferences made based on the type of items used in this study.
ACM SIGACCESS Accessibility and Computing
In this article, we first summarise STA (Scanpath Trend Analysis) with its application in autism detection, and then discuss future directions for this research.
IEEE Transactions on Neural Systems and Rehabilitation Engineering
The purpose of this study is to test whether visual processing differences between adults with and without high-functioning autism captured through eye tracking can be used to detect autism.
Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 314-316
The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).
Journal of Educational Measurement: Volume 55, Issue 4, Pages 582-594
This article proposes and evaluates a new method that implements computerized adaptive testing (CAT) without any restriction on item review. In particular, it evaluates the new method in terms of the accuracy on ability estimates and the robustness against test‐manipulation strategies. This study shows that the newly proposed method is promising in a win‐win situation: examinees have full freedom to review and change answers, and the impacts of test‐manipulation strategies are undermined.
Medical Teacher: Volume 40 - Issue 8 - p 838-841
Adaptive learning requires frequent and valid assessments for learners to track progress against their goals. This study determined if multiple-choice questions (MCQs) “crowdsourced” from medical learners could meet the standards of many large-scale testing programs.
Quality Assurance in Education, Vol. 26 No. 2, pp. 150-152
An introduction to a special issue of Quality Assurance in Education featuring papers based on presentations at a two-day international seminar on managing the quality of data collection in large-scale assessments.
Psychometrika 83, 847–857 (2018)
Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: Test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal to provide an unlimited resource for this crucial part of assessment development.