Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160
A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Proceedings of the 28th International Conference on Computational Linguistics
This paper brings together approaches from the fields of NLP and psychometric measurement to address the problem of predicting examinee proficiency from responses to short-answer questions (SAQs).
Evaluation & the Health Professions: Volume: 43 issue: 3, page(s): 149-158
This study examines the innovative and practical application of DCM framework to health professions educational assessments using retrospective large-scale assessment data from the basic and clinical sciences: National Board of Medical Examiners Subject Examinations in pathology (n = 2,006) and medicine (n = 2,351).
Academic Medicine: September 2020 - Volume 95 - Issue 9 - p 1388-1395
This article aims to assess the correlations between United States Medical Licensing Examination (USMLE) performance, American College of Physicians Internal Medicine In-Training Examination (IM-ITE) performance, American Board of Internal Medicine Internal Medicine Certification Exam (IM-CE) performance, and other medical knowledge and demographic variables.
Journal of Educational and Behavioral Statistics: Vol 45, Issue 5, 2020
This article describes a method for identifying and reporting unexpectedly high or low subscores by comparing each examinee’s observed subscore with a discrete probability distribution of subscores conditional on the examinee’s overall ability.
Handbook of Automated Scoring
In this chapter we describe the historical background that led to development of the simulations and the subsequent refinement of the construct that occurred as the interface was being developed. We then describe the evolution of the automated scoring procedures from linear regression modeling to rule-based procedures.
Springer International Publishing; 2019
This handbook provides an overview of major developments around diagnostic classification models (DCMs) with regard to modeling, estimation, model checking, scoring, and applications. It brings together not only the current state of the art, but also the theoretical background and models developed for diagnostic classification.
On the Cover. Educational Measurement: Issues and Practice, 38: 5-5
This informative graphic reports between‐individual information where a vertical line—with dashed lines on either side indicating an error band—spans three graphics allowing a student to easily see their score relative to four defined performance categories and, more notably, three relevant score distributions.
Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 314-316
The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).
Psychometrika 84, 147–163 (2019)
This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation of items is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence, with subsequent (not observed) responses commonly scored as wrong.