Med Educ, 52: 359-361
Focusing specifically on examples set in the context of the transition from bachelor's-level undergraduate programmes to medical school enrolment, this publication argues that a great deal of what happens on college campuses today, curricular and otherwise, is directly or indirectly driven by the not‐so‐invisible hand of the medical education enterprise.
Psychometrika 83, 847–857 (2018)
Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal of providing an unlimited resource for this crucial part of assessment development.
Journal of Veterinary Medical Education 2018 45:3, 381-387
This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n = 5,292) to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that time conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been affected, results suggest the cause is not test bias but rather poor pacing behavior combined with knowledge deficits.
Educational Measurement: Issues and Practice, 37: 40-45
This simulation study demonstrates that the strength of item dependencies and the location of an examination system's cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.
CLEAR Exam Review 2018 27(2): 21-27
The purpose of this paper is to suggest an approach to job analysis that addresses broad competencies while maintaining the rigor of traditional job analysis and the specificity of good test blueprints.
Western Journal of Emergency Medicine: Integrating Emergency Care with Population Health, 19(1)
This review is a descriptive summary of the development of National EM M4 examinations, Version 1 (V1) and Version 2 (V2), and the NBME EM Advanced Clinical Examination (ACE), and their relevant usage and performance data. In particular, it describes how examination content was edited to effect desired changes in examination performance data and offers a model for educators seeking to develop their own examinations.
Qual Life Res 27, 1711–1720 (2018)
The US Food and Drug Administration (FDA), as part of its regulatory mission, is charged with determining whether a clinical outcome assessment (COA) is "fit for purpose" when used in clinical trials to support drug approval and product labeling. This paper provides a review (and some commentary) on the current state of affairs in COA development, evaluation, and use, with a focus on one aspect: how do you know you are measuring the right thing? In the psychometric literature, this concept is referred to broadly as validity and has itself evolved over many years of research and application.
Medical Care: April 2017 - Volume 55 - Issue 4 - p 436-441
The objective of this study is to identify modifiable factors that improve the reliability of ratings of severity of health care–associated harm in clinical practice improvement and research.
Academic Medicine: Volume 88 - Issue 11 - p 1670-1675
From 2007 through 2012, the NBME team reviewed literature in physician–patient communication, examined performance characteristics of the Step 2 CS exam, observed case development and quality assurance processes, interviewed SPs and their trainers, and reviewed video recordings of examinee–SP interactions. The authors describe perspectives gained by their team from the review process and outline the resulting enhancements to the Step 2 CS exam, some of which were rolled out in June 2012.