Academic Medicine: May 2018 - Volume 93 - Issue 5 - p 781-785
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred.
Medical Teacher: Volume 40 - Issue 11 - p 1143-1150
This study explores a novel milestone-based workplace assessment system that was implemented in 15 pediatrics residency programs. The system provided: web-based multisource feedback and structured clinical observation instruments that could be completed on any computer or mobile device; and monthly feedback reports that included competency-level scores and recommendations for improvement.
Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612
Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.
Quality Assurance in Education, Vol. 26 No. 2, pp. 150-152
An introduction to a special issue of Quality Assurance in Education featuring papers based on presentations at a two-day international seminar on managing the quality of data collection in large-scale assessments.
Quality Assurance in Education, Vol. 26 No. 2, pp. 243-262
Surveys that include skill measures may suffer from additional sources of error compared to those containing questionnaires alone. Examples are distractions such as noise or interruptions of testing sessions, as well as fatigue or lack of motivation to succeed. This paper aims to provide a review of statistical tools based on latent variable modeling approaches extended by explanatory variables that allow detection of survey errors in skill surveys.
Academic Medicine: April 2018 - Volume 93 - Issue 4 - p 636-641
Increasing criticism of maintenance of certification (MOC) examinations has prompted certifying boards to explore alternative assessment formats. The purpose of this study was to examine the effect of allowing test takers to access reference material while completing their MOC Part III standardized examination.
Measurement: Interdisciplinary Research and Perspectives, 16:1, 59-70
This article critically reviews how diagnostic models have been conceptualized and how they compare to other approaches used in educational measurement. In particular, certain assumptions that have been taken for granted and used as defining characteristics of diagnostic models are reviewed and it is questioned whether these assumptions are the reason why these models have not had the success in operational analyses and large-scale applications, contrary to what many have hoped.
Alzheimer's & Dementia, 14: 925-932
The purpose of this survey is to understand how the prevalence of beliefs, attitudes, and expectations about Alzheimer's disease dementia in the public could inform strategies to mitigate stigma.
Med Educ, 52: 359-361
Focusing specifically on examples set in the context of movement from Bachelor's level undergraduate programmes to enrolment in medical school, this publication argues that a great deal of what happens on college campuses today, curricular and otherwise, is (in)directly driven by the not‐so‐invisible hand of the medical education enterprise.
Psychometrika 83, 847–857 (2018)
Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: Test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal to provide an unlimited resource for this crucial part of assessment development.