Research Library Publications
M. Gierl, K. Swygert, D. Matovinovic, A. Kulesher, H. Lai

Teaching and Learning in Medicine, Volume 33, Issue 4, pp. 366-381

The purpose of this analysis is to describe the sources of evidence that can be used to evaluate the quality of generated items. The role of medical expertise in the development and evaluation of generated items is highlighted as a crucial requirement for producing validation evidence.

P. Harik, R.A. Feinberg, B.E. Clauser

Integrating Timing Considerations to Improve Testing Practices

This chapter addresses a different aspect of the use of timing data: it provides a framework for understanding how an examinee's use of time interacts with time limits to affect both test performance and the validity of inferences based on test scores. It focuses primarily on examinations administered as part of the physician licensure process.

B.E. Clauser, M. Kane, J.C. Clauser

Journal of Educational Measurement, Volume 57, Issue 2, pp. 216-229

This article presents two generalizability-theory-based analyses of the proportion of the item variance that contributes to error in the cut score. In one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments; in the other, the judgments are transformed to the theta scale of an item response theory model before the variance components are estimated.
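
To make the probability-scale analysis concrete, the sketch below (not the authors' code; the function name and simulated ratings are illustrative) estimates variance components for a fully crossed judges x items Angoff design from the usual ANOVA expected mean squares, then uses the item component to approximate its contribution to error in the cut score.

```python
import numpy as np

def angoff_variance_components(ratings):
    """Variance components for a fully crossed judges x items design
    with one Angoff judgment (proportion-correct) per cell."""
    n_j, n_i = ratings.shape
    grand = ratings.mean()
    judge_means = ratings.mean(axis=1)
    item_means = ratings.mean(axis=0)

    # Mean squares from the two-way ANOVA without replication
    ms_j = n_i * np.sum((judge_means - grand) ** 2) / (n_j - 1)
    ms_i = n_j * np.sum((item_means - grand) ** 2) / (n_i - 1)
    resid = ratings - judge_means[:, None] - item_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_j - 1) * (n_i - 1))

    # Expected-mean-square solutions (negative estimates truncated at 0)
    var_res = ms_res                         # judge-by-item interaction + error
    var_j = max((ms_j - ms_res) / n_i, 0.0)  # judge component
    var_i = max((ms_i - ms_res) / n_j, 0.0)  # item component
    return var_j, var_i, var_res

# The Angoff cut score is the mean over judges and items, so with both
# facets treated as random the item component contributes var_i / n_i
# to the error variance of the cut score.
rng = np.random.default_rng(0)
ratings = np.clip(0.6 + rng.normal(0.0, 0.08, size=(10, 40)), 0.0, 1.0)
var_j, var_i, var_res = angoff_variance_components(ratings)
n_j, n_i = ratings.shape
se_cut = np.sqrt(var_j / n_j + var_i / n_i + var_res / (n_j * n_i))
print(f"SE of the cut score: {se_cut:.4f}")
```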

M.R. Raymond, C. Stevens, S.D. Bucak

Advances in Health Sciences Education, Volume 24, pp. 141-150 (2019)

Research suggests that the three-option format is optimal for multiple-choice questions (MCQs). This conclusion is supported by numerous studies showing that most distractors (i.e., incorrect answers) are selected by so few examinees that they are essentially nonfunctional. However, nearly all of these studies have defined a distractor as nonfunctional if it is selected by fewer than 5% of examinees.
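
As a worked illustration of that convention (a minimal sketch, not code from the study; the function name and response data are hypothetical), the 5% rule amounts to flagging any incorrect option whose selection proportion falls below the threshold:

```python
import numpy as np

def flag_nonfunctional(responses, key, threshold=0.05):
    """Flag distractors chosen by fewer than `threshold` of examinees.

    `responses` is a 1-D array of chosen option labels for one item and
    `key` is the correct option. Returns {distractor: proportion} for
    every incorrect option falling below the threshold.
    """
    options, counts = np.unique(responses, return_counts=True)
    props = dict(zip(options, counts / len(responses)))
    return {opt: p for opt, p in props.items()
            if opt != key and p < threshold}

# Example: option C is the key; D draws only 3% of examinees, so it is
# nonfunctional under the conventional 5% rule but not under a 1% rule.
resp = np.array(["C"] * 70 + ["A"] * 15 + ["B"] * 12 + ["D"] * 3)
print(flag_nonfunctional(resp, key="C"))                  # {'D': 0.03}
print(flag_nonfunctional(resp, key="C", threshold=0.01))  # {}
```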

Z. Jiang, M.R. Raymond

Applied Psychological Measurement, Volume 42, Issue 8, pp. 595-612

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. Simulation experiments and analyses of real data were conducted to investigate G under various conditions of subtest reliability, subtest correlation, and variability in subtest means.
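
The limitation of correlations is easy to demonstrate, because a correlation is unchanged when a constant is added to either subscore. The small simulation below (illustrative only, not the article's design) shifts one subtest's mean: the correlation is untouched, while every examinee's profile spread changes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Two subscores with a fixed correlation of about 0.7
sub1 = rng.normal(0, 1, n)
sub2 = 0.7 * sub1 + rng.normal(0, np.sqrt(1 - 0.7 ** 2), n)

# Making subtest 2 uniformly harder (lower mean) leaves the
# correlation unchanged...
sub2_harder = sub2 - 1.0
r_before = np.corrcoef(sub1, sub2)[0, 1]
r_after = np.corrcoef(sub1, sub2_harder)[0, 1]
print(f"r before shift: {r_before:.3f}, after shift: {r_after:.3f}")

# ...but the within-person spread across subtests, which drives
# profile interpretation, does change.
flat = np.var(np.stack([sub1, sub2]), axis=0).mean()
shifted = np.var(np.stack([sub1, sub2_harder]), axis=0).mean()
print(f"mean within-person profile variance: {flat:.3f} -> {shifted:.3f}")
```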

K. Walsh, P. Harik, K. Mazor, D. Perfetto, M. Anatchkova, C. Biggins, J. Wagner

Medical Care, Volume 55, Issue 4, pp. 436-441 (April 2017)

The objective of this study is to identify modifiable factors that improve the reliability of severity ratings for health care–associated harm in clinical practice improvement and research.