RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 71 - 80 of 84 Research Library Publications

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports

Posted: June 1, 2018 | M. von Davier, J. H. Shin, L. Khorramdel, L. Stankov

Applied Psychological Measurement: Volume: 42 issue: 4, page(s): 291-306

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Performance on the Nephrology In-Training Examination and ABIM Nephrology Certification Examination Outcomes

Posted: May 7, 2018 | D. Jurich, L. M. Duhigg, T. J. Plumb, S. A. Haist, J. L. Hawley, R. S. Lipner, L. Smith, S. M. Norby

CJASN May 2018, 13 (5) 710-717

Medical specialty and subspecialty fellowship programs administer subject-specific in-training examinations to provide feedback about level of medical knowledge to fellows preparing for subsequent board certification. This study evaluated the association between the American Society of Nephrology In-Training Examination and the American Board of Internal Medicine Nephrology Certification Examination in terms of scores and passing status.

Category:Assessment-Oriented Research, Reliability/Validity, Health Professions

A Novel Workplace-Based Assessment for Competency-Based Decisions and Learner Feedback

Posted: April 24, 2018 | P.J. Hicks, M.J. Margolis, C.L. Carraccio, B.E. Clauser, K. Donnelly, H.B. Fromme, K.A. Gifford, S.E. Poynter, D.J. Schumacher, A. Schwartz & the PMAC Module 1 Study Group

Medical Teacher: Volume 40 - Issue 11 - p 1143-1150

This study explores a novel milestone-based workplace assessment system that was implemented in 15 pediatrics residency programs. The system provided: web-based multisource feedback and structured clinical observation instruments that could be completed on any computer or mobile device; and monthly feedback reports that included competency-level scores and recommendations for improvement.

Category:Assessment-Oriented Research, General Measurement

The Use of Multivariate Generalizability Theory to Evaluate the Quality of Subscores

Posted: April 3, 2018 | Z. Jiang, M.R. Raymond

Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Effects and Unforeseen Consequences of Accessing References on a Maintenance of Certification Examination

Posted: April 1, 2018 | R. A. Feinberg, D. P. Jurich, L. M. Foster

Academic Medicine: April 2018 - Volume 93 - Issue 4 - p 636-641

Increasing criticism of maintenance of certification (MOC) examinations has prompted certifying boards to explore alternative assessment formats. The purpose of this study was to examine the effect of allowing test takers to access reference material while completing their MOC Part III standardized examination.

Category:Assessment-Oriented Research, General Measurement

Diagnosing Diagnostic Models: From Von Neumann’s Elephant to Model Equivalencies and Network Psychometrics

Posted: March 30, 2018 | M. von Davier

Measurement: Interdisciplinary Research and Perspectives, 16:1, 59-70

This article critically reviews how diagnostic models have been conceptualized and how they compare to other approaches used in educational measurement. In particular, certain assumptions that have been taken for granted and used as defining characteristics of diagnostic models are reviewed and it is questioned whether these assumptions are the reason why these models have not had the success in operational analyses and large-scale applications, contrary to what many have hoped.

Category:Assessment-Oriented Research, General Measurement

It's Happening Sooner Than You Think: Spotlighting the Pre-Medical Realm

Posted: March 25, 2018 | B. Michalec, M. M. Cuddy, P. Hafferty, M. D. Hanson, S. L. Kanter, D. Littleton, M. A. T. Martimianakis, R. Michaels, F. W. Hafferty

Med Educ, 52: 359-361

Focusing specifically on examples set in the context of movement from Bachelor's level undergraduate programmes to enrolment in medical school, this publication argues that a great deal of what happens on college campuses today, curricular and otherwise, is (in)directly driven by the not‐so‐invisible hand of the medical education enterprise.

Category:Assessment-Oriented Research, Links to Outcomes, Health Professions

Automated Item Generation with Recurrent Neural Networks

Posted: March 12, 2018 | M. von Davier

Psychometrika 83, 847–857 (2018)

Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: Test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal to provide an unlimited resource for this crucial part of assessment development.

Category:Assessment-Oriented Research, Applications of Technology

Examining the Validity of the North American Veterinary Licensing Examination (NAVLE) Time Constraints

Posted: February 2, 2018 | R.A. Feinberg, D. Jurich, J. Lord, H. Case, J. Hawley

Journal of Veterinary Medical Education 2018 45:3, 381-387

This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n =5,292), to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.

Category:Assessment-Oriented Research, Reliability/Validity, Product-Oriented Research, NBME

Impact of Both Local Item Dependencies and Cut-Point Locations on Examinee Classifications

Posted: January 24, 2018 | J. D. Rubright

Educational Measurement: Issues and Practice, 37: 40-45

This simulation study demonstrates that the strength of item dependencies and the location of an examination systems’ cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.

Category:Assessment-Oriented Research, General Measurement, Scoring

NBME Self-Assessment Bundles

Stay Up to Date

Stay Up to Date

New Psychometric Workshops

INSIGHTS® Demo

Open Grant Opportunities

RESEARCH LIBRARY

Filter:

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports

Performance on the Nephrology In-Training Examination and ABIM Nephrology Certification Examination Outcomes

A Novel Workplace-Based Assessment for Competency-Based Decisions and Learner Feedback

The Use of Multivariate Generalizability Theory to Evaluate the Quality of Subscores

Effects and Unforeseen Consequences of Accessing References on a Maintenance of Certification Examination

Diagnosing Diagnostic Models: From Von Neumann’s Elephant to Model Equivalencies and Network Psychometrics

It's Happening Sooner Than You Think: Spotlighting the Pre-Medical Realm

Automated Item Generation with Recurrent Neural Networks

Examining the Validity of the North American Veterinary Licensing Examination (NAVLE) Time Constraints

Impact of Both Local Item Dependencies and Cut-Point Locations on Examinee Classifications