RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 1 - 10 of 16 Research Library Publications

Extracting Linguistic Signal From Item Text and Its Application to Modeling Item Characteristics

Posted: June 5, 2023 | Victoria Yaneva, Peter Baldwin, Le An Ha, Christopher Runyon

Advancing Natural Language Processing in Educational Assessment: Pages 167-182

This chapter discusses the evolution of natural language processing (NLP) approaches to text representation and how different ways of representing text can be utilized for a relatively understudied task in educational assessment – that of predicting item characteristics from item text.

Category:Assessment-Oriented Research, Applications of Technology, Scoring

Assessment of Clinical Skills: A Case Study in Constructing an NLP-based Scoring System for Patient Notes

Posted: June 5, 2023 | Polina Harik, Janet Mee, Christopher Runyon, Brian E. Clauser

Advancing Natural Language Processing in Educational Assessment: Pages 58-73

This chapter describes INCITE, an NLP-based system for scoring free-text responses. It emphasizes the importance of context and the system’s intended use and explains how each component of the system contributed to its accuracy.

Category:Assessment-Oriented Research, Applications of Technology, Scoring

A Problem with the Bookmark Procedure's Correction for Guessing

Posted: November 24, 2020 | Peter Baldwin

Educational Measurement: Issues and Practice

This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues?

Category:Assessment-Oriented Research, General Measurement

Examining the Precision of Cut Scores Within a Generalizability Theory Framework: A Closer Look at the Item Effect

Posted: June 3, 2020 | B. E. Clauser, M. Kane, J. C. Clauser

Journal of Educational Measurement: Volume 57, Issue 2, Pages 216-229

This article presents two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transferred to the theta scale of an item response theory model before estimating the variance components.

Category:Assessment-Oriented Research, Reliability/Validity

Adding Objectivity to Standard Setting: Evaluating Consequence Using the Conscious and Subconscious Weight Methods

Posted: February 26, 2020 | B.C. Leventhal, I. Grabovsky

Educational Measurement: Issues and Practice, 39: 30-36

This article proposes the conscious weight method and subconscious weight method to bring more objectivity to the standard setting process. To do this, these methods quantify the relative harm of the negative consequences of false positive and false negative misclassification.

Category:Assessment-Oriented Research, General Measurement

Handbook of Diagnostic Classification Models

Posted: August 31, 2019 | M. von Davier, YS. Lee

Springer International Publishing; 2019

This handbook provides an overview of major developments around diagnostic classification models (DCMs) with regard to modeling, estimation, model checking, scoring, and applications. It brings together not only the current state of the art, but also the theoretical background and models developed for diagnostic classification.

Category:Assessment-Oriented Research, General Measurement, Scoring

Visualizing Hierarchical Score Inferences

Posted: June 6, 2019 | R.A. Feinberg, D.P Jurich

On the Cover. Educational Measurement: Issues and Practice, 38: 5-5

This informative graphic reports between‐individual information where a vertical line—with dashed lines on either side indicating an error band—spans three graphics allowing a student to easily see their score relative to four defined performance categories and, more notably, three relevant score distributions.

Category:Assessment-Oriented Research, Scoring

Moving the United States Medical Licensing Examination Step 1 After Core Clerkships: An Outcomes Analysis

Posted: March 1, 2019 | D. Jurich, M. Daniel, M. Paniagua, A. Fleming, V. Harnik, A. Pock, A. Swan-Sein, M. A. Barone, S.A. Santen

Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 371-377

Schools undergoing curricular reform are reconsidering the optimal timing of Step 1. This study provides a psychometric investigation of the impact on United States Medical Licensing Examination Step 1 scores of changing the timing of Step 1 from after completion of the basic science curricula to after core clerkships.

Category:Product-Oriented Research, USMLE

Leveraging Natural Language Processing: Toward Computer-Assisted Scoring of Patient Notes in the USMLE Step 2 Clinical Skills Exam

Posted: March 1, 2019 | J. Salt, P. Harik, M. A. Barone

Academic Medicine: March 2019 - Volume 94 - Issue 3 - p 314-316

The United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam uses physician raters to evaluate patient notes written by examinees. In this Invited Commentary, the authors describe the ways in which the Step 2 CS exam could benefit from adopting a computer-assisted scoring approach that combines physician raters’ judgments with computer-generated scores based on natural language processing (NLP).

Category:Assessment-Oriented Research, Scoring, Applications of Technology, Product-Oriented Research, USMLE

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Posted: January 16, 2019 | P. Baldwin, M.J. Margolis, B.E. Clauser, J. Mee, M. Winward

Educational Measurement: Issues and Practice, 39: 37-44

This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores.

Category:Assessment-Oriented Research, General Measurement

Stay Up to Date

USMLE® Fee Assistance

Communication Learning Assessment

Introduction to Measurement Concepts: Validity and Reliability

NBME Academy

Latin America Grants

USMLE® Fee Assistance

RESEARCH LIBRARY

Filter:

Extracting Linguistic Signal From Item Text and Its Application to Modeling Item Characteristics

Assessment of Clinical Skills: A Case Study in Constructing an NLP-based Scoring System for Patient Notes

A Problem with the Bookmark Procedure's Correction for Guessing

Examining the Precision of Cut Scores Within a Generalizability Theory Framework: A Closer Look at the Item Effect

Adding Objectivity to Standard Setting: Evaluating Consequence Using the Conscious and Subconscious Weight Methods

Handbook of Diagnostic Classification Models

Visualizing Hierarchical Score Inferences

Moving the United States Medical Licensing Examination Step 1 After Core Clerkships: An Outcomes Analysis

Leveraging Natural Language Processing: Toward Computer-Assisted Scoring of Patient Notes in the USMLE Step 2 Clinical Skills Exam

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study