NBME logo
Date Updated: December 17, 2018

2019 Summer Internship in Assessment Science and Psychometrics

June 3 – July 28, Philadelphia, PA


The National Board of Medical Examiners (NBME) is an independent, not-for-profit organization that serves the public by developing, administering, and conducting research on high-quality assessments for healthcare professionals.

NBME programs include the United States Medical Licensing Examination®; an extensive offering of achievement tests for courses offered by medical schools; and numerous client examinations in medicine and other health professions. The variety of assessment programs creates a wealth of data and opportunities for applied and theoretical research that can impact practice.

The NBME employs approximately 30 doctoral-level psychometricians and assessment scientists, as well as several MDs specializing in medical education. The staff is recognized internationally for its expertise in statistical analysis, psychometrics, and test development.

Interns will interact with other graduate students and NBME staff, and will present completed projects or work-in-progress to NBME staff. Internships typically result in conference presentations (e.g., NCME) and sometimes lead to publication or dissertation topics.


  • Active enrollment in doctoral program in measurement, statistics, cognitive science, medical education, or related field; completion of two or more years of graduate coursework.
  • Experience or coursework in one or more of the following: test development, IRT, CTT, statistics, research design, and cognitive science. Advanced knowledge of topics such as equating, generalizability theory, or Bayesian methodology is helpful. Skill in writing and presenting research. Working knowledge of statistical software (e.g., Winsteps, BILOG; SPSS, SAS, or R).
  • Interns will be assigned to one or more mentors, but must be able to work independently.
  • Must be authorized to work in the US for any employer. If selected, F-1 visa holders will need to apply for Curricular Practical Training authorization through their school’s international student office and have a Social Security number for payroll purposes.


Total compensation for the two months is approximately $9,800 and is intended to cover all major expenses (food, housing, travel).

Research Projects

Interns will help define a research problem; review related studies; conduct data analyses (real and/or simulated data); and write a summary report suitable for presentation. Projects are summarized below. Applicants should identify, by number, one or two projects that they would prefer to work on.

  1. Multivariate Generalizability Theory and Score Profiles for Individuals and Institutions: This project involves evaluating the quality of subscores using multivariate G-theory (Brennan, 2001). There is particular interest in assessing the reliability of score profiles at the level of the school, classroom, and institution by integrating Brennan’s work on score profile reliability (Brennan, 2001, p. 323) with the work of Kane & Brennan (1977) on the generalizability of class means.
  2. Exploring the Impact of Examination Timing: High-stakes testing programs strive to provide the right amount of testing time: too much is wasteful and costly; too little threatens score validity. Possible projects will investigate exam timing, using real or simulated data, focusing on questions such as how test takers use time during the test, the differences between needing versus wanting more time, and metrics for monitoring speededness.
  3. Investigating the relationship between accuracy, speed, and response pattern on USMLE using the Rasch Poisson Count Model: The Rasch Poisson count model (Rasch, 1960/1980) has been used to assess accuracy data collected with competence tests using reading errors and other types of count data. Recently, computer-based testing has led to the development of a vast array of modeling approaches that aim to assess whether response processes and latency data provide additional information about test takers beyond what the raw responses to items give. In this project, we explore the use of the hierarchical speed accuracy model (van der Linden, 2007), extended to model count data from USMLE collected in an experimental setting. For example, it is an open question whether information about the optimal, rather than effective, speed and ability level of respondents can be obtained from tests given under time limits (e.g., Pohl & von Davier, 2018). One type of count data we plan to use is data on examinees’ repeated visits to items. This may help to better understand how respondents structure their time during testing sessions. The relationship between working speed, ability, and the number of visits can be modeled using a three-dimensional model that conceptualizes them as a multidimensional latent variable.
  4. Modeling response styles in non-cognitive assessments with extensions of hierarchical speed accuracy models for responses and response times: Extensions of response time models for joint statistical modeling of speed and accuracy have been proposed to overcome the limitations of existing approaches. In this summer internship study, data from non-cognitive scales (for example, subjective well-being, communication, or personality scales) will be analyzed using extensions of the hierarchical speed accuracy model. The goal of the study is to explore how response styles often encountered in personality data can be better identified and controlled for by using additional data, such as response times, that may be informative about interindividual differences in response processes.
  5. Using Natural Language Processing to Model Item Characteristics: This project involves applying state-of-the-art NLP techniques to predict item characteristics, such as item difficulty, examinee response time, and other factors that are important for test construction. Given the advent of automatic item generation and other similar processes, it is not feasible to pretest all items prior to their live administration, so this work is important for test fairness and security.
  6. The Utility of Various Decision Models Used for Residency Selection: This project investigates alternative selection models (e.g., compensatory, conjunctive, hybrid) that incorporate multiple measures of academic ability, non-cognitive skills, and demographic factors. Data will be simulated to mimic various predictor and criterion variables with known psychometric properties (e.g., correlations, reliabilities). The models will be evaluated in terms of achieving both performance-related outcomes (e.g., specialty board exam scores) and socially beneficial outcomes (e.g., increased diversity).
  7. Physician Clinical Skills Performance Assessments: The intern will pursue research related to improving the precision and accuracy of a performance test involving physician interactions with standardized patients. Possible projects: design an enhanced process for flagging aberrant ratings by trained raters; support research on standardized patients in a high-stakes exam; evaluate difficulty of performance assessment items.


Send a cover letter outlining your experience and project interests (by number), along with a current resume, to Joanne Ver Ploeg. The application deadline is February 1, 2019.

All applicants will be notified of selection decisions by March 8, 2019.

