NBME offers summer internships for students who are actively enrolled in a doctoral program in measurement, statistics, cognitive science, medical education, or a related field.
Get Hands-on Experience and Mentorship from NBME Staff
NBME employs approximately 30 doctoral-level psychometricians and assessment scientists, as well as several MDs specializing in medical education. Our team is recognized internationally for its expertise in statistical analysis, psychometrics, and test development.
As an intern, you will work directly with other graduate students and NBME staff. Internships typically result in conference presentations (e.g., NCME) and sometimes lead to publication or dissertation topics.
To receive consideration for an NBME internship, a candidate must meet the following requirements:
- Active enrollment in a doctoral program in measurement, statistics, cognitive science, medical education, or a related field, with completion of two or more years of graduate coursework.
- Experience or coursework in one or more of the following: test development, IRT, CTT, statistics, research design, and cognitive science. Advanced knowledge of topics such as equating, generalizability theory, or Bayesian methodology is helpful, as are skill in writing and presenting research and a working knowledge of statistical software (e.g., Winsteps, BILOG, SPSS, SAS, or R).
- Interns will be assigned to one or more mentors, but must be able to work independently.
- Must be authorized to work in the US for any employer. If selected, F-1 visa holders will need to apply for Curricular Practical Training authorization through their school’s international student office and must have a Social Security number for payroll purposes.
Dates & Compensation
The NBME 2020 Summer Internship runs from June 1 to July 24.
Total compensation for the two months is approximately $9800 and is intended to cover all major expenses, including food, housing, and travel.
Interns will help define a research problem, review related studies, conduct data analyses (with real and/or simulated data), and write a summary report suitable for presentation. See the project descriptions below to learn more.
Applicants should identify, by number, the two projects they would most prefer to work on.
Application of Natural Language Processing (NLP) in the field of assessment has led to innovations and changes in how testing organizations design and score tests.
Possible projects will investigate novel NLP applications, using real or simulated data, for various processes relevant in an operational testing program (e.g., test construction, key validation, standard setting). Results would be informative for possible improvements to current best practices.
In this project, we explore the use of the Rasch Poisson Count model (Rasch, 1960/1980) to extend the hierarchical speed-accuracy model (van der Linden, 2007) to item-revisit and answer-change behavior patterns in high-stakes examination data collected in an experimental setting.
We propose to connect the elements of process data available from a computer-based test (correctness, response time, number of revisits to an item, the outcome of each revisit, examinee IRT ability, and IRT item characteristics) in a hierarchical latent trait model that explains examinees’ behavior in changing their initial response to an item.
The relationships among working speed, ability, number of visits, and number of answer changes can be modeled using a multidimensional model that conceptualizes them as latent variables. The model should help us better understand the answer-change and cognitive behavior of examinees in a timed, high-stakes examination.
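As a small illustration of the count-model building block mentioned above (not NBME's actual model), the Rasch Poisson Count model treats a count such as the number of revisits to an item as Poisson-distributed, with a rate driven by a person parameter and an item parameter. The log-linear parameterization lambda = exp(theta - delta) used here is one common form; the symbols theta (person propensity) and delta (item difficulty) are illustrative names.

```python
import math

def rpcm_prob(k, theta, delta):
    """P(X = k) under a Rasch Poisson Count model sketch.

    Rate: lambda = exp(theta - delta), where theta is a person
    propensity (e.g., tendency to revisit items) and delta is an
    item parameter. X ~ Poisson(lambda).
    """
    lam = math.exp(theta - delta)
    return math.exp(-lam) * lam ** k / math.factorial(k)

def rpcm_expected_count(theta, delta):
    """Expected count equals the Poisson rate itself."""
    return math.exp(theta - delta)
```

In a hierarchical extension, theta could be linked to the speed and ability parameters of van der Linden's model, so that the revisit counts, response times, and correctness share a joint latent structure.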
The intern will pursue research related to improving the precision and accuracy of a performance test involving physician interactions with standardized patients.
Possible projects include designing an enhanced process for flagging aberrant ratings by trained raters and supporting research on standardized patients in a high-stakes exam.
This project will involve revising a commonly-used measurement instrument so that the appropriate inferences can be made with regard to medical students.
Duties will include the following:
- Working with subject-matter experts to revise the existing items
- Conducting think-alouds with medical students
- Developing a pilot measure of potential items
- Exploratory and confirmatory factor analysis of initial pilot results to gather structural validity evidence
- Developing a larger survey to gather concurrent and discriminant validity evidence with the revised measure
- Administration and evaluation of the larger survey
The health of an item pool can be defined in a number of ways. Our current test development practices rely on have/need reports broken down by content area, and many content outlines are hierarchical, with several layers of content coding and metadata. The problem is that have/need ratios are, for the most part, one-dimensional, while the details within the “have” portion of these ratios represent multidimensional information that could improve multiple aspects of test development, including form construction, test security, pool management and maintenance, and the targeting of item-writing assignments.
The aims of this project are two-fold:
- Develop helpful, easily interpretable metrics to assess item pool health
- Employ a sophisticated visualization method for item pool health (e.g., via R Shiny, D3.js, or .NET languages/libraries) to assist in improving one or more aspects of test development.
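To make the have/need idea concrete, here is a minimal sketch (not NBME's actual reporting code) of a per-content-area have/need report. The item schema, with a single `content` code per item and a `targets` dict of needed counts per area, is a hypothetical simplification; a real pool would carry hierarchical codes and additional metadata.

```python
from collections import Counter

def have_need_report(items, targets):
    """Compute a simple have/need report by content area.

    items:   list of dicts, each with a 'content' code (hypothetical schema)
    targets: dict mapping content code -> number of items needed
    Returns: dict mapping content code -> (have, need, have/need ratio)
    """
    have = Counter(item["content"] for item in items)
    return {
        code: (have.get(code, 0), need, have.get(code, 0) / need)
        for code, need in targets.items()
    }
```

Areas with ratios well below 1.0 would be natural targets for new item-writing assignments; a visualization layer (e.g., R Shiny) could then break each "have" count down by the deeper layers of the content hierarchy.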
Test content outlines and specifications often change rapidly within cutting-edge domains. In response to these changes, test development teams must “map” the pre-existing content onto the new content domains. Such a task is trivial when the new and old content outlines have equivalent content domains. However, this direct correspondence rarely occurs, leaving item mapping to be done manually, a time-intensive task that is prone to human error and to differences in subjective interpretation across reviewers.
This project seeks to utilize and integrate natural language processing (NLP), machine learning (ML), and data visualization to:
- Assist subject-matter experts with creating new content outlines
- Help map items to new content domains
- Review manual item mappings for accuracy as a quality control measure
- Visually represent the content distribution within a group of items (e.g., a test form or item bank).
A component of this project will be the use of sophisticated data visualization methods that allow subject-matter experts and test development staff to examine items in multiple contexts more easily. Strong candidates for this position will have knowledge of Python or a similar language and of common libraries used in NLP (e.g., Keras, TensorFlow, PyTorch).
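As a toy sketch of the mapping idea (not NBME's pipeline), one could rank candidate content domains for an item by the lexical similarity between the item text and each domain description. The bag-of-words cosine similarity below stands in for the richer NLP/ML representations (e.g., learned embeddings) the project would actually investigate; all names here are illustrative.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between two texts under a bag-of-words model."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ca[word] * cb[word] for word in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_mapping(item_text, domains):
    """Rank content domains for an item, most similar first.

    domains: dict mapping domain name -> domain description (hypothetical).
    """
    return sorted(domains, key=lambda d: cosine(item_text, domains[d]),
                  reverse=True)
```

Such suggestions could feed either the assisted-mapping workflow or the quality-control review of manual mappings; low similarity between an item and its manually assigned domain would flag the mapping for a second look.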
Recently, NBME has developed a computer-assisted scoring program that utilizes natural language processing (NLP). The two main components of the program are:
- Ensuring that the information in the constructed response is correctly identified and represented
- Building a scoring model based on these concept representations.
Current areas of research surrounding this project include (but are not limited to):
- Refining quality control steps to be taken prior to an item being used in computer-assisted scoring
- Linking and equating computer-assisted scores with human rater scores
- Evaluating a scoring method based on orthogonal arrays
- Developing metrics that assess item quality and test reliability when computer-assisted scores and human scores are used to make classification decisions.
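As a minimal illustration of the linking/equating bullet above (not NBME's operational method), a linear equating function places computer-assisted scores on the human-rater scale by matching the means and standard deviations of the two score distributions. More defensible designs (e.g., equipercentile equating with common examinees) would be part of the actual research; this is only the simplest case.

```python
import statistics

def linear_equate(machine_scores, human_scores):
    """Return f mapping the machine-score scale onto the human-score scale.

    Linear equating: f(x) = mu_h + (sd_h / sd_m) * (x - mu_m),
    matching the first two moments of the two distributions.
    """
    mu_m = statistics.mean(machine_scores)
    mu_h = statistics.mean(human_scores)
    sd_m = statistics.pstdev(machine_scores)
    sd_h = statistics.pstdev(human_scores)
    return lambda x: mu_h + (sd_h / sd_m) * (x - mu_m)
```

When the equated scores feed classification decisions (pass/fail), the follow-on question is how disagreement between computer-assisted and human scores near the cut score affects classification accuracy and consistency, which motivates the metrics bullet above.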
The final project will be determined based on a combination of intern interest and project importance.
Applicants must submit a cover letter outlining relevant experience and listing project interests by number, along with a current resume. The application deadline is February 3, 2020, and all applicants will be notified of selection decisions by February 21, 2020.