Six Teams Recognized for NLP Advances in NBME’s Patient Note Scoring Competition

Posted July 25, 2022
  • Insights to aid Medical Education resources
  • Data from the competition on automated scoring of clinical text available upon request
  • Code is open source and available at

To advance research on one of the biggest technological challenges in medical education, more than 1,400 teams of data scientists and machine learning enthusiasts from all over the world submitted proposed solutions as part of NBME’s three-month Kaggle competition. The competition, which ended in May, resulted in six groups winning recognition for their developments toward automated scoring of clinical text.

In addition to the freely available code, NBME is providing interested researchers with the opportunity to apply for access to the de-identified data used in the competition for further research purposes (requests for these data can be submitted via NBME's Data Sharing portal).

The challenge the participants tackled was “teaching” computers to recognize expressions of important clinical concepts (e.g., those from a scoring rubric) in examinee-written text.

The top winners released the code for their solutions under an open-source license, enabling medical educators to adapt it and use it in their own institutions (code from the winning teams and other participants can be found here).

As an example of the innovation, effort, and dedication that went into developing these entries, the first-place team, Ryuichi & currypurin, shared that they spent a total of 900 hours working on the competition. Other teams reported investing similar effort, with some working during their free time and others being paid by their employers to compete and learn.

With hundreds of forum discussions and publicly shared code, the competition produced a substantial body of work on the state of the art in automated clinical text scoring.

“We are grateful to all participants for their hard work and dedication in making the first collective effort towards solving an important problem in medical education assessment,” said Victoria Yaneva, Senior Data Scientist at NBME.

Solutions such as the ones developed by the participants in the NBME Patient Note Scoring Competition could allow medical education to shift its resources. The insights provided by their work could significantly improve the process of patient note scoring (as well as the scoring of other types of clinical text), moving expert time from manual rating to other tasks that need expert input.

The available data include the history portions of nearly 44,000 patient notes from 10 clinical cases, of which more than 2,800 notes (about 35,000 phrases) were annotated with concepts from the exam scoring rubrics.

An academic paper describing the dataset was recently published (see reference below), with NBME researchers presenting the topic at the annual meeting of the North American Chapter of the Association for Computational Linguistics (NAACL’22).

For questions related to the competition or the resources described here, please contact Victoria Yaneva at

Yaneva, V., Mee, J., Ha, L., Harik, P., Jodoin, M., & Mechaber, A. (2022, July). The USMLE® Step 2 Clinical Skills Patient Note Corpus. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, Washington (pp. 2880-2886).
