Showing 1 - 10 of 15 Research Library Publications
Posted: | Victoria Yaneva (editor), Matthias von Davier (editor)

Advancing Natural Language Processing in Educational Assessment

 

This book examines the use of natural language technology in educational testing, measurement, and assessment. Recent developments in natural language processing (NLP) have enabled large-scale educational applications, though scholars and professionals may lack a shared understanding of the strengths and limitations of NLP in assessment as well as the challenges that testing organizations face in implementation. This first-of-its-kind book provides evidence-based practices for the use of NLP-based approaches to automated text and speech scoring, language proficiency assessment, technology-assisted item generation, gamification, learner feedback, and beyond.

Posted: | Victoria Yaneva, Peter Baldwin, Le An Ha, Christopher Runyon

Advancing Natural Language Processing in Educational Assessment: Pages 167-182

 

This chapter discusses the evolution of natural language processing (NLP) approaches to text representation and how different ways of representing text can be utilized for a relatively understudied task in educational assessment – that of predicting item characteristics from item text.

Posted: | Polina Harik, Janet Mee, Christopher Runyon, Brian E. Clauser

Advancing Natural Language Processing in Educational Assessment: Pages 58-73

 

This chapter describes INCITE, an NLP-based system for scoring free-text responses. It emphasizes the importance of context and the system’s intended use and explains how each component of the system contributed to its accuracy.

Posted: | Christopher Runyon, Polina Harik, Michael Barone

Diagnosis: Volume 10, Issue 1, Pages 54-60

 

This op-ed discusses the advantages of leveraging natural language processing (NLP) in the assessment of clinical reasoning. It also provides an overview of INCITE, the Intelligent Clinical Text Evaluator, a scalable NLP-based computer-assisted scoring system that was developed to measure clinical reasoning ability as assessed in the written documentation portion of the now-discontinued USMLE Step 2 Clinical Skills examination. 

Posted: | Erfan Khalaji, Sukru Eraslan, Yeliz Yesilada, Victoria Yaneva

Behavior & Information Technology

 

This study builds upon prior work in this area that focused on developing a machine-learning classifier trained on gaze data from web-related tasks to detect ASD in adults. Using the same data, we show that a new data pre-processing approach, combined with an exploration of the performance of different classification algorithms, leads to an increased classification accuracy compared to prior work.

Posted: | Thai Q. Ong, Dena A. Pastor

Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588

 

This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.

Posted: | Victoria Yaneva, Janet Mee, Le Ha, Polina Harik, Michael Jodoin, Alex Mechaber

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - p 2880–2886

 

This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination.

Posted: | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547

 

The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.

Posted: | Peter Baldwin, Brian E. Clauser

Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160

 

A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.

Posted: | Ian Micir, Kimberly Swygert, Jean D'Angelo

Journal of Applied Technology: Volume 23 - Special Issue 1 - Pages 30-40

 

The interpretations of test scores in secure, high-stakes environments are dependent on several assumptions, one of which is that examinee responses to items are independent and no enemy items are included on the same forms. This paper documents the development and implementation of a C#-based application that uses Natural Language Processing (NLP) and Machine Learning (ML) techniques to produce prioritized predictions of item enemy statuses within a large item bank.