Applied Psychological Measurement: Volume 47, issue 1, page(s) 34-47
This study used simulation to investigate the performance of the t-test method in detecting outliers and compared its performance with other outlier detection methods, including the logit difference method with 0.5 and 0.3 as the cutoff values and the robust z statistic with 2.7 as the cutoff value.
Teaching and Learning in Medicine: Volume 33 - Issue 4 - p 366-381
The purpose of this analysis is to describe these sources of evidence that can be used to evaluate the quality of generated items. The important role of medical expertise in the development and evaluation of the generated items is highlighted as a crucial requirement for producing validation evidence.
Academic Medicine: Volume 97 - Issue 8 - Pages 1219-1225
Since 2012, the United States Medical Licensing Examination (USMLE) has maintained a policy of ≤ 6 attempts on any examination component. The purpose of this study was to empirically examine the appropriateness of existing USMLE retake policy.
Applied Psychological Measurement: Volume 46, issue 2, page(s) 571-588
This study evaluates the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item.
Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547
The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.
Academic Medicine: June 2022
This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3multiple-choice question (MCQ) scores.
Journal of Graduate Medical Education: Volume 14, Issue 3, Pages 353-354
Letter to the editor.
Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160
A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Advances in Health Sciences Education: Volume 27, p 1401–1422
After collecting eye-tracking data from 26 students responding to clinical MCQs, analysis is performed by providing 119 eye-tracking features as input for a machine-learning model aiming to classify correct and incorrect responses. The predictive power of various combinations of features within the model is evaluated to understand how different feature interactions contribute to the predictions.
JMIR Medical Education: Volume 8 - Issue 2 - e30988
This article aims to compare the reliability of two assessment groups (crowdsourced laypeople and patient advocates) in rating physician error disclosure communication skills using the Video-Based Communication Assessment app.
Pagination
- First page
- Previous page
- 1
- 2
- 3
- 4
- 5
- 6
- …
- Next page
- Last page