Within the past several years, a great deal of discussion and some controversy have surrounded the issue of how test sponsors should set pass-fail standards for their examinations. Setting pass-fail levels remains a mix of science and art, a blend of empirical analyses and human judgment.
The NBME works closely with clients to assess their needs in relation to standard setting and to provide recommendations as appropriate. The staff assists clients in:
Our staff assists clients in identifying the types of data that will be useful in arriving at a sound standard-setting decision. We assume responsibility for:
The NBME urges clients to periodically review the level of their examination standard as well as the process used to set that standard. The impact of the standard on failure rates over time can also be reviewed and assessed for its appropriateness. Our staff assists clients in identifying instances in which modifications in either the process or the standard itself may be warranted. Such a periodic review can help to assure the maintenance of appropriate standards.
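As a minimal sketch of the kind of review described above, the following Python computes the failure rate that a single passing score would produce in each of several administrations. The function names, the administration labels, the scores, and the passing score of 70 are all invented for illustration; they are not NBME data or NBME software.

```python
def failure_rate(scores, passing_score):
    """Fraction of examinees scoring below the passing score."""
    if not scores:
        raise ValueError("no scores supplied")
    failures = sum(1 for s in scores if s < passing_score)
    return failures / len(scores)


def failure_rates_by_administration(scores_by_admin, passing_score):
    """Map each administration label to its failure rate under one standard."""
    return {
        admin: failure_rate(scores, passing_score)
        for admin, scores in scores_by_admin.items()
    }


# Invented illustrative data: scores from three hypothetical administrations.
scores_by_admin = {
    "A": [62, 71, 75, 80, 88],
    "B": [58, 66, 74, 79, 91],
    "C": [55, 61, 68, 77, 85],
}
rates = failure_rates_by_administration(scores_by_admin, passing_score=70)
for admin, rate in rates.items():
    print(f"Administration {admin}: {rate:.0%} failed")
```

A steadily rising or falling rate across administrations, as in this invented data, is the sort of pattern that might prompt a closer look at either the standard or the process used to set it.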
The scoring and analysis of all examinations are performed under the direction of the psychometrician assigned to the examination program. NBME uses a proprietary, integrated software system that provides extensive capabilities for scoring and analyzing examinations and for conducting special statistical investigations.
Specifications for scoring and analysis are delineated through consultations with the client prior to test administration. As a result, scoring and analysis can begin promptly upon receipt of answer sheets or electronic response files.
The scoring system is flexible enough that ad hoc analyses can be conducted easily, so that all relevant examination data are available as needed for review and decision-making.
The NBME also maintains statistical packages to support a wide variety of examination analyses.
Numerical and descriptive data can also be displayed in multicolor graphics to facilitate interpretation and decision-making. The graphics packages offer a range of formats, including two- and three-dimensional graphs, for analyzing many aspects of examination programs, including the following:
To assure the equivalence of examinations and pass-fail standards over time and across different test administrations, the NBME employs Item Response Theory (IRT) analyses as well as classical psychometric techniques.
In reviewing previously used items for a forthcoming test administration, the availability of item calibrations permits greater precision in selecting items with statistical characteristics similar to those found in prior administrations. The ability to link test items from multiple prior administrations also allows test sponsors to assure greater stability in pass-fail standards over time.
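To illustrate how item calibrations can guide item selection, here is a small Python sketch. It assumes a two-parameter logistic (2PL) IRT model, which is only one common choice; the text does not say which IRT model the NBME uses. The item identifiers, the parameter values, and the `closest_item` helper are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that an examinee of ability theta answers correctly.

    a is the item discrimination and b the item difficulty, both assumed
    to come from a prior calibration.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def closest_item(target, pool):
    """Return the id of the pool item whose (a, b) is nearest the target's."""
    target_a, target_b = target

    def distance(item_id):
        a, b = pool[item_id]
        return (a - target_a) ** 2 + (b - target_b) ** 2

    return min(pool, key=distance)


# Invented calibrations: item id -> (discrimination a, difficulty b).
pool = {
    "item_17": (1.0, -0.5),
    "item_42": (1.2, 0.3),
    "item_88": (0.8, 1.1),
}
retired_item = (1.1, 0.25)  # parameters of an item used on a prior form
replacement = closest_item(retired_item, pool)
print(replacement)
```

Matching on squared distance in the (a, b) plane is a deliberately simple criterion chosen for this sketch; an operational program would typically weigh content coverage and other constraints as well.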
Many clients find it impossible to pretest their items in advance of test administration. Yet it is crucial that examinee scores be based upon test items of known and acceptable quality. When test item writers have been adequately trained and a careful developmental process is employed, newly developed test items can be used in live test administrations and their acceptability can be assessed statistically prior to final scoring.
This process of key validation occurs after the test has been administered but prior to final scoring. It involves a preliminary item analysis of all items so that items with unacceptable statistical characteristics can be eliminated prior to the calculation of examinee scores. As a result, the added costs of conventional pretesting can be avoided, and test sponsors and examinees can be assured that final scores will be based upon only those items that have survived careful statistical scrutiny.
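The preliminary item analysis described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the NBME's procedure: it uses two classical statistics (the proportion answering correctly and the correlation of each item with the score on the remaining items), and the flagging thresholds `min_p` and `min_disc` and the response data are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def item_analysis(scored):
    """Compute difficulty and discrimination for each item.

    scored is a list of examinee rows, each a list of 0/1 item scores.
    Discrimination is the correlation between the item and the total
    score on all *other* items.
    """
    totals = [sum(row) for row in scored]
    results = []
    for j in range(len(scored[0])):
        item = [row[j] for row in scored]
        rest = [t - x for t, x in zip(totals, item)]
        results.append({
            "item": j,
            "p_value": sum(item) / len(item),
            "discrimination": pearson(item, rest),
        })
    return results


def flag_items(results, min_p=0.2, min_disc=0.1):
    """Indices of items falling below either (assumed) threshold."""
    return [
        r["item"] for r in results
        if r["p_value"] < min_p or r["discrimination"] < min_disc
    ]


# Invented data: six examinees, five items; the last item behaves as if miskeyed.
scored = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
flagged = flag_items(item_analysis(scored))
print(flagged)
```

In this invented data set the last item is answered correctly mainly by the lowest-scoring examinees, so its discrimination is negative and it is flagged for review before final scoring. Flagged items would still need review by content experts; the statistics only identify candidates.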
From time to time, test sponsors are faced with a report from a proctor or examinee that alleges improper conduct on the part of one or more examinees. Such a report frequently involves suspected copying.
During the past several years, the NBME has developed policies and procedures that can be helpful to clients in investigating allegations of copying behavior. The NBME has also developed procedures that can be effective in preventing such behavior.
Statistical procedures used to investigate allegations of copying do not eliminate the necessity for careful review and judgment by responsible individuals. These procedures, however, can provide valuable information that facilitates the decision-making process.
Consultation is available regarding the mechanisms that can be used to effectively investigate allegations of improper examinee behavior and to prevent such situations in future administrations of an examination.
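As a generic illustration of the kind of statistical information that can inform a copying investigation, the Python sketch below counts how many items two examinees answered with the same incorrect option and ranks all pairs by that count. This is a simple descriptive count chosen for illustration, not the NBME's procedure and not a probability-based index; the answer key, the examinee labels, and the responses are invented.

```python
from itertools import combinations


def matching_incorrect(resp_a, resp_b, key):
    """Number of items on which both examinees chose the same wrong option."""
    return sum(
        1 for a, b, k in zip(resp_a, resp_b, key)
        if a == b and a != k
    )


def rank_pairs(responses, key):
    """All examinee pairs, sorted by shared incorrect answers (highest first)."""
    counts = {
        (x, y): matching_incorrect(responses[x], responses[y], key)
        for x, y in combinations(sorted(responses), 2)
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Invented ten-item answer key and response strings.
key = "ABCDABCDAB"
responses = {
    "E1": "ABCDABCDAB",
    "E2": "ABDDACCDBB",
    "E3": "ABDDACCDBA",
    "E4": "CBCAABDDAB",
}
ranked = rank_pairs(responses, key)
print(ranked[0])
```

A high count for one pair relative to all others is only a starting point. As noted above, such figures inform, but do not replace, careful review and judgment by responsible individuals.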
The results of a test administration may be of interest to multiple individuals and agencies. Our staff discusses the client's requirements for reports and helps design communications that will meet the needs of various recipients, including the following:
A special feature of the NBME's reporting system is the optional Keyword Feedback Report, which is used primarily to provide information to examinees regarding their test performance.
Keyword phrases are developed to describe the content and evaluation objective of each test item. These phrases are then organized so that they can be used to inform candidates which items were answered correctly and which incorrectly.
When properly clustered, these keyword phrases can assist examinees in identifying their areas of strength and weakness, without disclosing actual test items.
A variation of this report can be produced to assist program directors in assessing the specific areas of strength and weakness of an entire group of examinees.
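The clustering of keyword phrases for an individual examinee can be sketched in Python as follows. The data layout (a content `area` and a `keyword` phrase per item), the function name, and the sample phrases are assumptions made for this example; the actual report format is not described in the text.

```python
from collections import defaultdict


def keyword_feedback(items, scored):
    """Group each item's keyword phrase by content area and by outcome.

    items  : list of dicts with hypothetical 'area' and 'keyword' fields
    scored : list of 0/1 scores for one examinee, in the same item order
    """
    report = defaultdict(lambda: {"correct": [], "incorrect": []})
    for item, is_correct in zip(items, scored):
        outcome = "correct" if is_correct else "incorrect"
        report[item["area"]][outcome].append(item["keyword"])
    return dict(report)


# Invented keyword phrases; no actual test content is shown.
items = [
    {"area": "Cardiology", "keyword": "diagnosis of heart failure"},
    {"area": "Cardiology", "keyword": "management of hypertension"},
    {"area": "Nephrology", "keyword": "interpretation of electrolytes"},
]
scored = [1, 0, 0]

feedback = keyword_feedback(items, scored)
for area, outcomes in feedback.items():
    print(area)
    print("  answered correctly:  ", "; ".join(outcomes["correct"]) or "none")
    print("  answered incorrectly:", "; ".join(outcomes["incorrect"]) or "none")
```

Only the phrases appear in the output, never the items themselves, which is the property the report depends on. Running the same grouping over the scored responses of a whole cohort would give the program-director variation mentioned above.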
Following scoring and analysis of each test administration, a final report is prepared for the client. This report contains summary as well as detailed information regarding the following:
The final report may also detail the method used to set the pass-fail level and the failure rates by subsets of examinees.
Of particular importance is the section of the report that highlights issues or problems requiring attention and presents recommendations for the client's consideration.
This report provides important ongoing documentation of the history of the examination and can be used by the client to monitor activities designed to enhance the quality of the examination program. It can also be a valuable resource in responding to challenges regarding the validity or fairness of the examination. Data reporting can be done using current state-of-the-art methods, including CD-ROM.