On a recent site visit to a client applying for accreditation of a safety training program, the issue of testing took center stage. The organization was using a 20-item multiple-choice test to determine whether learners had sufficient knowledge to safely handle hazardous chemicals on the job. As an Assessor, I asked the obvious questions: How did they know the test measured key safety knowledge, and how did they decide that a 70% score demonstrated sufficient competency? After some hemming and hawing, the client contended that subject matter experts had written the test and that knowing 70% or more of the content was adequate evidence of knowledge acquisition. I pressed them to provide more details to support those claims.
The underlying issue here is validity – the extent to which a test really measures what it purports to measure. Validity can be established in a number of ways. The most common is face validity, the type this client relied on in constructing its safety exam: subject matter experts review the exam and decide, on the face of it, whether it looks like a valid measure of the subject matter. While this is relatively easy to do, it provides the weakest evidence of validity. A stronger form is content validity, in which the SMEs look not only at the test but also at the learning objectives and the domain of knowledge it is meant to cover, to determine that there is a good match. This type of validity is fairly easy to establish, and U.S. courts have repeatedly upheld it as a legally defensible way to show that tests measure important job-related skills. For those who wish to go even further, criterion and construct validity offer the opportunity to demonstrate that a test is valid using data from real test takers. Criterion validity establishes a meaningful statistical relationship between a test and another known and valid measure, such as job performance. Construct validity provides empirical evidence that a test can clearly distinguish between knowledgeable experts and untrained novices, thus demonstrating that it measures the underlying psychological construct.
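To make the criterion validity idea concrete, here is a minimal sketch in Python. The exam scores and supervisor ratings are purely hypothetical, but the general approach – correlating test results with an independent measure of job performance and looking for a strong positive relationship – is the standard way a validity coefficient is obtained.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: each position represents one trainee.
exam_scores = [62, 71, 78, 85, 90, 68, 74, 88, 95, 80]            # percent correct on the test
job_ratings = [2.1, 2.8, 3.0, 3.6, 4.1, 2.5, 2.9, 3.9, 4.4, 3.2]  # supervisor rating, 1-5 scale

r = pearson(exam_scores, job_ratings)
print(f"Criterion-related validity coefficient: r = {r:.2f}")
```

A coefficient near zero would suggest the test tells us little about how people actually perform on the job, no matter how reasonable the items look on their face.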
The other issue related to validity is setting a passing score (or cut score) that defines minimum proficiency. Simply adopting 70% or ‘C’ as the passing score, just because that’s what it was back in school, is insufficient. Instead, the passing score should be set by SMEs who determine the minimum level of knowledge and skill required on the job. One common method is the Angoff method, named after William Angoff, the Educational Testing Service researcher who developed it. A panel of experts reviews the test item by item and, considering the learning objective and the intended audience, estimates the probability that a minimally qualified candidate would answer each question correctly. These estimates are then averaged across all reviewers and all test items to produce an overall passing score (a simple worked example appears below). When clients use this method, they often end up raising or lowering the passing score from 70% based on the perceived difficulty of the test and the demands of the job. As training professionals use more tests to measure the impact of our work, we need to become more knowledgeable about things like validity and cut scores to make sure tests are used appropriately in the workplace.
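To illustrate the arithmetic behind an Angoff rating session, here is a minimal sketch in Python. The panel ratings are hypothetical, and a real study would involve more items, more raters, and typically a discussion round before the final estimates are averaged.

```python
# Each inner list holds one rater's estimates of the probability (0-1)
# that a minimally qualified candidate answers each item correctly.
# These ratings are hypothetical, for illustration only.
ratings = [
    [0.90, 0.75, 0.60, 0.85, 0.70],  # Rater A
    [0.85, 0.80, 0.55, 0.90, 0.65],  # Rater B
    [0.95, 0.70, 0.65, 0.80, 0.75],  # Rater C
]

# The recommended cut score is the average across all raters and all items.
all_estimates = [p for rater in ratings for p in rater]
cut_score = sum(all_estimates) / len(all_estimates)

print(f"Recommended passing score: {cut_score:.0%}")  # 76% for these ratings
```

Note that for these hypothetical ratings the recommended score lands above the default 70% – exactly the kind of adjustment panels make when the job demands more than a schoolroom passing grade.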