Crafting and Refining: The Importance of Psychometric Analysis in Enhancing Assessments

Tommy Sangchompuphen

Aug 16, 2024

As the new semester kicks off, we're all focused on how best to assess our students. Crafting thoughtful and well-intentioned multiple-choice questions is just the beginning. The real work begins after the exam when we analyze the data to ensure our assessments are fair and truly effective in measuring student learning.

This approach aligns with my teaching philosophy, which is data-driven, assessment-rich, personalized, and hands-on. I’m reminded of this every day by my ExamSoft mousepad titled "Psychometrics 101," which highlights key metrics like Point Biserial, Discrimination Index, and KR-20. These metrics aren’t just numbers; they provide critical insights that can help us fine-tune our assessments. Even the best questions can benefit from careful analysis and, when necessary, a bit of tweaking.

Here’s a reminder about what these metrics mean and how they can be applied in practice:

Point Biserial: The Point Biserial is a correlation coefficient that measures the relationship between students' performance on a single test item and their overall performance on the exam.

What It Means: This metric ranges from -1.00 to +1.00. A positive Point Biserial indicates that students who performed well on the exam overall tended to answer the item correctly, which suggests the item is consistent with what the rest of the exam measures. Conversely, a negative Point Biserial indicates that high-performing students were less likely than low-performing students to answer the item correctly, which points to a potential issue with the question.

Example: Imagine you have a multiple-choice question that the majority of top-performing students answered incorrectly. If the Point Biserial for this item is -0.30, it signals that the question might be misleading or not aligned with the material tested. In this case, you might need to review the question to see if it's ambiguous or if it covers content not adequately taught.
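To make the calculation concrete, here is a minimal sketch in Python. It assumes the Point Biserial is computed as an ordinary Pearson correlation between a 0/1 item column and each student's total score. The function name and the eight-student data set are hypothetical, and some exam platforms may use a corrected variant that leaves the item itself out of the total, so your software's numbers may differ slightly.

```python
from math import sqrt


def point_biserial(item_correct, total_scores):
    """Pearson correlation between a 0/1 item column and total exam scores."""
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 students")

    mean_item = sum(item_correct) / n
    mean_total = sum(total_scores) / n

    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cross = sum((x - mean_item) * (y - mean_total)
                for x, y in zip(item_correct, total_scores))
    ss_item = sum((x - mean_item) ** 2 for x in item_correct)
    ss_total = sum((y - mean_total) ** 2 for y in total_scores)

    if ss_item == 0 or ss_total == 0:
        # Everyone answered the item the same way, or all totals are equal:
        # the correlation is undefined.
        return float("nan")
    return cross / sqrt(ss_item * ss_total)


# Hypothetical data: 8 students, listed from highest to lowest total score.
totals = [95, 90, 85, 80, 70, 65, 60, 50]
good_item = [1, 1, 1, 0, 1, 0, 0, 0]    # mostly answered correctly by top scorers
flawed_item = [0, 0, 0, 1, 0, 1, 1, 1]  # mostly answered correctly by low scorers

r_good = point_biserial(good_item, totals)
r_flawed = point_biserial(flawed_item, totals)
print(f"good item:   {r_good:+.2f}")
print(f"flawed item: {r_flawed:+.2f}")
```

The second item comes out negative because the students who got it right are concentrated at the bottom of the score distribution, which is exactly the pattern that should prompt a closer look at the question.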

Discrimination Index: The Discrimination Index measures the difference in how well the top and bottom performers on an exam answer a specific question.

What It Means: The index ranges from -1.00 to +1.00. A positive Discrimination Index indicates that the question effectively distinguishes between students who understand the material well and those who do not. A negative index suggests that lower-performing students answered the question correctly more often than higher-performing students, which is generally undesirable.

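Example: Suppose a question has a Discrimination Index of +0.60. When the index is computed by comparing the top 27% and bottom 27% of scorers, a common convention, this means the proportion of top-group students who answered correctly was 60 percentage points higher than the proportion in the bottom group (for example, 90% versus 30%), demonstrating the question's effectiveness in differentiating between varying levels of student understanding.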
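The comparison between top and bottom performers can be sketched in a few lines of Python. This version assumes the index is the proportion correct in the upper group minus the proportion correct in the lower group, with each group defined as 27% of the class (rounded to a whole number of students). The function name, the `fraction` parameter, and the ten-student data set are hypothetical, and tied scores are simply left in their original order.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    n = len(item_correct)
    if n == 0 or n != len(total_scores):
        raise ValueError("need two equal-length, non-empty lists")

    # Size of each comparison group (at least one student).
    k = max(1, round(n * fraction))

    # Rank students from highest to lowest total score.
    ranked = sorted(zip(total_scores, item_correct),
                    key=lambda pair: pair[0], reverse=True)
    upper = [correct for _, correct in ranked[:k]]
    lower = [correct for _, correct in ranked[-k:]]

    return sum(upper) / k - sum(lower) / k


# Hypothetical data: 10 students, so each group holds round(2.7) = 3 students.
totals = [98, 95, 91, 88, 84, 79, 75, 70, 64, 55]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

d = discrimination_index(item, totals)
print(f"Discrimination Index: {d:+.2f}")
```

Here all three top scorers answered correctly and one of the three bottom scorers did, so the index is 3/3 minus 1/3, or about +0.67.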

Difficulty Index: The Difficulty Index, also known as the item p-value (not to be confused with the p-value used in statistical significance testing), represents the proportion of students who answered an item correctly.

What It Means: This index ranges from 0.00 to 1.00. A higher Difficulty Index indicates that a larger proportion of students answered the item correctly (and thus, the item is easier), while a lower index suggests that fewer students answered it correctly (indicating it’s more difficult).

Example: If a question has a Difficulty Index of 0.90, it means that 90% of students answered the question correctly, indicating that the question was relatively easy. Conversely, if a question has a Difficulty Index of 0.30, it means that only 30% of students got it right, making it a challenging question. Depending on your assessment goals, both easy and difficult questions can be useful, but they should be balanced to appropriately challenge students.
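The arithmetic behind the two examples above is simple enough to check by hand, but a short Python sketch shows how it might be applied across items. The function name and the ten-student response lists are hypothetical; each list holds 1 for a correct answer and 0 for an incorrect one.

```python
def difficulty_index(item_correct):
    """Proportion of students who answered the item correctly (0.00 to 1.00)."""
    if not item_correct:
        raise ValueError("need at least one response")
    return sum(item_correct) / len(item_correct)


# Hypothetical responses from 10 students (1 = correct, 0 = incorrect).
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # 9 of 10 correct
hard_item = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 correct

p_easy = difficulty_index(easy_item)
p_hard = difficulty_index(hard_item)
print(f"easy item: {p_easy:.2f}")
print(f"hard item: {p_hard:.2f}")
```

Note the counterintuitive naming: the higher the Difficulty Index, the easier the question.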

KR-20: The Kuder-Richardson Formula 20 (KR-20) is a measure of the internal consistency or reliability of an entire exam.

What It Means: KR-20 typically ranges from 0.00 to 1.00, with higher values indicating greater reliability. A high KR-20 suggests that the questions are cohesive and consistently measure the same underlying knowledge or skill. It’s important to note that this metric depends on the number of questions and the variability among test-takers: all else being equal, longer exams and groups with a wider spread of scores tend to produce higher values.

Example: If your exam has a KR-20 value of 0.85, it indicates that the exam is highly reliable, meaning that differences in students’ scores are more likely to reflect real differences in knowledge and understanding of the material than random measurement error. A lower KR-20 value, such as 0.60, would suggest the need for improvement, perhaps by revising or replacing questions that don’t align well with the rest of the exam.
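For readers who want to see where the number comes from, here is a minimal Python sketch of the standard KR-20 formula: k / (k - 1) multiplied by (1 - sum of p*q / variance of total scores), where k is the number of questions, p is each question's proportion correct, and q is 1 - p. The function name and the small response matrix are hypothetical. This sketch assumes the population variance (dividing by the number of students); some tools use the sample variance instead, which shifts the result slightly.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a matrix of 0/1 responses (rows = students, columns = questions)."""
    n_students = len(responses)
    n_items = len(responses[0]) if responses else 0
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 questions")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every student needs a response for every question")

    # p = proportion correct for each question; p * (1 - p) is its variance.
    p = [sum(row[i] for row in responses) / n_students for i in range(n_items)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)

    # Variance of students' total scores (population variance: an assumption).
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        return float("nan")  # every student earned the same score

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical exam: 5 students, 4 questions (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

reliability = kr20(responses)
print(f"KR-20: {reliability:.2f}")
```

A real exam would have far more students and questions than this toy matrix, but the calculation is the same.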

Being a good professor isn’t just about drafting good questions; it’s also about rigorously evaluating those questions to ensure they accomplish what we intend—accurately assessing our students' understanding of the material. This ongoing process of reflection and improvement is essential to providing a meaningful educational experience that benefits both our students and our teaching.