Confidence bounds
Suppose we are going to carry out a difference test with 12 assessors and count how many correct answers are given. If correct answers are likely, we expect to get a large number of correct answers, maybe 10, 11 or 12. If correct answers are unlikely, we expect few correct answers, perhaps 0, 1 or 2.
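As a rough illustration of this intuition, the sketch below tabulates the binomial distribution of the number of correct answers out of 12. The per-assessor probabilities 0.9 and 0.1 are assumed values standing in for "likely" and "unlikely", and the helper names are made up for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k correct answers out of n,
    when each answer is correct with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def most_likely_count(n, p):
    """Number of correct answers that has the highest probability."""
    return max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))

# 0.9 and 0.1 are assumed example values, not taken from the text.
for p in (0.9, 0.1):
    print(f"p = {p}")
    for k in range(13):
        print(f"  {k:2d} correct: {binomial_pmf(k, 12, p):.4f}")
    print(f"  most likely count: {most_likely_count(12, p)}")
```

With a high probability the bulk of the distribution sits near 12, and with a low probability it sits near 0, which is the pattern described above.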
Now suppose you have carried out 12 trials and found that 7 of the answers were correct. For some probabilities of an individual giving a correct answer, this result is quite a likely one. For others, it is a very unlikely outcome. For instance, if the probability of giving a correct answer is 1/100 for every assessor, the probability of getting any correct answers at all from 12 trials is small (about 0.11), and the probability of getting 7 or more correct answers is tiny. But if the probability of a correct answer on one trial is 1/2, 7 correct answers out of 12 is a fairly likely outcome.
Somewhere between 1/100 and 1/2 there will be some probability, p(C), that is just small enough to make the probability of 7 or more correct answers exactly 0.025. This probability is the lower 95% bound for the estimated probability of giving correct answers.
For 7 correct answers out of 12, this lower bound works out at 0.277. So, if we have actually obtained 7 correct answers out of 12, we have a result that is very unlikely if the probability of correct answers is 0.277 (and the result is even more surprising if it is anything less than that).
Thus, observing 7 correct answers makes us inclined to disbelieve that the probability of being correct is any lower than 0.277.
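The lower bound can be checked numerically. The following is a minimal sketch, assuming answers are independent and follow a binomial distribution; it finds the bound by simple bisection, and the function names are hypothetical ones chosen for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(k or more correct answers out of n) when each is correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lower_bound(k, n, tail=0.025, tol=1e-10):
    """Probability p at which P(k or more correct) equals `tail`.

    P(k or more correct) rises as p rises, so bisection on [0, 1] works.
    """
    if k == 0:
        return 0.0  # no lower bound can be set when nothing was correct
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(k, n, mid) < tail:
            lo = mid  # result still too unlikely: the bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2

print(prob_at_least(7, 12, 0.01))   # very small
print(prob_at_least(7, 12, 0.5))    # fairly likely
print(round(lower_bound(7, 12), 3))
```

The last line should agree with the value of 0.277 quoted in the text to three decimal places.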
Seven is also an unlikely number of correct answers if their probability is very high. For instance, if the probability of a correct answer is 0.9, the probability of getting 7 or fewer correct answers is only about four in a thousand (0.004).
A similar argument to the one for the lower bound lets us put an upper bound on the estimated probability of correct answers. There will be some probability of being correct that is high enough to make 7 or fewer correct answers so unlikely that we disbelieve the true probability is any higher than that. That is, there will be some value between 0.5 and 0.9 for the probability of giving correct answers that makes the chance of getting 7 or fewer exactly 0.025.
For 7 correct answers out of 12, this upper bound works out at 0.848. So, if we have actually obtained 7 correct answers out of 12, we have a result that is very unlikely if the probability of correct answers is 0.848 (and the result is even less likely if it is any higher).
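The upper bound can be checked in the same way. This is a sketch under the same binomial assumption, again using bisection and hypothetical function names; here the tail probability falls as the probability of a correct answer rises, so the search direction is reversed.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(k or fewer correct answers out of n) when each is correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def upper_bound(k, n, tail=0.025, tol=1e-10):
    """Probability p at which P(k or fewer correct) equals `tail`.

    P(k or fewer correct) falls as p rises, so bisection on [0, 1] works.
    """
    if k == n:
        return 1.0  # no upper bound can be set when everything was correct
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_most(k, n, mid) > tail:
            lo = mid  # result still too likely: the bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2

print(round(prob_at_most(7, 12, 0.9), 4))  # about four in a thousand
print(round(upper_bound(7, 12), 3))
```

The second value should agree with the 0.848 quoted in the text to three decimal places.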
Confidence intervals
We now have a range of possible estimates of the probability of a correct answer that are not so extreme as to be unbelievable. This range runs from 0.277 to 0.848. We reject values lower than 0.277 or higher than 0.848 because all values outside that range require us to believe that something very unlikely has happened. But unlikely things can happen, so we may have been wrong to reject them. The two criterion probabilities of 0.025 add to 0.05, or 5%, which leaves 95%. For this reason, the range of believable values is usually referred to as the 95% confidence interval.
If we want to be more confident that the range we have calculated includes the true probability of correct answers being given, we can set a more stringent criterion, such as 0.005 for each bound. Doing so gives us a 99% confidence interval, which is wider than the 95% confidence interval, but by including additional possible values gives us more confidence that the true probability lies within it.
Conversely, we could relax the criterion, say to 0.05 for each bound, giving a 90% confidence interval, which will cover a narrower range of probabilities for correct answers but with less assurance that the true probability is actually in the interval.
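To see how the criterion changes the width of the interval, the sketch below computes both bounds for 7 correct answers out of 12 at the 90%, 95% and 99% levels. It assumes a binomial model and splits the criterion equally between the two bounds, as in the text; the function names are hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(k or more correct answers out of n)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_at_most(k, n, p):
    """P(k or fewer correct answers out of n)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def solve_increasing(f, target, tol=1e-10):
    """Find p in [0, 1] with f(p) = target, for a function f that rises with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(k, n, confidence=0.95):
    """Lower and upper bounds, each using a criterion of (1 - confidence) / 2."""
    tail = (1 - confidence) / 2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: prob_at_least(k, n, p), tail)
    # P(k or fewer) falls with p, so solve on its complement, which rises.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - prob_at_most(k, n, p), 1 - tail)
    return lower, upper

for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(7, 12, level)
    print(f"{level:.0%} interval: {lower:.3f} to {upper:.3f}")
```

The printed intervals should nest inside one another, with the 99% interval the widest and the 90% interval the narrowest.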
A confidence interval has some similarities to the sort of significance test that might routinely be used to decide if the number of correct answers is great enough to conclude that something other than guessing is required to account for them. They have features in common but they are not exactly equivalent.
A significance test begins from the Null Hypothesis that answers are given at random and calculates the chance of getting so many correct if that is so. The confidence bounds are calculated by discovering what probability of correct answers would make the observed result just significant. Their relationship is shown by the following graphs.
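The same relationship can also be checked numerically. The sketch below runs a one-sided significance test of 7 correct answers out of 12 against several null-hypothesis probabilities and compares each result with the 0.025 criterion. The null values listed are assumptions made for illustration, since the chance probability depends on the kind of difference test used, and the function name is hypothetical.

```python
from math import comb

def p_value_at_least(k, n, p_null):
    """Chance of k or more correct answers out of n if the null probability holds."""
    return sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i)
               for i in range(k, n + 1))

CRITERION = 0.025     # one-sided criterion used for the 95% lower bound
LOWER_BOUND = 0.277   # lower 95% bound quoted in the text for 7 out of 12

# Assumed null-hypothesis probabilities, chosen to fall on both sides of the bound.
for p_null in (0.20, 0.25, 0.30, 1 / 3, 0.50):
    p_value = p_value_at_least(7, 12, p_null)
    verdict = "significant" if p_value < CRITERION else "not significant"
    side = "below" if p_null < LOWER_BOUND else "above"
    print(f"null p = {p_null:.3f} ({side} the lower bound): "
          f"P(7 or more) = {p_value:.4f}, {verdict}")
```

Null probabilities below the lower bound give a significant result at the 0.025 level, and those above it do not, which is what it means for the bound to be the probability at which the observed result is just significant.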