Sensory analysis
Sensory analysis is the science of determining the attributes of products using the human senses, with the expert wine taster as a familiar example.
Though chemistry, physics or microbiology, say, can tell us about some ways that products differ, other important attributes, such as pleasantness or the similarity of one odour to another, can be measured only by using human assessors. There are well-developed methods for characterizing products in this way. Difference testing is a special branch of sensory analysis that addresses a particular kind of problem.
Difference testing
Difference testing is concerned with questions such as whether or not two batches of a product are noticeably different. Probably no two batches are ever completely identical, given sufficiently sensitive analytical methods, but even real differences may not be noticed by consumers. Here are a few examples.
A company knows that the raw materials recently delivered for their food product are in some respects different from usual. Is the consequent change in the finished product perceptible to consumers?
The Development Section have come up with a new recipe that has production advantages. Do customers notice the difference?
After how much storage time has the product changed enough to be noticeably different?
Many types of difference test have been devised. Those in widespread use are described in detail in textbooks of sensory methods and in national and international standards. All require an assessor or, more usually, a panel of several assessors to carry out a task that an assessor who detects the difference with certainty can perform perfectly, but that will frequently result in errors if the assessor is unsure about the difference.
Results from difference testing are rarely clear-cut. Usually, they are statistical in nature, which means that answers always have some uncertainty. The aim of statistical analysis of difference tests is to obtain answers that are as precise as the data allow and whose uncertainty is quantified.
One tool for quantifying the uncertainty of a difference test is a test of statistical significance. This calculates a numerical probability, the significance level of the result. This is the probability of obtaining results showing at least as much detectability as was actually observed if, in truth, the difference is completely undetectable. This probability can be calculated, since if the difference is totally undetectable, answers must be given at random and the probability of being correct on any one trial can be deduced from the nature of the test.
Statistical significance
Suppose a difference test required each of 12 assessors to select the sweetest sample from a different set of three. In each set, two samples were identical and one had more sweetener in it. Of course, various precautions had to be taken to ensure that no difference other than sweetness could be used to select the odd sample. On each trial (each presentation of three samples) the assessor was asked to select one of the three as sweetest, even if the choice had to be a guess. This procedure is known as 3-alternative forced choice, or 3-AFC. (Note that this procedure is different from the triangle test, which also involves the selection of one out of three samples.)
If the difference is truly undetectable, the assessor has to choose with no help from the samples. All three are therefore equally likely to be chosen so the probability of choosing the right one is one in three (0.333). The probability of being correct on a single trial is written p(C).
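[Figure: graph of the probability of each possible number of correct answers, from 0 to 12, among 12 assessors who are all guessing with p(C) = 1/3.]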
Exactly one of the possible outcomes must occur. That is, out of the 12 assessors, the number giving a correct answer must be one of 0, 1, 2 ... 11 or 12. Therefore, the probabilities of these 13 possible events must add up to 1, and so the total blue area in the graph is 1.0.
The probability of 10, 11 or all 12 of the assessors making correct choices is tiny (less than 1 in 1000) if they are choosing without detecting any difference in sweetness. The probability of 9 or more being correct is also small (about 1 in 260). So if we find that 9 or more out of the panel of 12 make correct choices, we feel pretty confident that they had some assistance from the samples. Since we have taken care that the only difference between samples is in the amount of sweetener, we interpret that as evidence that they detected a difference in sweetness.
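The probabilities discussed here can be reproduced with a short calculation. The sketch below assumes that the 12 assessors answer independently, so that the number of correct answers follows a binomial distribution with p(C) = 1/3. The function and variable names are illustrative and do not come from any standard. Exact fractions are used so that the check that the 13 probabilities add up to 1 is not affected by rounding.

```python
from fractions import Fraction
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k correct answers in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_ASSESSORS = 12           # panel size from the example
P_GUESS = Fraction(1, 3)   # chance of a correct guess in 3-AFC

# Probability of each possible outcome: 0, 1, ..., 12 correct answers.
distribution = [prob_exactly(k, N_ASSESSORS, P_GUESS)
                for k in range(N_ASSESSORS + 1)]

total = sum(distribution)               # all 13 outcomes together
p_10_or_more = sum(distribution[10:])   # 10, 11 or 12 correct
p_9_or_more = sum(distribution[9:])     # 9 or more correct

for k, prob in enumerate(distribution):
    print(f"{k:2d} correct: {float(prob):.5f}")
print("total probability:", total)
print("10 or more correct:", float(p_10_or_more))
print(" 9 or more correct:", float(p_9_or_more))
```

Under these assumptions the total is exactly 1, and the chance of 10 or more correct guesses works out to 289/531441, which is a little over 1 in 2000.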
If we had tested 12 assessors in this way and found that 8 or more had given correct answers, we would conclude that the evidence indicated that at least some of them were detecting a difference in sweetness, because the probability of 8 or more correct guesses is only about 0.019, below the conventional 0.05 threshold. We could say that we have found a statistically significant difference between the results of the test and the predictions made by assuming that no difference was detectable. More briefly, we often say that we have found a significant difference in sweetness between the samples.
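The cut-off for a panel of this size can be found by working down the upper tail of the guessing distribution. The following is a minimal sketch, again assuming independent answers and p(C) = 1/3. The 0.05 threshold is an assumption standing in for the conventional significance level, and both function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def tail_probability(k, n, p):
    """P(at least k correct out of n) when each answer is correct with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def min_correct_for_significance(n, p, alpha):
    """Smallest number correct whose tail probability is at most alpha.

    Returns None if even n correct out of n would not reach alpha.
    """
    for k in range(n + 1):
        if tail_probability(k, n, p) <= alpha:
            return k
    return None


P_GUESS = Fraction(1, 3)   # chance of a correct guess in 3-AFC
ALPHA = Fraction(1, 20)    # assumed conventional 0.05 level

cutoff = min_correct_for_significance(12, P_GUESS, ALPHA)
print("minimum correct out of 12:", cutoff)
print("P(8 or more):", float(tail_probability(8, 12, P_GUESS)))
print("P(7 or more):", float(tail_probability(7, 12, P_GUESS)))
```

Changing `ALPHA` or the panel size shows how the required number of correct answers moves with the chosen significance standard.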
If we find that 7 out of 12 assessors give correct answers in this test, we very emphatically do not conclude that they were unable to detect any difference in sweetness. The proper conclusion when a result is not significant is that the data do not allow you to conclude with the confidence you want that the assessors did detect a difference in sweetness. This is quite a different conclusion. In fact, 7 correct answers out of 12 is considerably more than the 4 we would expect on average if the assessors were always just guessing. So far as it goes, a result of 7 correct out of 12 does suggest that some of them detected a difference. It is just that with the amount of data gathered (12 trials) the chance of 7 or more successful guesses (about 1 in 15) is not small enough to meet the significance standard that convention demands.
If the same success rate is maintained when we carry out the test with a lot more assessors, the result will be significant. For instance, if we use 24 assessors and obtain 14 correct answers (the same success rate), the result is comfortably significant at the 0.05 level. The probability of getting 14 or more correct in 24 random guesses is about one in a hundred (roughly 0.0103 by the exact binomial calculation), compared with about one in fifteen for 7 or more correct in 12.
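The effect of panel size can be seen by holding the proportion of correct answers fixed and recomputing the chance of doing at least that well by guessing. This sketch assumes independent answers with p(C) = 1/3. The panel of 48 is a hypothetical extension added for illustration, and the function name is not from any library.

```python
from math import comb


def chance_of_at_least(k, n, p=1 / 3):
    """P(at least k correct out of n) if every answer is a guess with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Same proportion correct (7 in 12) at growing panel sizes.
# The (28, 48) case is an illustrative assumption, not part of the example.
panels = [(7, 12), (14, 24), (28, 48)]

# Map each panel size to the guessing probability of its observed result.
results = {n: chance_of_at_least(k, n) for k, n in panels}

for k, n in panels:
    print(f"{k:2d} or more correct out of {n:2d}: {results[n]:.5f}")
```

Printing the values shows how the tail probability shrinks as the panel grows while the proportion correct stays the same.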
For this reason, a test of significance can be used to draw a fairly confident conclusion that a difference is detectable, if the results are appropriate, but it cannot tell us that a difference is undetectable, whatever the results are. If we are interested in Similarity Testing, seeking reassurance that a difference is undetectable, we need a different approach.