Populations, Samples, & Sampling Error
Populations and Samples
A population is the entire group to which we want to generalize our results. A sample is a subset of the population that we actually study. For example, in our smiling research, the population might be all adult humans, but our sample might be a group of 30 friends and relatives.
Parameters and Statistics
A numerical summary of a population is called a parameter, while the same numerical summary of a sample is called a statistic. So, for example, if we are interested in the population of adult Americans, then the mean height of adult Americans is a parameter. But the mean height of any sample or subset of adult Americans is a statistic.
Sampling error is the sample-to-sample variability in a statistic. For example, imagine that the mean IQ of CSUF students is 105.00. (If CSUF students are our population, then this mean is a parameter.) If we take a sample of 10 CSUF students and compute their mean IQ, it will probably not be exactly 105.00. Instead, let us say that it is 103.25. (This would be a statistic.) If we then take a second sample of 10 CSUF students and compute their mean IQ, again it will probably not be 105.00 … and it probably will not be 103.25 (the mean of our first sample). Instead, it might be 106.87. If we keep doing this—say, 100 times—then we will probably end up with 100 different sample means, and it is very likely that none of them is exactly 105.00. This variability in the sample means is sampling error. (Note that the term “error” here does not mean that anyone has made a mistake. “Error” here just refers to variability.)
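The repeated-sampling thought experiment above is easy to simulate. This sketch treats IQ as normally distributed with the population mean of 105.00 from the text; the population SD of 15 and the sample counts are assumptions for illustration.

```python
# Simulate sampling error: draw 100 samples of 10 "students" from a
# population with mean 105 and SD 15 (the SD is an assumed value),
# and record each sample's mean IQ.
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
POP_MEAN, POP_SD, N = 105.0, 15.0, 10

sample_means = []
for _ in range(100):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The 100 sample means vary around 105.00, and essentially none of
# them equals 105.00 exactly -- that variability is sampling error.
print(min(sample_means), max(sample_means))
print(statistics.mean(sample_means))  # near 105, but not exactly 105
```

Running this shows sample means scattered a few points above and below 105.00, even though every sample came from the same population.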
Why Does Sampling Error Matter?
In general we do not know the population parameters that we are interested in. Instead, we have to draw conclusions about them based on sample statistics … usually from a single sample. We already know that such sample statistics are (probably) not going to match the corresponding population parameters exactly … so we have a problem. Here are examples of two important ways in which this problem turns up.
Example 1: A sample of 40 people is randomly assigned to rate the intelligence of a smiling person or a non-smiling person. (It is the same stimulus person with different expressions.) The mean for the smiling condition is 5.30 and the mean for the non-smiling condition is 4.90. But this does not necessarily mean that people in general (the population) would rate the person more intelligent in the smiling condition than in the non-smiling condition. The observed difference in the sample might be nothing more than sampling error. Maybe people in general would give a mean rating of 5.00 regardless of whether or not the stimulus person was smiling. Perhaps the 5.30 in the smiling condition reflects normal sample-to-sample variability, and perhaps the 4.90 in the non-smiling condition reflects the same. So maybe the difference just happened to turn up for this sample. Maybe if the study were done again with a different sample, the two means would come out the same, or the difference would go the other way.
Example 2: Imagine that in a sample of 50 people, the correlation between the amount of allowance a person got as a child and how financially responsible he or she is as an adult is r = .20. This is a sample statistic, and it indicates a small positive relationship … in the sample. We might like to conclude that there is a small positive relationship between these variables in the population of adult Americans. But wait. What if the correlation in the population is actually zero? Perhaps this correlation of .20 in the sample just reflects normal sample-to-sample variability. Maybe if the study were done again with a different sample, the correlation would be close to zero, or maybe it would be negative.
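This, too, can be checked by simulation. The sketch below assumes the population correlation really is zero: it draws two completely independent variables for samples of n = 50 and computes Pearson's r each time, to see how large a sample correlation can get by chance alone. All values are simulated, not real allowance data.

```python
# If the true population correlation is 0, how much does a sample r
# (n = 50) bounce around? Draw independent (hence uncorrelated)
# variables and compute Pearson's r many times.
import random
import statistics

random.seed(3)

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sx * sy)

rs = []
for _ in range(2000):
    xs = [random.gauss(0, 1) for _ in range(50)]
    ys = [random.gauss(0, 1) for _ in range(50)]
    rs.append(pearson_r(xs, ys))

# Even with a true correlation of zero, sample r's around +/- .20
# (and beyond) still show up.
print(max(abs(r) for r in rs))
```

The sample correlations cluster around zero, but values at or beyond ±.20 do occur, so an observed r = .20 is not by itself proof of a relationship in the population.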
Note that we are identifying a problem that many of you have identified already, albeit in a less formal way. How do we know that a small relationship is “real?” That is, how do we know that a small difference between group means or a small correlation in a sample represents a relationship that exists in the population, and is not just a “blip” that happened for this particular sample?
The Sampling Error Hypothesis and the Need for Inferential Statistics
In principle, any relationship that we observe in a sample could reflect nothing more than sampling error. In other words, this sampling error hypothesis is a possible explanation for any relationship that we observe in a sample: the relationship might be something that happened by chance for this particular sample but does not hold in the population in general.
Inferential statistics—which we are starting to get into now—is mainly about testing the sampling error hypothesis. When we do inferential statistics, we ask whether it is reasonable to think that the relationship we have observed is just sampling error … or does it seem like it has to be something more? Usually, we would like to reject the sampling error hypothesis in favor of the hypothesis that it is something more.
Let us return to the example above. Subjects in a smiling condition gave a mean intelligence rating of 5.30. Those in a non-smiling condition gave a mean intelligence rating of 4.90. We can use inferential statistics to decide whether that difference reflects a true difference in the population or whether it is just due to sampling error.
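One way to carry out such a test is a permutation test (our choice for illustration here, not necessarily the method this course will use). Under the sampling error hypothesis, the condition labels are arbitrary, so we can shuffle them and see how often a mean difference as large as the observed 0.40 appears by chance. The raw ratings below are invented; only the two condition means (5.30 and 4.90) come from the text.

```python
# Permutation test sketch: shuffle the "smiling"/"non-smiling" labels
# and count how often relabeled groups differ by as much as the
# observed 0.40. The individual ratings are invented so that the
# group means match the text (5.30 and 4.90).
import random
import statistics

random.seed(4)

smiling = [5, 6, 5, 6, 5, 5, 6, 5, 5, 5, 6, 5, 5, 6, 5, 5, 5, 6, 5, 5]
nonsmiling = [5, 5, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

observed = statistics.mean(smiling) - statistics.mean(nonsmiling)  # 0.40

pooled = smiling + nonsmiling
trials = 5000
count = 0
for _ in range(trials):
    random.shuffle(pooled)  # relabel: first 20 = "smiling", rest = "non-smiling"
    diff = statistics.mean(pooled[:20]) - statistics.mean(pooled[20:])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"Observed difference: {observed:.2f}, permutation p = {p_value:.3f}")
```

If differences this large rarely arise from shuffled labels, the sampling error hypothesis becomes hard to maintain; if they arise often, the observed difference could easily be a "blip."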