Sampling is the process of selecting a sample from a population of objects or individuals. The sample can then be examined to draw conclusions about the entire population.
There are many different sampling methods, such as convenience sampling, self-selected sampling, systematic sampling, and random sampling. For example, suppose that a sample of the students in a classroom needs to be chosen to study blood sugar levels; any of these methods could be used to choose it.
A sample that accurately reflects the characteristics of the population is called a representative sample.
In a representative sample, if $x\%$ of the population has a certain characteristic, then approximately $x\%$ of the sample shares that characteristic. As an example, consider a sample of cats and dogs selected from all the cats and dogs at a pet shop.
The figure shows that cats represent $60\%$ of the population and $60\%$ of the sample. Similarly, dogs represent $40\%$ of the population and $40\%$ of the sample. Therefore, the sample is representative. Note that in a representative sample, every subgroup of the population is represented.
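The percentage comparison above can be sketched in code. This is a minimal illustration, assuming a hypothetical pet shop with 30 cats and 20 dogs; the helper names `proportions` and `is_representative` and the $5\%$ tolerance used to stand in for "approximately" are illustrative choices, not part of the lesson.

```python
from collections import Counter

def proportions(items):
    """Return the share of each category in a list of labels."""
    counts = Counter(items)
    total = len(items)
    return {label: count / total for label, count in counts.items()}

def is_representative(population, sample, tolerance=0.05):
    """True if each population category appears in the sample with a share
    within `tolerance` of its population share (tolerance is an assumption)."""
    pop_shares = proportions(population)
    sample_shares = proportions(sample)
    return all(
        abs(sample_shares.get(label, 0.0) - share) <= tolerance
        for label, share in pop_shares.items()
    )

# Hypothetical pet shop: 30 cats and 20 dogs (60% / 40%).
population = ["cat"] * 30 + ["dog"] * 20
sample = ["cat"] * 6 + ["dog"] * 4    # 60% cats, 40% dogs
skewed = ["cat"] * 9 + ["dog"] * 1    # 90% cats, 10% dogs

print(proportions(sample))                     # {'cat': 0.6, 'dog': 0.4}
print(is_representative(population, sample))  # True
print(is_representative(population, skewed))  # False
```

A category that is missing from the sample counts as a share of $0$, so a sample that leaves out a whole subgroup fails the check.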
A sample can reflect bias. In most cases, unbiased samples are representative of the population, whereas biased samples usually are not.
A bias in a sample is an error in sampling that results in the misrepresentation of members of a population. Bias occurs when members of the population with certain characteristics are more likely to be selected for the sample than others.
Term | Definition |
---|---|
Unbiased Sample | A sample that is representative of the population. Conclusions drawn from this sample can be generalized to the whole population. |
Biased Sample | A sample that overrepresents or underrepresents a certain part of the population. The inferences drawn based on this sample may be invalid. |
The choice of sampling method can either introduce or minimize bias in a sample. The following real-life scenarios are examples of biased samples.
Biased Sample | Explanation |
---|---|
A city council asks residents whether there should be an off-leash area for dogs in a park. A hundred dog owners are surveyed at the park. | The only people asked are dog owners. This means that respondents are more likely to have a strong opinion about an off-leash area for their dogs. |
To assess the experiences of customers who shop online, a company e-mails purchasers with a link to a survey. | Because this sample is self-selected, only those who are very satisfied or dissatisfied with the shopping experience are likely to respond. |
Every sixth boxer at a boxing camp is asked to name their favorite brand of boxing gloves. | Not all boxers go to a boxing camp, as camps are usually sponsored by a brand and take place in a single city. Also, professional boxers often organize their own private camps with hired sparring partners. |
A simple random sample is an unbiased sample because it involves selecting members from a population randomly. This guarantees that each member of the population has an equal chance of being included in the sample. This type of sample is usually representative because it tends to have the same characteristics as the population.
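The "equal chance" property of a simple random sample can be checked with a quick simulation. This sketch assumes a hypothetical class of 20 students labelled `0` to `19` and a sample size of 5; it uses Python's standard `random.sample`, which selects distinct members uniformly at random. Over many repetitions, each student should be chosen in roughly $5/20 = 25\%$ of the samples.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Select n distinct members so that every member has the same chance."""
    return rng.sample(population, n)

# Hypothetical classroom of 20 students, labelled 0..19.
students = list(range(20))
rng = random.Random(42)  # fixed seed so the run is reproducible

trials = 10_000
hits = Counter()
for _ in range(trials):
    hits.update(simple_random_sample(students, 5, rng))

# Each student should be included in about 5 / 20 = 25% of the trials.
rates = {s: hits[s] / trials for s in students}
print(min(rates.values()), max(rates.values()))
```

The smallest and largest inclusion rates should both land close to $0.25$, which is what "each member has an equal chance" means in practice.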
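Systematic samples, in which every $k$th member of an ordered list is selected, can be either biased or unbiased. A systematic sample can be biased when the order of the list follows a pattern that is related to the characteristic being studied.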
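A systematic sample picks a random starting point among the first $k$ members of an ordered list and then takes every $k$th member after it. The sketch below assumes a hypothetical roster of 30 members that alternates between two groups, `A` and `B`. When $k$ matches the period of that pattern, only one group is ever selected; when it does not, both groups appear equally.

```python
import random

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical roster that alternates between two groups: A, B, A, B, ...
roster = ["A", "B"] * 15  # 30 members, half A and half B

rng = random.Random(0)
periodic = systematic_sample(roster, 2, rng)  # k matches the pattern's period
other = systematic_sample(roster, 3, rng)     # k does not match it

print(set(periodic))                       # only one group appears
print(other.count("A"), other.count("B"))  # 5 5
```

With $k = 2$ the sample is biased because it contains only `A` or only `B` members, while with $k = 3$ every possible starting point yields five members of each group.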
Keep in mind that unbiased samples are more likely to produce a representative sample.
A self-selected sample is a biased sample because people with strong opinions, either positive or negative, about the topic studied are more likely to volunteer. Also, people who are interested in the topic being studied may be more likely to participate, while those who are not interested may refuse to participate.
As a result, such a sample is not representative of the population because it underrepresents people with neutral opinions about the topic or who are not interested in it.
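The underrepresentation of neutral opinions can be illustrated with a small simulation. The population below is hypothetical ($60\%$ neutral, $20\%$ strongly positive, $20\%$ strongly negative), and the response rates are assumed values chosen only to show the effect: people with strong opinions volunteer more often than neutral people.

```python
import random

# Hypothetical population: 600 neutral, 200 strongly positive, 200 strongly negative.
population = ["neutral"] * 600 + ["positive"] * 200 + ["negative"] * 200

# Assumed chance that each kind of person volunteers to answer the survey.
response_rate = {"neutral": 0.1, "positive": 0.6, "negative": 0.6}

rng = random.Random(1)
respondents = [p for p in population if rng.random() < response_rate[p]]

share_neutral_population = population.count("neutral") / len(population)
share_neutral_sample = respondents.count("neutral") / len(respondents)
print(share_neutral_population)        # 0.6
print(round(share_neutral_sample, 2))  # far below 0.6
```

Under these assumed rates, the expected share of neutral respondents is $\frac{0.6 \cdot 0.1}{0.6 \cdot 0.1 + 0.4 \cdot 0.6} = 0.2$, even though neutral people make up $60\%$ of the population.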
Convenience sampling is often used when researchers have limited resources or are under time constraints. However, it can lead to a biased sample because the researchers choose sample members who are easily available to them. Some groups in the population may not be represented in the sample because they are not easily accessible, so the sample is not representative of the population.
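The following sketch shows how a convenience sample can drift away from the population. Everything in it is a hypothetical assumption for illustration: a town of 1,000 people in which $30\%$ exercise regularly, and a researcher standing outside a gym who reaches regular exercisers far more easily than anyone else. The sample consists of the first 50 people the researcher happens to reach.

```python
import random

rng = random.Random(7)

# Hypothetical town of 1,000 people; 30% exercise regularly.
people = [{"exercises": i < 300} for i in range(1000)]
rng.shuffle(people)

# Assumed accessibility: a researcher outside a gym easily meets regular
# exercisers (50% chance each) but rarely meets anyone else (5% chance each).
def easily_available(person):
    chance = 0.5 if person["exercises"] else 0.05
    return rng.random() < chance

# Convenience sample: the first 50 people the researcher happens to reach.
sample = [p for p in people if easily_available(p)][:50]

population_share = sum(p["exercises"] for p in people) / len(people)
sample_share = sum(p["exercises"] for p in sample) / len(sample)
print(population_share)  # 0.3
print(sample_share)      # much higher than 0.3
```

The people who do not exercise are not easily accessible to this researcher, so they are underrepresented and the sample overstates how common regular exercise is.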
Stratified sampling divides a population into subgroups, called strata, and randomly selects members from each one. This guarantees that each subgroup of the population is represented in the sample, which means that the stratified sample is a representative sample. Therefore, a stratified sample can be considered an unbiased sample.
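A common way to build a stratified sample is proportional allocation: each stratum contributes members in proportion to its share of the population, chosen by simple random sampling within that stratum. The sketch below assumes this proportional scheme (other allocation schemes exist) and reuses a hypothetical pet shop with 30 cats and 20 dogs. The function name `stratified_sample` is illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, n, rng=random):
    """Proportional stratified sample: group members into strata by `key`,
    then take a simple random sample from each stratum whose size is
    proportional to that stratum's share of the population."""
    strata = defaultdict(list)
    for member in population:
        strata[key(member)].append(member)
    sample = []
    for members in strata.values():
        size = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical pet shop: 30 cats and 20 dogs, each with an ID number.
animals = [("cat", i) for i in range(30)] + [("dog", i) for i in range(20)]
sample = stratified_sample(animals, key=lambda a: a[0], n=10, rng=random.Random(3))

print(sum(1 for a in sample if a[0] == "cat"),
      sum(1 for a in sample if a[0] == "dog"))  # 6 4
```

Because each stratum's size is rounded separately, the total can differ from `n` by one or two members when the shares do not divide evenly; here $60\%$ and $40\%$ of 10 are whole numbers, so the sample has exactly 6 cats and 4 dogs.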
Cluster sampling, in which a population is divided into groups (clusters) and entire clusters are selected at random, is prone to bias. When the selected clusters do not reflect the characteristics of the population, conclusions about the entire population will be biased as well.
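The effect of unrepresentative clusters can be seen in a short sketch. It assumes a hypothetical shop with 5 pens of 10 animals each, where every pen is a cluster. In the first case the pens are homogeneous (3 pens of only cats, 2 pens of only dogs), so choosing one whole pen can never give a share of cats near the population's $60\%$. In the second case every pen mirrors the population, and the cluster sample is representative.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly choose whole clusters and include every member of each."""
    chosen = rng.sample(clusters, num_clusters)
    return [member for cluster in chosen for member in cluster]

# Hypothetical homogeneous pens: 3 pens of cats, 2 pens of dogs (60% / 40% overall).
pens = [["cat"] * 10 for _ in range(3)] + [["dog"] * 10 for _ in range(2)]
sample = cluster_sample(pens, 1, random.Random(5))
cat_share = sample.count("cat") / len(sample)
print(cat_share)  # either 1.0 or 0.0, never close to 0.6

# Hypothetical mixed pens: each pen has 6 cats and 4 dogs, like the population.
mixed_pens = [["cat"] * 6 + ["dog"] * 4 for _ in range(5)]
mixed = cluster_sample(mixed_pens, 1, random.Random(5))
print(mixed.count("cat") / len(mixed))  # 0.6
```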