Here are some recommended readings, split into categories, that help prepare for this lesson.
Introduction to Population and Samples
Understanding Sampling Methods
Ali and Davontay, who are classmates and good friends, are talking about video games and their studies. Ali claims that video games can improve students' GPA performance in school. Davontay completely disagrees with that claim.
They wonder what they would need to perform an experiment at their school to find out whose claim is true. They should be able to design a successful experiment by finding the following information. Give them a hand!
Suppose a hamburger restaurant wants to measure customer satisfaction. Should they survey every single client? Or, instead, would it be better to survey a sample of the population? To help figure out the answer to these important questions, the definitions of parameter and statistic will come to the rescue!
Populations are analyzed with the purpose of identifying descriptive measures that will help to answer questions of interest, which may vary. The measures obtained from a population, like the population mean, the population proportion, and the population standard deviation, are called parameters.
A parameter is a value that describes some measurable characteristic of a population.
Collecting data from every member of a population is typically expensive and time-consuming. For that reason, samples of the population are used. The measures obtained from a sample, such as the sample mean, the sample proportion, and the sample standard deviation, are called statistics.
A statistic is a value that describes some measurable characteristic of a sample.
Since a statistic is a value that describes a sample, it can change from sample to sample. The following steps ensure that the sample statistics can then be used to make inferences about the population parameters.
The diagram illustrates this process.
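The difference between a parameter and a statistic can be sketched with a short simulation. This is a minimal illustration, not part of the lesson's data: the population of 5,000 satisfaction scores, the sample size of 100, and all variable names are assumptions made up for the example.

```python
import random
import statistics

# Hypothetical population: satisfaction scores (1-10) for 5,000 customers.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.randint(1, 10) for _ in range(5000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistics: computed from samples, so they vary from sample to sample.
sample_means = [
    statistics.mean(rng.sample(population, 100))
    for _ in range(5)
]

print(f"Population mean (parameter): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic): {m:.2f}")
```

Each sample mean should land close to the population mean without matching it exactly, which is why a statistic is used to estimate a parameter rather than to replace it.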
They found some really cool studies but struggled to determine whether the descriptive numbers used in them were parameters or statistics. Help them identify which of the two each study represents.
In fact, most of the results are around 0.24. This means that the result of landing on red 14 times can easily occur by chance and it is most likely that the spinner does not favor red.
This means that this result is highly unlikely to occur by chance with a fair spinner. Landing on red 27 times with the spinner that Ali brought is therefore so improbable that the spinner is most likely not fair, and it favors red!
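The reasoning about the spinner can be checked with a simulation. The sketch below is only an illustration: this section does not state the number of spins or the spinner's layout, so the 50 spins and the fair probability of 0.25 for red are assumed values, and the function names are hypothetical.

```python
import random

def simulate_red_counts(n_spins, p_red, n_trials, seed=0):
    """Return the number of reds in each of n_trials simulated games."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_red for _ in range(n_spins))
        for _ in range(n_trials)
    ]

def share_at_least(counts, threshold):
    """Fraction of simulated games with at least `threshold` reds."""
    return sum(c >= threshold for c in counts) / len(counts)

# Assumed values, not given in the text: 50 spins, fair chance of red = 0.25.
counts = simulate_red_counts(n_spins=50, p_red=0.25, n_trials=10_000)

print("Share of fair games with >= 14 reds:", share_at_least(counts, 14))
print("Share of fair games with >= 27 reds:", share_at_least(counts, 27))
```

Under these assumptions, a fair spinner produces 14 or more reds in a sizable share of games, while 27 or more reds almost never happens. A result that a fair spinner almost never produces is evidence that the spinner is not fair.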
Once a claim is identified and formulated as a statistical hypothesis, a statistical study can be conducted to analyze the hypothesis. It is important that the most appropriate data collection method for the study is determined.
A statistical study is a process in which data are analyzed to find answers to questions about a population parameter. The process includes determining the purpose of the study, selecting a representative sample of the population using sampling methods, calculating statistics, and making inferences about the population.
The reason for using samples is that it is almost impossible to collect data from every member of the population due to constraints such as time and money. The findings of a sample can be used to make inferences about the population as long as appropriate sampling procedures are used.
A poorly chosen sample may lead to biased results and misinterpretations about the population. This can be avoided by using random sampling to ensure that data from the sample is typical of the population. Data can be collected using methods such as surveys, experiments, and observational studies. The best method depends on the study's purpose.
An experiment, or controlled experiment, is a data collection method used in statistical research to measure the effect of some treatment. It divides a sample into two groups that are kept under the same conditions. One group, the treatment group, receives some type of treatment, while the other, the control group, does not receive any treatment.
Experiments are used in statistical studies to compare the effect on the treatment group to the control group. For example, suppose a medical center wants to test whether a new vaccine helps defeat breast cancer in women. To conduct a controlled experiment, researchers need to sample women with breast cancer and randomly assign them to groups.
Control Group | Treatment Group |
---|---|
Women with breast cancer that will not receive the vaccine. | Women with breast cancer that will receive the vaccine. |
In an observational study, the researcher observes and measures a chosen sample without controlling its environment, interacting with it, or exposing it to any treatment. This method is used when it is impossible to isolate the variable under study or when individuals cannot be exposed to treatments due to ethical concerns.
An observational study aims to observe the effect of a risk factor, a treatment, or other intervention. The study's criteria need to be specified in advance to find characteristics, behaviors, or specific events in the observations. Criteria, such as a list of things to look for, ensure consistency in the process of making observations.
It is crucial to acknowledge that other variables might affect a study. Suppose that a diet program's effect on a person's blood pressure is being investigated. Other factors that also affect blood pressure, such as a person's stress levels or exercise routine, must be accounted for. The following table lists some advantages of observational studies.
Advantages | Example |
---|---|
Measures are taken in real and naturally occurring scenarios. | Observing lions in their environment to learn about roles, eating habits, and social and family structures. |
It is helpful for complex situations, especially those where imposing treatment on individuals could be unethical. | Comparing the sugar levels of 50 people who drink coffee and 50 who do not, without influencing this decision. |
Observational studies also have disadvantages to be considered.
Disadvantages | Example |
---|---|
Risk of ignoring other factors that can affect the variables of the study. | A study might conclude that people who meditate have lower cholesterol levels than others, but it might ignore that they exercise more and follow a healthier diet than others. |
It can be time-consuming, mainly if the observed variables do not occur frequently or last for long periods. | Studying the lifestyle of elephants in their birth and pregnancy stages. Their pregnancy can last around two years and they give birth around every four years. |
A statistical study that collects data from participants through a questionnaire (a series of questions) is called a survey. The answers obtained in a survey provide data about the characteristics, behaviors, or opinions of the members of the population.
Questions must be clear to avoid confusing participants. Participants answer the questions without any interference from the researcher. Here are a few common types of surveys.
Davontay and Ali enjoyed their time learning about study design so much that they immediately ran to the local science museum to keep learning. There, Davontay discovered a cool magazine called Interesting Facts You Have Never Thought About.
In an experiment, the sample is divided into two groups: a treatment and a control group. In an observational study, members of a sample are observed without influencing the members. In a survey, individuals are asked a series of questions.
The main characteristics of each situation will be determined one by one.
The table summarizes what was determined.
Situation | Method Used for Collecting Data |
---|---|
A | Experiment |
B | Observational Study |
C | Survey |
Ali and Davontay are on a hot streak classifying statistical studies. The magazine from the science museum not only includes exciting past studies but also has a future research section that poses real-life situations in need of study.
Future Research Problems | |
---|---|
A | The way people consume movies is changing rapidly. The movie industry wants to study what viewing options people would prefer to watch movies over the next decade. |
B | More households have begun gardening tomatoes than ever before. A company that makes garden soil wants to test whether their newly developed fertilizer improves the growth of tomatoes. |
C | Before building a new jungle gym, a local park wants to count how many toddlers are using the current jungle gym. The design of the jungle gym will depend on this finding. |
The given situations will be analyzed to determine which method of collecting data fits.
In situation B, a company wants to test the impact a fertilizer has on the growth of tomatoes. To observe the fertilizer's effects, a group of sample tomato plants must be exposed to the fertilizer. That means a treatment group and a control group will be needed.
Treatment Group | Control Group |
---|---|
A group of plants that will be exposed to the fertilizer | A group of plants that will not be exposed to the fertilizer |
The table summarizes the information.
Situation | Appropriate Method for Collecting Data |
---|---|
A | Survey |
B | Experiment |
C | Observational Study |
In a survey, data is collected by participants answering a series of questions. Unfortunately, poorly designed surveys can negatively affect the results of a study. For example, leading questions influence participants' answers so that their original views are not reflected. This kind of influence is called bias in survey questions. A variety of factors contribute to bias.
A double-barreled question causes the reader to answer two questions with the same single answer, thereby disallowing for the necessary nuance to answer both questions separately. To avoid this issue, verify the question and, if needed, split it into two or more questions.
Biased Question | Unbiased Questions |
---|---|
How likely are you to recommend medieval fantasy books and movies? | How likely are you to recommend medieval fantasy books? How likely are you to recommend medieval fantasy movies? |
How much do you enjoy playing video games and watching movies? | How much do you enjoy playing video games? How much do you enjoy watching movies? |
This bias happens when questions that are related are not presented in logical order. For example, questions asked early in a survey can influence how participants answer questions later in the survey.
Biased Order |
---|
1. What is your least favorite math subject? |
2. How much do you enjoy math? |
In this case, asking about the participant's least favorite math subject will have a strong influence on how they respond if later asked about their feelings towards math in general. Using the inverse order will produce a fairer response.
Unbiased Order |
---|
1. How much do you enjoy math? |
2. What is your least favorite math subject? |
Depending on the situation, randomization can also be applied to avoid this bias.
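One way to randomize is to shuffle the question order separately for each respondent, so that any order effect averages out across the whole sample. The sketch below is a minimal illustration; the function name, the use of a respondent ID as the random seed, and the third question are assumptions made for the example.

```python
import random

def randomized_order(questions, respondent_id):
    """Return the questions in a random order that is stable per respondent."""
    rng = random.Random(respondent_id)  # seed with the respondent's ID
    order = list(questions)  # copy so the original list is not modified
    rng.shuffle(order)
    return order

questions = [
    "How much do you enjoy math?",
    "What is your least favorite math subject?",
    "How many hours a week do you study math?",  # hypothetical extra question
]
for respondent_id in range(1, 4):
    print(respondent_id, randomized_order(questions, respondent_id))
```

Randomization does not remove the influence of one question on the next for an individual respondent. It only keeps that influence from pushing every respondent in the same direction.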
A leading question leads respondents to answer in a specific way desired by the surveyor. Leading questions can be written either intentionally or unintentionally. To avoid this bias, ask questions clearly and do not use words that encourage specific answers.
Biased Question | Unbiased Question |
---|---|
Do you always consume fast food? | How often do you consume fast food? |
How healthy are you? | How would you rate your health? |
All of this studying is making Ali and Davontay hungry. While eating, they observe students' behaviors; some eat the cafeteria's food, and others bring food from home. Feeling inspired by their new research skills, they want to study this behavior more in-depth.
They plan to conduct a survey and design a series of questions to ask the students. Identify which of the following survey questions are biased. Check whether any word can lead to a specific answer and whether each question asks only one thing at a time.
Each question will be analyzed to determine whether it is biased.
All options have been analyzed. The table summarizes the results.
Option | Biased? |
---|---|
How good are the cafeteria facilities? | Yes |
Do you bring your own lunch or buy it at the school's cafeteria? | No |
How do you consider the taste and appearance of the cafeteria's food? | Yes |
Do you prefer hot drinks or cold drinks? | No |
Bias in an experiment refers to any factor that can influence the result of the experiment, such as participants knowing which group they belong to or the researcher paying more attention to one of the groups. This bias can be introduced intentionally or unintentionally. Some of these factors are described below.
Selection bias arises when the researcher influences how the treatment and control groups are chosen. It can be avoided by randomly assigning participants to the control and treatment groups, which produces two similar groups.
The random selection can be performed through various methods ranging from flipping a coin or rolling a die to using a computer program to place members in each group.
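A computer program for this step can be very short. The following is a minimal sketch of random assignment, assuming 100 participants identified by made-up IDs. The function name and the seed are illustrative choices.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the input is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs.
participants = [f"P{i:03d}" for i in range(1, 101)]
treatment, control = assign_groups(participants, seed=7)
print(len(treatment), len(control))  # prints "50 50"
```

Because the shuffle decides the groups, neither the researcher nor the participants can steer who receives the treatment.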
A second kind of bias arises when individuals know which group they belong to or when the researcher focuses on one group more than the other. Either can cause participants to change their responses or behaviors. On the researcher's side, expectations about the study results cause this bias. It can be avoided by using blinding and a placebo.
Solution | Description |
---|---|
Placebo | A harmless fake treatment given to the control group. An example would be receiving a sugar pill rather than prescription medication. |
Blinding | Preventing the participants or the researcher from knowing which group is the control group and which is the treatment group. |
Davontay and Ali continue to explore their interest in eating habits. They went to the school nurse, who told them about intermittent fasting, an eating plan that alternates between fasting and eating periods. With this plan, people eat during a specific window of time and fast for a certain number of hours each day.
The school nurse said he would like to know whether this plan helps people in the general population, not only those in school, lose weight. Ali and Davontay discuss how to plan such an experiment.
Option | Treatment Group | Control Group |
---|---|---|
A | People who will not follow the intermittent fasting. | People who will follow the intermittent fasting. |
B | People who will follow the intermittent fasting. | People who will not follow the intermittent fasting. |
Treatment Group | Control Group |
---|---|
People who will follow the intermittent fasting | People who will not follow the intermittent fasting |
Note this is given in option B.
This option selects the control and treatment groups by contacting people via social media.
Control Group | Treatment Group |
---|---|
The first 50 people registered via social media | The latter 50 people registered via social media |
Note that getting people from social media excludes people who do not use social media. When a difference between the control and treatment group makes them different populations, an unwanted bias might be introduced. This makes it difficult to make conclusions about the target population.
This procedure selects each group at random from a representative sample.
Control Group | Treatment Group |
---|---|
50 people selected at random from a representative sample | 50 people selected at random from a representative sample |
In general, random selection is a procedure without bias.
The last procedure selects the treatment and control groups from people between 15 and 25 years old.
Control Group | Treatment Group |
---|---|
A group of 50 people between 15 and 25 | A group of 50 people between 15 and 25 |
Although the two groups have some similar characteristics, this option is biased because the sample does not represent the whole population. Consider that as people age, metabolism slows. If the participants are only young people, conclusions cannot be made about the general population.
Statistical studies are applied in a wide range of disciplines such as marketing, science, medicine, technology, and investment. This makes it essential to create appropriate study designs when making data-driven decisions to avoid risks. Now, recall the discussion Ali and Davontay have about the influence of video games on students' GPA results.
They now have all the tools to design an experiment. Help them to find the following information to finish this task correctly.
Experiment | |
---|---|
Treatment Group | Control Group |
Students who will play video games during some period after school | Students who will not play any video games during the research |
With these conditions, the following sampling procedure can be used.
Select students from the student body at random and include proportionally students from every grade.
Keep in mind that sampling procedures may vary. However, it is always necessary to consider every factor that can cause bias.
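The sampling procedure described above, random selection with every grade included proportionally, can be sketched as follows. The enrollment numbers, the student IDs, the sample size of 100, and the function name are all assumptions made for illustration.

```python
import random

def proportional_sample(students_by_grade, sample_size, seed=None):
    """Draw a random sample whose grade mix mirrors the student body."""
    rng = random.Random(seed)
    total = sum(len(s) for s in students_by_grade.values())
    sample = {}
    for grade, students in students_by_grade.items():
        # Each grade's share of the sample matches its share of the school.
        # round() can make the total differ slightly from sample_size.
        k = round(sample_size * len(students) / total)
        sample[grade] = rng.sample(students, k)
    return sample

# Hypothetical enrollment: grade -> list of student IDs.
students_by_grade = {
    9: [f"9-{i}" for i in range(300)],
    10: [f"10-{i}" for i in range(250)],
    11: [f"11-{i}" for i in range(250)],
    12: [f"12-{i}" for i in range(200)],
}
sample = proportional_sample(students_by_grade, sample_size=100, seed=1)
print({grade: len(chosen) for grade, chosen in sample.items()})
# prints {9: 30, 10: 25, 11: 25, 12: 20}
```

The sampled students would then be assigned at random to the treatment and control groups, so that both the sample and the groups are chosen without the researchers' influence.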