Every day, people are presented with claims about products, lifestyles, and more. There must be a way to check the validity of those claims; otherwise, problems could occur. Conducting a statistical study is one way to do so. Through real-world examples, this lesson explores and defines the methods needed to design a good study.

Catch-Up and Review

Here are some recommended readings, split into categories, that help prepare for this lesson.

Introduction to Population and Samples

Understanding Sampling Methods

Challenge

Can Video Games Improve Student Performance in School?

Ali and Davontay, who are classmates and good friends, are talking about video games and their studies. Ali claims that playing video games can improve students' GPAs. Davontay completely disagrees with that claim.

Ali and Davontay talking about videogames and studies

They wonder what they would need to perform an experiment at their school to find out whose claim is true. They should be able to design a successful experiment by finding the following information. Give them a hand!

a Determine the objective and identify the population of the experiment.
b State the treatment group and the control group.
c Describe a sample procedure.
Discussion

Measures of a Population vs. Measures of a Sample

Suppose a hamburger restaurant wants to measure customer satisfaction. Should they survey every single client? Or, instead, would it be better to survey a sample of the population? To help figure out the answer to these important questions, the definitions of parameter and statistic will come to the rescue!

Concept

Parameter and Statistic

Populations are analyzed to identify descriptive measures that help answer the questions of interest, which vary from study to study. The measures obtained from a population, such as the population mean, the population proportion, and the population standard deviation, are called parameters.

A parameter is a value that describes some measurable characteristic of a population.

Collecting data from every member of a population is typically expensive and time-consuming. For that reason, samples of the population are used instead. The measures obtained from a sample, such as the sample mean, the sample proportion, and the sample standard deviation, are called statistics.

A statistic is a value that describes some measurable characteristic of a sample.

Since a statistic is a value that describes a sample, it can change from sample to sample. The following steps ensure that the sample statistics can then be used to make inferences about the population parameters.

  1. Use a sampling method to get a representative sample.
  2. Use the sample to calculate the statistics.
  3. Estimate the population parameters by using the statistics.
  4. Make inferences about the population using the estimated parameters.

The diagram illustrates this process.

Steps to make inferences about the parameters of a population using statistics.
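The four steps above can be sketched in Python. This is a minimal illustration rather than part of the lesson: the population of 1,000 satisfaction scores is made up, and the sample size of 50 and the seed are arbitrary assumptions. It shows that parameters come from the whole population, statistics come from a sample, and a statistic can change from one sample to the next.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up customer-satisfaction scores (1 to 10).
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [rng.randint(1, 10) for _ in range(1000)]

# Parameters: measures computed from EVERY member of the population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)   # population standard deviation

# Step 1: draw a simple random sample of 50 members.
sample = rng.sample(population, 50)

# Step 2: statistics are computed from the sample only.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)

# Step 3: the statistics serve as estimates of the parameters.
print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"population SD (parameter):   {population_sd:.2f}")
print(f"sample SD (statistic):       {sample_sd:.2f}")

# A second random sample usually gives a slightly different statistic.
second_sample_mean = statistics.mean(rng.sample(population, 50))
print(f"second sample mean:          {second_sample_mean:.2f}")
```

In a real study only the sample is available, so the population values printed here would be unknown; they are computed only to show how close the estimates come.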
Example

Identify if a Study Is Based on a Parameter or a Statistic

Ali and Davontay have caught the research bug. They really want to understand the concepts of statistics and parameters so they can understand various studies whenever they come across them. Eagerly, they ran to the library to look up some studies.
Ali and Davontay at the library
External credits: @katemangostar

They found some really cool studies but struggled to determine whether the descriptive numbers used in them were parameters or statistics. Help them identify which of the two each study represents.

a Study One: The mean score of all the players in a Tetris tournament at East JHS was
b Study Two: out of the students at West JHS were surveyed. It was found that the standard deviation of their heights was inches.
c Study Three: In a science project at North JHS, students measured the height of a proportion of randomly sampled algae plants. The plants studied were found to have a mean height of centimeters.

Hint

a Measures obtained from a population are called parameters and measures obtained from a sample are called statistics.
b What is the population of the survey?
c Identify if the measure was obtained from the whole population or a sample.

Solution

a Recall that measures obtained from a population are called parameters, and measures obtained from a sample are statistics. In this case, it is given that the study obtained the mean score using every tournament player, not just a few players.
This means that the whole population was considered when calculating the mean score. Therefore, the given measure is a parameter.
b It is given that only some of the students at West JHS were surveyed. This means that the students surveyed represent a sample of the whole population of students.
Because the standard deviation was obtained from a sample, it is a statistic.
c This study says that students measured only a proportion of algae plants that were selected at random. This means that measures were taken from a sample.
Therefore, since the mean height was obtained from a sample, it is a statistic.
Discussion

Statistical Hypothesis

In analyzing a population, suspicions and expectations lead to making some assumptions or claims about population parameters. In statistics, a claim about a parameter of a population is called a hypothesis. The following are common examples of hypotheses involving a population parameter.
Examples of hypotheses involving a population parameter
A statistical study can test a hypothesis to determine whether the data support it. This can be done by performing a hypothesis test or by running a simulation. When evaluating a hypothesis, it is necessary to distinguish between results that can easily occur by chance and results that are highly unlikely to occur by chance.
Example

Using Simulations to Test a Statistical Hypothesis

Ali and Davontay are taking a break outside of the library. Ali brought something he would like to try with Davontay — a spinner with four equal sections colored red, blue, green, and yellow.
A spinner with equal sections colored red, blue, green, and yellow
They spun the spinner five times and, to their surprise, the outcome was red on each try. However, if the spinner is fair, the probability of getting red five times in a row is $\left(\frac{1}{4}\right)^5=\frac{1}{1024}\approx 0.001,$ which is quite unlikely.
Therefore, Ali and Davontay suspect that the spinner favors red. Yet, the spinner's box says that it does not favor any color. The friends decided to simulate a fair spinner by repeatedly drawing random samples of size 50. The following histogram shows the results obtained.
Histogram of the frequency table of the example simulation
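A simulation like the friends' can be sketched in Python. This assumes a fair spinner, 50 spins per simulated sample (as in the histogram), and a hypothetical 1,000 repetitions; the lesson does not say how many samples were drawn. The names `red_proportion`, `SPINS_PER_SAMPLE`, and `NUM_SAMPLES` are illustrative only.

```python
import random
from collections import Counter

COLORS = ["red", "blue", "green", "yellow"]  # four equal sections
SPINS_PER_SAMPLE = 50   # 50 spins per simulated sample, as in the histogram
NUM_SAMPLES = 1000      # hypothetical number of repetitions

rng = random.Random(7)  # fixed seed so the example is reproducible

def red_proportion(num_spins: int) -> float:
    """Spin a fair four-color spinner num_spins times and return the share of reds."""
    spins = [rng.choice(COLORS) for _ in range(num_spins)]
    return spins.count("red") / num_spins

proportions = [red_proportion(SPINS_PER_SAMPLE) for _ in range(NUM_SAMPLES)]

# Relative frequency of each observed proportion (the heights of the histogram bars).
counts = Counter(round(p, 2) for p in proportions)
relative_freq = {p: c / NUM_SAMPLES for p, c in sorted(counts.items())}

for p, rf in relative_freq.items():
    print(f"{p:.2f}  {rf:.3f}  {'#' * round(rf * 100)}")
```

Because the draws are random, a different seed gives a different but similar histogram, centered near 0.25.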
Keep in mind that different simulations may produce different but similar results. Now, suppose the friends actually spin the real spinner 50 times. What could they infer about the spinner given each of the following outcomes?
a The spinner lands on red times.
b The spinner lands on red 27 times.

Answer

a Example Answer: The spinner most likely does not favor red.
b Example Answer: It is most likely that the spinner favors red.

Hint

a Begin by identifying the statistical hypothesis. Look for the proportion that corresponds to this situation in the histogram.
b Does this result occur in the simulation?

Solution

a Ali and Davontay suspect that the spinner favors red because, in the first five spins they made, the spinner landed on red on each try. However, the spinner's box claims that the spinner does not favor any color. This claim is the statistical hypothesis being tested, and it contradicts what the friends suspect.
The hypothesis is the same as saying that, in the long run, the proportion of spins landing on each color is $\frac{1}{4}.$ It can then be assumed that the probability of landing on red is $\frac{1}{4}=0.25$ and the probability of landing on a different color is $\frac{3}{4}=0.75.$
When the two boys actually spun the real spinner 50 times (not a simulation), dividing the number of times it landed on red by 50 gives the observed proportion of red. Locating that proportion on the given histogram shows its relative frequency in the simulation.
Histogram: relative frequency of red outcomes in a simulation where a four-colored spinner was spun 50 times (horizontal axis: proportion of spins that landed on red; vertical axis: relative frequency).

In fact, most of the simulated results are around $0.25.$ This means that the result in Part A can easily occur by chance, and it is most likely that the spinner does not favor red.

b What might the friends conclude if, instead, the spinner landed on red 27 out of 50 spins? Note that this outcome corresponds to a proportion of $\frac{27}{50}=0.54.$ As in Part A, look at the histogram of the simulation to find the relative frequency of this proportion. It is very small.
Proportion of 0.54

This means that this result is highly unlikely to occur by chance with a fair spinner. Therefore, landing on red that often with the spinner Ali brought is so improbable that the spinner is most likely not fair at all: it favors red!

Ali and Davontay realizing they have an unfair spinner
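The lesson judges these outcomes with a simulation. As a cross-check, the exact chance of a result at least as extreme can be computed with the binomial distribution, a technique the lesson does not use. This sketch assumes a fair spinner (probability 1/4 of red) and, for Part B, 27 reds in 50 spins, which is what a proportion of 0.54 means for 50 spins. The function name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) when X counts successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

P_RED = 0.25  # fair-spinner hypothesis: each of the four colors has probability 1/4

# Five reds in five spins: (1/4)^5 = 1/1024.
print(prob_at_least(5, 5, P_RED))  # 0.0009765625

# Assumed Part B outcome: 27 or more reds in 50 spins (a proportion of at least 0.54).
print(f"{prob_at_least(27, 50, P_RED):.2e}")
```

A very small probability for 27 or more reds supports the conclusion that a fair spinner would rarely produce this result.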


Discussion

Statistical Studies

Once a claim is identified and formulated as a statistical hypothesis, a statistical study can be conducted to analyze it. It is important to choose the most appropriate data collection method for the study.

Concept

Statistical Study

A statistical study is a process in which data are analyzed to find answers to questions about a population parameter. The process includes determining the purpose of the study, selecting a representative sample of the population using sampling methods, calculating statistics, and making inferences about the population.

The process of a statistical study

The reason for using samples is that it is almost impossible to collect data from every member of the population due to constraints such as time and money. The findings of a sample can be used to make inferences about the population as long as appropriate sampling procedures are used.

A sample obtained from a population from which some inferences are made.
A poorly chosen sample may produce biased results and misinterpretations about the population. This can be avoided by using random sampling so that the data from the sample are typical of the population. Data can be collected using methods such as surveys, experiments, and observational studies. The best way to collect data depends on the study's purpose.
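Random sampling can be illustrated with Python's `random.sample`, which draws distinct members so that every member has the same chance of being chosen. The population of 500 ID numbers and the sample size of 40 are hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 500 people.
population_ids = list(range(1, 501))

rng = random.Random(2024)  # seeded only so the example is reproducible

# Simple random sample: every member has the same chance of being chosen,
# and no one is picked twice.
sample_ids = rng.sample(population_ids, k=40)
print(sorted(sample_ids))
```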
Concept

Experiment

An experiment or controlled experiment is a data collection method used in statistical research to measure the effect of some treatment. It divides a sample into two groups that are kept under the same conditions. One group — the treatment group — receives some type of treatment, while the other — the control group — does not receive any treatment.

Represents dividing a sample into a control group (right upper corner) and a treatment group (right lower corner), with the sample shown on the left.

Experiments are used in statistical studies to compare the effect on the treatment group to the control group. For example, suppose a medical center wants to test whether a new vaccine helps defeat breast cancer in women. To conduct a controlled experiment, researchers need to sample women with breast cancer and randomly assign them to groups.

Control Group: Women with breast cancer who will not receive the vaccine.
Treatment Group: Women with breast cancer who will receive the vaccine.
In a controlled experiment, the treatment can be any action that can affect a variable under study. For example, vaccinations given to people, fertilizers applied to crops, and education through online videos are all treatments. Lastly, it is crucial to understand that controlled experiments have both advantages and disadvantages.
Advantages and disadvantages of Experiments
Always consider the advantages and disadvantages carefully when performing an experiment, taking into account the context and situation being studied.
Concept

Observational Study

In an observational study, the researcher observes a chosen sample and records measures about it without controlling its environment, interacting with it, or exposing it to any treatment. This method is used when it is impossible to isolate the variable under study or when individuals cannot be exposed to treatments due to ethical concerns.

The study group

An observational study aims to observe the effect of a risk factor, a treatment, or another intervention. The study's criteria need to be specified in advance to find characteristics, behaviors, or specific events in the observations. Criteria, such as a list of things to look for, ensure consistency in the process of making observations.

Illustration about defining criteria in a study.

It is crucial to acknowledge that other variables might affect a study. Suppose that a diet program's effect on a person's blood pressure is being investigated. The study must account for other factors that also affect blood pressure, such as a person's stress level or exercise routine. Consider the following advantages of observational studies.

Advantage: Measures are taken in real, naturally occurring scenarios.
Example: Observing lions in their environment to learn about their roles, eating habits, and social and family structures.
Advantage: It is helpful for complex situations, especially those where imposing a treatment on individuals could be unethical.
Example: Comparing the sugar levels of people who drink coffee with those of people who do not, without influencing their choice.

Observational studies also have disadvantages to be considered.

Disadvantage: There is a risk of ignoring other factors that can affect the variables of the study.
Example: A study might conclude that people who meditate have lower cholesterol levels than others but ignore that they also exercise more and follow a healthier diet.
Disadvantage: It can be time-consuming, especially if the observed events do not occur frequently or last for long periods.
Example: Studying elephants during their pregnancy and birth stages. An elephant pregnancy can last around two years, and elephants give birth around once every four years.
In conclusion, consider the context of what is under study to be sure that an observational study is the best methodology for data collection.
Concept

Survey

A statistical study that collects data from participants through a questionnaire, a series of questions, is called a survey. A survey aims to collect data about the characteristics, behaviors, or opinions of the members of the population.


Questions must be clear so that participants are not confused. Participants answer the questions without any interference from the researcher. Here are a few common types of surveys.

  • Online: This is one of the most cost-effective ways to conduct a survey. Individuals can respond quickly because they can answer through a website or by email.
  • Phone Call: This method requires interviewing participants through phone calls. It could require more time and money than online surveys because of service costs and the number of calls made.
  • Face-to-face: The researcher conducts the survey in person with the participant. This choice could be expensive and time-consuming.
When done correctly, the method chosen does not have a significant influence on the results. In any event, it is critical that the questions are clear, avoid ambiguity, and serve the purpose of the study rather than the researcher's expectations.
Example

Which Method of Collecting Data Was Used?

Davontay and Ali enjoyed their time learning about study design so much that they immediately ran to the local science museum to keep learning. There, Davontay discovered a cool magazine called Interesting Facts You Have Never Thought About.

Magazine Cover
The magazine introduces three main types of data collection: experiments, surveys, and observational studies. The two friends then go on to read a few statistical studies. They are having a difficult time identifying which method was applied. Help them match the method with the study.

Hint

In an experiment, the sample is divided into two groups: a treatment and a control group. In an observational study, members of a sample are observed without influencing the members. In a survey, individuals are asked a series of questions.

Solution

The main characteristics of each situation will be determined one by one.

Option A

A treatment using electronic devices is applied to a group of people. The group that used electronic devices for six weeks represents the treatment group. The other group represents the control group. Given this information, it is an example of an experiment.

Option B

The coffee shop collects data without controlling its customers, and the customers are not affected by the study. Therefore, this is an observational study.

Option C

The data is collected by asking randomly selected customers about food flavor. This means that option C is a survey.

Putting Them All Together

The table summarizes what was determined.

Situation A: Experiment
Situation B: Observational Study
Situation C: Survey
Example

Identifying Best Fit: Method of Data Collection for a Statistical Study

Ali and Davontay are on a hot streak classifying statistical studies. The magazine from the science museum not only includes exciting past studies but also has a future research section that poses real-life situations in need of study.

Future Research Problems
A The way people consume movies is changing rapidly. The movie industry wants to study which viewing options people would prefer for watching movies over the next decade.
B More households have begun growing tomatoes than ever before. A company that makes garden soil wants to test whether its newly developed fertilizer improves the growth of tomatoes.
C Before building a new jungle gym, a local park wants to count how many toddlers are using the current jungle gym. The design of the new jungle gym will depend on this finding.
The two friends want to identify which data collection method will best fit each of these studies. Help them classify these situations.

Hint

Think about whether the participants or the variable will be affected by the method of study. How does the researcher interact with the sample?

Solution

The given situations will be analyzed to determine which method of collecting data fits.

Situation A

In this future research case, the movie industry wants to find out how viewers would prefer to watch movies over the next decade. This means that data can be collected by asking people directly or by giving them a questionnaire. In such a case, a survey would be an effective data collection method.

Situation B

In situation B, a company wants to test the impact a fertilizer has on the growth of tomatoes. To observe the fertilizer's effects, a sample of tomato plants must be exposed to the fertilizer. That means a treatment group and a control group will be needed.

Treatment Group: A group of plants that will be exposed to the fertilizer
Control Group: A group of plants that will not be exposed to the fertilizer
Therefore, the data can be obtained by an experiment.
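Once such an experiment is run, one simple way to compare the groups is to compare their mean outcomes. The heights below are invented for illustration only; they are not data from any real study.

```python
from statistics import mean

# Hypothetical tomato-plant heights in centimeters after several weeks.
# These numbers are made up for illustration only.
treatment_heights = [62.0, 58.5, 65.2, 60.1, 63.4]  # plants that received the fertilizer
control_heights = [55.3, 57.8, 54.1, 56.6, 58.0]    # plants that did not

difference = mean(treatment_heights) - mean(control_heights)
print(f"Treatment mean: {mean(treatment_heights):.2f} cm")
print(f"Control mean:   {mean(control_heights):.2f} cm")
print(f"Difference:     {difference:.2f} cm")
```

A difference in means by itself does not prove the fertilizer works, because two groups can differ by chance; a simulation like the spinner example can help judge whether a difference this large is likely to occur by chance.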

Situation C

The park staff wants to count the number of toddlers using the current jungle gym before building a new one. Here, the toddlers decide whether to use the jungle gym. This is a behavior seen in its naturally occurring state. Therefore, this situation can be modeled with an observational study.

Put Them All Together

The table summarizes the information.

Situation A: Survey
Situation B: Experiment
Situation C: Observational Study
Discussion

Bias in Survey Questions

In a survey, data is collected by participants answering a series of questions. Unfortunately, poorly designed surveys can negatively affect the results of a study. For example, leading questions influence participants' answers so that their original views are not reflected. This kind of influence is called bias in survey questions. A variety of factors contribute to bias.

Double-Barreled Questions

A double-barreled question asks two questions at once but allows only a single answer, so respondents cannot answer each question separately. To avoid this issue, check each question and, if needed, split it into two or more questions.

Biased: How likely are you to recommend medieval fantasy books and movies?
Unbiased: How likely are you to recommend medieval fantasy books? How likely are you to recommend medieval fantasy movies?

Biased: How much do you enjoy playing video games and watching movies?
Unbiased: How much do you enjoy playing video games? How much do you enjoy watching movies?

Question Order

This bias happens when questions that are related are not presented in logical order. For example, questions asked early in a survey can influence how participants answer questions later in the survey.

Biased Order
What is your least favorite math subject?
How much do you enjoy math?

In this case, asking about the participant's least favorite math subject will have a strong influence on how they respond if later asked about their feelings towards math in general. Using the inverse order will produce a fairer response.

Unbiased Order
How much do you enjoy math?
What is your least favorite math subject?

Depending on the situation, randomization can also be applied to avoid this bias.
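One way to randomize question order is to shuffle the questions separately for each respondent. This does not remove the order effect for any single respondent; it spreads that effect across respondents so that neither order dominates the results. The function `questionnaire_for` and the choice to seed by respondent ID are assumptions made so that each respondent's order can be reproduced.

```python
import random

QUESTIONS = [
    "How much do you enjoy math?",
    "What is your least favorite math subject?",
]

def questionnaire_for(respondent_id: int) -> list[str]:
    """Return the questions in a random order for one respondent (hypothetical helper)."""
    rng = random.Random(respondent_id)  # seed by ID so each respondent's order is reproducible
    order = QUESTIONS.copy()            # never shuffle the master list itself
    rng.shuffle(order)
    return order

for rid in range(1, 5):
    print(rid, questionnaire_for(rid))
```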

Leading Questions

A leading question leads respondents to answer in a specific way desired by the surveyor. Leading questions can be written either intentionally or unintentionally. To avoid this bias, ask questions clearly and do not use words that encourage specific answers.

Biased: Do you always consume fast food?
Unbiased: How often do you consume fast food?

Biased: How healthy are you?
Unbiased: How would you rate your health?
Keep in mind that these are just a few of the factors that can lead to bias in survey questions. Other factors, such as language, the length of the survey, and whether the questions are answerable, should also be considered to avoid bias in questions.
Example

Identifying Bias in Survey Questions

All of this studying is making Ali and Davontay hungry. While eating, they observe students' behaviors; some eat the cafeteria's food, and others bring food from home. Feeling inspired by their new research skills, they want to study this behavior more in-depth.


They plan to conduct a survey and design a series of questions to ask the students. Identify which of the following survey questions are biased.

Hint

Identify if there is any word that can lead to a specific answer. Check if the questions ask only one thing at a time.

Solution

Each question will be analyzed to determine whether it is biased.

Cafeteria Facilities

Consider the question about the cafeteria facilities.
When writing survey questions, researchers must be neutral and not try to influence respondents' answers. This question is a leading question because it contains the word "good," which can cause the respondent to answer positively. This means it is a biased question.

Bringing or Buying Lunch

Read the question asking whether students bring lunch or buy it.
This question is clear. It can be answered without more information because it asks respondents to choose between two clear options. This implies that this question is not biased.

Taste and Appearance of Cafeteria's Food

Next, the question about the taste and appearance of the cafeteria's food will be analyzed.
This question is double-barreled since it asks for two things, the taste and appearance of the food. Because asking two things at a time is a factor that causes bias in survey questions, this is a biased question.

Hot Drinks vs Cold Drinks

Now, examine the question about the drinks.
This question is similar to the second question. It is clear and only asks whether students prefer one option over another. This means that it is not a biased question.

Put Them All Together

All options have been analyzed. The table summarizes the results.

How good are the cafeteria facilities? Biased: Yes
Do you bring your own lunch or buy it at the school's cafeteria? Biased: No
How do you consider the taste and appearance of the cafeteria's food? Biased: Yes
Do you prefer hot drinks or cold drinks? Biased: No
Discussion

Bias in an Experiment

Bias in an experiment refers to any factor that can influence the result of an experiment, such as participants knowing which group they belong to or the researcher paying more attention to one of the groups. This bias can be introduced intentionally or unintentionally. Some of these factors are described below.

Group Selection Bias

Selection bias occurs when the way participants are placed into the treatment and control groups makes the groups systematically different. This bias can be avoided by randomly assigning participants to the control and treatment groups, which tends to produce two similar groups.

Visualization of random selection with a sample on the left, a die in the middle representing the random selection, a control group in the top-right corner, and a treatment group in the bottom-left corner.

The random selection can be performed through various methods ranging from flipping a coin or rolling a die to using a computer program to place members in each group.
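Using a computer program to place members into groups can be as simple as shuffling the list of participants and splitting it in half. The sketch below assumes 20 hypothetical participants labeled `P01` to `P20`; `assign_groups` is an illustrative name, not a standard function.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into control and treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

groups = assign_groups([f"P{i:02d}" for i in range(1, 21)], seed=11)
print("control:  ", groups["control"])
print("treatment:", groups["treatment"])
```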

Performance Bias

Performance bias occurs when participants know which group they belong to or when the researcher focuses more on one group than the other. Knowing their group can cause participants to change their responses or behaviors, and the researcher's expectations about the study's results can also cause this bias. This factor can be avoided by using blinding and a placebo.

Placebo: A harmless fake treatment given to the control group. An example would be receiving a sugar pill rather than prescription medication.
Blinding: Preventing the participants, the researcher, or both from knowing which group is the control group and which is the treatment group.
To make valid conclusions from an experiment, researchers must avoid any factor that can cause bias. Besides the cases given here, there are many other factors that can cause bias, such as errors in data collection.
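Blinding is often supported by hiding group membership behind neutral codes. The sketch below is one hypothetical way to do this: participants are randomly assigned, each is labeled only "A" or "B", and the key saying which code means treatment is returned separately so it can be held back until the analysis. The function name, codes, and participant names are assumptions for illustration.

```python
import random

def blind_assignment(participants, seed=None):
    """
    Randomly assign participants to treatment or control, but label each one
    only with a neutral code ("A" or "B"). The key linking codes to groups is
    returned separately so it can be kept hidden until the data are analyzed.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    codes = ["A", "B"]
    rng.shuffle(codes)  # which code means "treatment" is also random
    key = {codes[0]: "treatment", codes[1]: "control"}

    labels = {p: codes[0] for p in shuffled[:half]}
    labels.update({p: codes[1] for p in shuffled[half:]})
    return labels, key

labels, key = blind_assignment(["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"], seed=3)
print(labels)  # what participants and researchers see during the study
```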
Example

Identifying Bias in Experimental Studies

Davontay and Ali continue with their interest in eating habits. They went to the school nurse, who told them about intermittent fasting, an eating plan that switches between fasting and eating periods. With this plan, people eat during a specific time window and fast for a certain number of hours each day.


The school nurse told them that he is interested in knowing whether this plan helps people lose weight, across the general population and not only those in school. Ali and Davontay discuss what would be a good plan for such an experiment.

a Ali says that the objective of the study is to determine if intermittent fasting helps lose weight. On the contrary, Davontay claims that the purpose is to determine if intermittent fasting is a healthy plan. Whose claim is correct?
b The following options for the control and treatment groups are proposed.
Treatment Group Control Group
A People who will not follow the intermitent fasting. People who will follow the intermittent fasting.
B People who will follow the intermittent fasting. People who will not follow the intermitent fasting.
Which option represent the two groups correctly?
c Which of these procedures can cause bias in the experiment?

Hint

a The school nurse is not interested in whether intermittent fasting is healthy.
b Recall that the treatment group is the one that will receive some treatment or any action that might affect them. The control group is the one that does not receive any treatment.
c Group selection should be done randomly to avoid bias.

Solution

a The school nurse wants to know if intermittent fasting helps people lose weight. He is not interested in whether this diet is healthy. This establishes the study's objective.
This means that Ali correctly identified the objective of the study.
b Consider that the treatment group is the one that receives some treatment. In this case, the treatment is the intermittent fasting. This means that the control group will be the one who will not follow the intermittent fasting.
Treatment Group: People who will follow intermittent fasting
Control Group: People who will not follow intermittent fasting

Note that this is option B.

c Analyze each of the sample procedures one at a time.

Social Media Procedure

This option selects the control and treatment groups by contacting people via social media.

Control Group: The first people who registered via social media
Treatment Group: The people who registered later via social media

Note that getting people from social media excludes people who do not use social media. When a difference between the control and treatment group makes them different populations, an unwanted bias might be introduced. This makes it difficult to make conclusions about the target population.

Random Selection Procedure

This procedure selects each group at random from a representative sample.

Control Group: A group of people selected at random from a representative sample
Treatment Group: A group of people selected at random from a representative sample

In general, random selection avoids this kind of bias.

Selecting People From to Years Old

The last procedure selects the treatment and control groups from people aged between and years old.

Control Group Treatment Group
A group of people between and A group of people between and

Although the two groups share some characteristics, this option is biased because it does not represent the whole population. Consider that as people age, their metabolism slows. If the participants are only young people, conclusions cannot be made about the general population.

Closure

Designing an Experiment

Statistical studies are applied in a wide range of disciplines such as marketing, science, medicine, technology, and investment. This makes it essential to design studies appropriately so that data-driven decisions do not lead to avoidable risks. Now, recall the discussion Ali and Davontay had about the influence of video games on students' GPAs.

Ali and Davontay talking about videogames and studies

They now have all the tools to design an experiment. Help them to find the following information to finish this task correctly.

a Determine the objective and identify the population of the experiment.
b State the treatment group and the control group.
c Describe a sample procedure.

Answer

a Objective: Determine if video games improve students' GPA results.
Population: Student body at their school
b Treatment Group: Students who will play video games during some period after school
Control Group: Students who will not play any video games during the research
c Example Answer: Select students from the student body randomly and proportionally include students from every grade.

Hint

a What do Ali and Davontay want to observe?
b The treatment group receives some treatment or any action that might affect them. The control group does not receive any treatment.
c To avoid bias in an experiment, group selections need to be taken at random.

Solution

a To identify the study's objective, it is good to ask what Ali and Davontay want to determine. Note that they ask if video games improve students' GPA results. This is the objective of the study.
Next, note that the students only want to test this influence at their school. This means that the population will be the student body.
b Since the objective is to determine if video games improve students' GPA results and the data collection method is experiment, two groups are needed. These groups can be defined as follows.
Experiment
Treatment Group: Students who will play video games during some period after school
Control Group: Students who will not play any video games during the research
c Since an experiment will be conducted, the following situations need to be considered to get a representative sample of the student body and to avoid bias.
  • Sampling needs to be done by selecting participants at random.
  • The chosen sample must represent every subgroup of the population.
  • The sample is then divided randomly to generate the treatment and control groups with similar characteristics.

With these conditions, the following sampling procedure can be used.

Select students from the student body at random, proportionally including students from every grade.

Keep in mind that sampling procedures may vary. However, it is always necessary to consider every factor that can cause bias.
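The sampling procedure above, a proportional stratified random sample followed by random assignment to groups, can be sketched in Python. The student body (120 seventh graders, 90 eighth graders, and 90 ninth graders), the 10% sampling fraction, the seeds, and the helper name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(students, fraction, seed=None):
    """
    Proportional stratified sample: pick the same fraction of students, at random,
    from every grade so each grade is represented in proportion to its size.
    `students` is a list of (name, grade) pairs.
    """
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for name, grade in students:
        by_grade[grade].append(name)

    sample = []
    for grade, names in sorted(by_grade.items()):
        k = round(len(names) * fraction)  # proportional allocation (rounded)
        sample.extend((name, grade) for name in rng.sample(names, k))
    return sample

# Hypothetical student body.
student_body = (
    [(f"S7-{i}", 7) for i in range(120)]
    + [(f"S8-{i}", 8) for i in range(90)]
    + [(f"S9-{i}", 9) for i in range(90)]
)

sample = stratified_sample(student_body, fraction=0.1, seed=5)

# Randomly split the sample into treatment and control groups.
rng = random.Random(6)
rng.shuffle(sample)
half = len(sample) // 2
treatment, control = sample[:half], sample[half:]
print(len(sample), len(treatment), len(control))
```

Splitting the shuffled sample in half keeps the group sizes equal; with small samples, a researcher might instead randomize within each grade so both groups contain a similar mix of grades.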
