14 theory slides
9 exercises (Grade E - A)
Each lesson is meant to take 1-2 classroom sessions.
Here are some recommended readings, split into categories, that help prepare for this lesson.
Introduction to Population and Samples
Understanding Sampling Methods
Ali and Davontay, who are classmates and good friends, are talking about video games and their studies. Ali claims that playing video games can improve students' GPAs. Davontay completely disagrees with that claim.
They wonder what they would need to perform an experiment at their school to find out whose claim is true. They should be able to design a successful experiment by finding the following information. Give them a hand!
Suppose a hamburger restaurant wants to measure customer satisfaction. Should they survey every single client? Or, instead, would it be better to survey a sample of the population? To help figure out the answer to these important questions, the definitions of parameter and statistic will come to the rescue!
Populations are analyzed to identify descriptive measures that help answer questions of interest. The measures obtained from a population, like the population mean, the population proportion, and the population standard deviation, are called parameters.
A parameter is a value that describes some measurable characteristic of a population.
Collecting data from every member of a population is typically expensive and time-consuming. For that reason, samples of the population are used instead. The measures obtained from a sample, such as the sample mean, the sample proportion, and the sample standard deviation, are called statistics.
A statistic is a value that describes some measurable characteristic of a sample.
Since a statistic describes a sample, its value can change from sample to sample. When the sample is selected appropriately, sample statistics can be used to make inferences about the population parameters.
The diagram illustrates this process.
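To make the difference concrete, here is a minimal sketch using a hypothetical population: satisfaction ratings from 1 to 10 given by 1,000 customers of the hamburger restaurant. The ratings are randomly generated for illustration, not real data. The population mean is a parameter and never changes, while the mean of each random sample of 50 customers is a statistic that varies from sample to sample.

```python
import random
import statistics

def population_mean(values):
    """Parameter: computed from every member of the population."""
    return statistics.mean(values)

def sample_mean(values, sample_size, rng):
    """Statistic: computed from a simple random sample (no repeats)."""
    return statistics.mean(rng.sample(values, sample_size))

# Hypothetical population: ratings (1 to 10) from 1,000 customers.
rng = random.Random(42)
population = [rng.randint(1, 10) for _ in range(1000)]

mu = population_mean(population)
# Five different random samples of 50 customers each.
sample_means = [sample_mean(population, 50, rng) for _ in range(5)]

print(f"Population mean (parameter): {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic): {xbar:.2f}")
```

Each sample mean lands near the population mean but not exactly on it, which is why a statistic is used to estimate a parameter rather than to replace it.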
Ali and Davontay found some really cool studies but struggled to determine whether the descriptive numbers used in them were parameters or statistics. Help them identify which of the two each study uses.
In fact, most of the results are around 0.24. This means that landing on red 14 times can easily occur by chance, so the spinner most likely does not favor red.
This means that landing on red 27 times is highly improbable with a fair spinner. The result is so unlikely to occur by chance that the spinner Ali brought is most likely not fair at all, and it favors red!
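The reasoning above relies on simulating many rounds with a fair spinner and checking how often a result at least as extreme shows up. The lesson's exact setup is not shown here, so the values below are hypothetical: 60 spins per round and a 1/4 chance of landing on red for a fair spinner. This is a minimal sketch of that kind of simulation, not the lesson's own tool.

```python
import random

def simulate_reds(n_spins, p_red, n_trials, rng):
    """Count the reds in each of n_trials rounds with a fair spinner."""
    return [sum(rng.random() < p_red for _ in range(n_spins))
            for _ in range(n_trials)]

def share_at_least(counts, k):
    """Fraction of simulated rounds with at least k reds."""
    return sum(c >= k for c in counts) / len(counts)

# Hypothetical setup: 60 spins per round, red covers 1/4 of a fair spinner.
rng = random.Random(0)
counts = simulate_reds(n_spins=60, p_red=0.25, n_trials=10_000, rng=rng)

p14 = share_at_least(counts, 14)
p27 = share_at_least(counts, 27)
print(f"Share of rounds with at least 14 reds: {p14:.3f}")
print(f"Share of rounds with at least 27 reds: {p27:.4f}")
```

Under these assumptions, 14 or more reds happens in most simulated rounds, while 27 or more almost never happens, which mirrors the conclusion drawn about Ali's spinner.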
Once a claim is identified and formulated as a statistical hypothesis, a statistical study can be conducted to analyze the hypothesis. It is important to determine the most appropriate data collection method for the study.
A statistical study is a process in which data are analyzed to find answers to questions about a population parameter. The process includes determining the purpose of the study, selecting a representative sample of the population using sampling methods, calculating statistics, and making inferences about the population.
The reason for using samples is that it is almost impossible to collect data from every member of the population due to constraints such as time and money. The findings of a sample can be used to make inferences about the population as long as appropriate sampling procedures are used.
A poorly chosen sample may produce biased results and misinterpretations about the population. This can be avoided by using random sampling so that the sample is typical of the population. Data can be collected using methods such as surveys, experiments, and observational studies. The best method is determined by the study's purpose.
An experiment, or controlled experiment, is a data collection method used in statistical research to measure the effect of some treatment. It divides a sample into two groups that are kept under the same conditions. One group, the treatment group, receives some type of treatment, while the other, the control group, does not receive any treatment.
Experiments are used in statistical studies to compare the effect on the treatment group to the control group. For example, suppose a medical center wants to test whether a new vaccine helps defeat breast cancer in women. To conduct a controlled experiment, researchers need to sample women with breast cancer and randomly assign them to groups.
Control Group | Treatment Group |
---|---|
Women with breast cancer who will not receive the vaccine. | Women with breast cancer who will receive the vaccine. |
In an observational study, the researcher observes a chosen sample and takes measurements without controlling its environment, interacting with it, or exposing it to any treatment. This method is used when it is impossible to isolate the variable under study or when individuals cannot be exposed to treatments due to ethical concerns.
An observational study aims to observe the effect of a risk factor, a treatment, or another intervention. The study's criteria must be specified in advance so that specific characteristics, behaviors, or events can be identified in the observations. Criteria, such as a list of things to look for, ensure consistency when making observations.
It is crucial to acknowledge that other variables might affect a study. Suppose that the effect of a diet program on a person's blood pressure is being investigated. The study must account for other factors that also affect blood pressure, such as a person's stress level or exercise routine. Consider the following advantages of observational studies.
Advantages | Example |
---|---|
Measures are taken in real and naturally occurring scenarios. | Observing lions in their environment to learn about roles, eating habits, and social and family structures. |
It is helpful for complex situations, especially those where imposing a treatment on individuals could be unethical. | Comparing the sugar levels of 50 people who drink coffee and 50 who do not, without influencing their choice. |
Observational studies also have disadvantages to be considered.
Disadvantages | Example |
---|---|
Risk of ignoring other factors that can affect the variables of the study. | A study might conclude that people who meditate have lower cholesterol levels than others, but it might ignore that they exercise more and follow a healthier diet than others. |
It can be time-consuming, especially if the observed events do not occur frequently or last for long periods. | Studying elephants during pregnancy and birth. An elephant pregnancy can last around two years, and a female gives birth around once every four years. |
A statistical study that collects data from participants through a questionnaire, a series of questions, is called a survey. A survey aims to collect data about the characteristics, behaviors, or opinions of the members of the population.
Questions must be clear so that participants are not confused. Participants answer the questions without any interference from the researcher. Here are a few common types of surveys.
Davontay and Ali enjoyed their time learning about study design so much that they immediately ran to the local science museum to keep learning. There, Davontay discovered a cool magazine called Interesting Facts You Have Never Thought About.
In an experiment, the sample is divided into two groups: a treatment and a control group. In an observational study, members of a sample are observed without influencing the members. In a survey, individuals are asked a series of questions.
The main characteristics of each situation will be determined one by one.
The table summarizes what was determined.
Situation | Method Used for Collecting Data |
---|---|
A | Experiment |
B | Observational Study |
C | Survey |
Ali and Davontay are on a hot streak classifying statistical studies. The magazine from the science museum not only includes exciting past studies but also has a future research section that poses real-life situations in need of study.
Future Research Problems | |
---|---|
A | The way people consume movies is changing rapidly. The movie industry wants to study which viewing options people would prefer for watching movies over the next decade. |
B | More households have begun growing tomatoes than ever before. A company that makes garden soil wants to test whether their newly developed fertilizer improves the growth of tomatoes. |
C | Before building a new jungle gym, a local park wants to count how many toddlers are using the current jungle gym. The design of the jungle gym will depend on this finding. |
The given situations will be analyzed to determine which method of collecting data fits.
In situation B, a company wants to test the impact a fertilizer has on the growth of tomatoes. To observe the fertilizer's effects, a sample group of tomato plants must be exposed to the fertilizer. That means a treatment group and a control group will be needed.
Treatment Group | Control Group |
---|---|
A group of plants that will be exposed to the fertilizer | A group of plants that will not be exposed to the fertilizer |
The table summarizes the information.
Situation | Appropriate Method for Collecting Data |
---|---|
A | Survey |
B | Experiment |
C | Observational Study |
In a survey, data is collected by participants answering a series of questions. Unfortunately, poorly designed surveys can negatively affect the results of a study. For example, leading questions influence participants' answers so that their original views are not reflected. This kind of influence is called bias in survey questions. A variety of factors contribute to bias.
A double-barreled question asks the reader to answer two questions with a single answer, leaving no room to answer each question separately. To avoid this issue, check each question and, if needed, split it into two or more questions.
Biased Question | Unbiased Question |
---|---|
How likely are you to recommend medieval fantasy books and movies? | (1) How likely are you to recommend medieval fantasy books? (2) How likely are you to recommend medieval fantasy movies? |
How much do you enjoy playing video games and watching movies? | (1) How much do you enjoy playing video games? (2) How much do you enjoy watching movies? |
Question order bias happens when related questions are not presented in a logical order. For example, questions asked early in a survey can influence how participants answer questions later in the survey.
Biased Order |
---|
1. What is your least favorite math subject? |
2. How much do you enjoy math? |
In this case, asking about the participant's least favorite math subject first will strongly influence how they respond when later asked about their feelings toward math in general. Reversing the order produces a fairer response.
Unbiased Order |
---|
1. How much do you enjoy math? |
2. What is your least favorite math subject? |
Depending on the situation, randomization can also be applied to avoid this bias.
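When related questions have no natural logical order, one option is to show each participant the questions in a different random order so that order effects average out across the whole sample. The sketch below assumes a hypothetical helper, `questionnaire_for`, and reuses three questions from this lesson that have no logical order among them. Seeding the generator with the participant's ID keeps each participant's order reproducible.

```python
import random

# Questions with no natural logical order (taken from examples in this lesson).
QUESTIONS = [
    "Do you bring your own lunch or buy it at the school's cafeteria?",
    "Do you prefer hot drinks or cold drinks?",
    "How often do you consume fast food?",
]

def questionnaire_for(participant_id, questions, seed=0):
    """Return the questions in a random order that is fixed per participant."""
    order = list(questions)  # copy so the master list is never modified
    # A string seed built from the participant's ID makes the order reproducible.
    rng = random.Random(f"{seed}-{participant_id}")
    rng.shuffle(order)
    return order

for pid in range(1, 4):
    print(pid, questionnaire_for(pid, QUESTIONS))
```

This does not replace logical ordering: for pairs like the math questions above, the general question should still come first.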
A leading question leads respondents to answer in a specific way desired by the surveyor. Leading questions can be written either intentionally or unintentionally. To avoid this bias, ask questions clearly and do not use words that encourage specific answers.
Biased Question | Unbiased Question |
---|---|
Do you always consume fast food? | How often do you consume fast food? |
How healthy are you? | How would you rate your health? |
All of this studying is making Ali and Davontay hungry. While eating, they notice that some students eat the cafeteria's food while others bring food from home. Feeling inspired by their new research skills, they want to study this behavior in more depth.
They plan to conduct a survey and design a series of questions to ask the students. Identify which of the following survey questions are biased. Hint: Check whether any word could lead to a specific answer, and check whether each question asks only one thing at a time.
Each question will be analyzed to determine whether it is biased.
All options have been analyzed. The table summarizes the results.
Option | Biased? |
---|---|
How good are the cafeteria facilities? | Yes |
Do you bring your own lunch or buy it at the school's cafeteria? | No |
How do you consider the taste and appearance of the cafeteria's food? | Yes |
Do you prefer hot drinks or cold drinks? | No |
Bias in an experiment refers to any factor that can influence its result, such as participants knowing which group they belong to or the researcher focusing more on one of the groups. This bias can be introduced intentionally or unintentionally. Some of these factors are described below.
Selection bias occurs when the researcher intervenes in choosing which members go into the treatment and control groups. This bias can be avoided by randomly assigning members to the control and treatment groups, which tends to produce two similar groups.
The random selection can be performed through various methods ranging from flipping a coin or rolling a die to using a computer program to place members in each group.
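As one example of the computer-program approach, here is a minimal sketch that shuffles a sample and splits it in half. The participant labels and the helper name `randomly_assign` are hypothetical. Unlike flipping a coin for each person, shuffle-and-split always produces two groups of equal size.

```python
import random

def randomly_assign(members, rng):
    """Shuffle a copy of the sample, then split it into two halves."""
    shuffled = list(members)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    return treatment, control

# Hypothetical sample of 100 participants.
participants = [f"participant_{i}" for i in range(1, 101)]
treatment, control = randomly_assign(participants, random.Random(7))

print("Treatment group size:", len(treatment))
print("Control group size:", len(control))
```

Because chance alone decides who ends up in each group, the researcher cannot steer particular members toward the treatment.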
This bias is caused when individuals know which group they belong to or when the researcher focuses on one group more than the other. Knowing their group can cause participants to change their responses or behaviors. On the researcher's side, their expectations about the study results cause this bias. It can be avoided by using blinding and a placebo.
Solution | Description |
---|---|
Placebo | A harmless fake treatment given to the control group. An example would be receiving a sugar pill rather than prescription medication. |
Blinding | Preventing the participants, the researcher, or both from knowing which group is the control group and which is the treatment group. |
Davontay and Ali continue with their interest in eating habits. They go to the school nurse, who tells them about intermittent fasting, an eating plan that switches between periods of fasting and eating. With this plan, people eat during a specific time window and fast for a certain number of hours each day.
The school nurse wants to know whether this plan helps people lose weight, across the general population and not only among students. Ali and Davontay discuss what would be a good plan for such an experiment.
Option | Treatment Group | Control Group |
---|---|---|
A | People who will not follow the intermittent fasting plan. | People who will follow the intermittent fasting plan. |
B | People who will follow the intermittent fasting plan. | People who will not follow the intermittent fasting plan. |
Treatment Group | Control Group |
---|---|
People who will follow the intermittent fasting plan | People who will not follow the intermittent fasting plan |
Note that this corresponds to option B.
This option selects the control and treatment groups by contacting people via social media.
Control Group | Treatment Group |
---|---|
The first 50 people registered via social media | The next 50 people registered via social media |
Note that recruiting people from social media excludes people who do not use social media. When a difference between the control and treatment groups makes them come from different populations, an unwanted bias might be introduced. This makes it difficult to draw conclusions about the target population.
This procedure selects each group at random from a representative sample.
Control Group | Treatment Group |
---|---|
50 people selected at random from a representative sample | 50 people selected at random from a representative sample |
In general, randomly assigning members of a representative sample to the groups avoids selection bias.
The last procedure selects the treatment and control groups from people between 15 and 25 years old.
Control Group | Treatment Group |
---|---|
A group of 50 people between 15 and 25 | A group of 50 people between 15 and 25 |
Although the two groups share some characteristics, this option is biased because it does not represent the whole population. Consider that metabolism slows as people age. If the participants are all young people, conclusions cannot be drawn about the general population.
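A small simulation shows why the age-restricted option cannot represent the general population. The population below is hypothetical (10,000 randomly generated ages from 15 to 80), not data from the lesson. A simple random sample tracks the population's mean age, while a sample restricted to ages 15 to 25 cannot, so any age-related effect, such as slower metabolism, is never observed.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: ages of 10,000 people between 15 and 80.
population_ages = [rng.randint(15, 80) for _ in range(10_000)]

# Simple random sample of 50 drawn from the whole population.
random_sample = rng.sample(population_ages, 50)

# Biased sample: only people aged 15 to 25 can be selected.
young_only = [age for age in population_ages if 15 <= age <= 25]
restricted_sample = rng.sample(young_only, 50)

print(f"Population mean age:        {statistics.mean(population_ages):.1f}")
print(f"Random sample mean age:     {statistics.mean(random_sample):.1f}")
print(f"15-25 only sample mean age: {statistics.mean(restricted_sample):.1f}")
```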
Statistical studies are applied in a wide range of disciplines such as marketing, science, medicine, technology, and investment. This makes it essential to design studies appropriately so that data-driven decisions do not rest on misleading results. Now, recall the discussion Ali and Davontay had about the influence of video games on students' GPAs.
They now have all the tools to design an experiment. Help them find the following information to finish this task correctly.
Experiment | |
---|---|
Treatment Group | Control Group |
Students who will play video games during some period after school | Students who will not play any video games during the research |
With these conditions, the following sampling procedure can be used.
Select students from the student body at random, including students from every grade in proportion to the size of each grade.
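This procedure is a form of proportional stratified sampling: each grade is sampled at random, and each grade's share of the sample matches its share of the student body. The grade sizes below (240, 200, and 160 students) and the helper `proportional_sample` are hypothetical, chosen so the proportions come out to whole numbers.

```python
import random

def proportional_sample(students_by_grade, total_size, rng):
    """Randomly sample each grade in proportion to its size.

    Rounding can make the group sizes add up to slightly more or less
    than total_size when the proportions are not whole numbers.
    """
    population_size = sum(len(roster) for roster in students_by_grade.values())
    sample = {}
    for grade, roster in students_by_grade.items():
        k = round(total_size * len(roster) / population_size)
        sample[grade] = rng.sample(roster, k)  # random, without repeats
    return sample

# Hypothetical student body: 240 sixth, 200 seventh, and 160 eighth graders.
grade_sizes = {6: 240, 7: 200, 8: 160}
student_body = {
    grade: [f"grade{grade}_student{i}" for i in range(1, size + 1)]
    for grade, size in grade_sizes.items()
}

sample = proportional_sample(student_body, total_size=60, rng=random.Random(3))
for grade, chosen in sample.items():
    print(f"Grade {grade}: {len(chosen)} students selected")
```

The selected students could then be split into the treatment and control groups at random.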
Keep in mind that sampling procedures may vary. However, it is always necessary to consider every factor that can cause bias.
We need to determine if Tadeo or Ramsha correctly stated the population and the sample used in the given study. To discover who is correct, we will use a Venn diagram to represent the target population of this situation. Because the survey was taken only among students of EJH, the population consists of all the students of EJH.
Next, 150 students were surveyed. Since EJH has a few hundred students, we can assume that not everyone in the school took part in the survey. We can add this information to our diagram.
Observing the diagram, note that the surveyed students form a subset of the student body. This means that the surveyed students are a sample of the students at EJH. Moreover, it is given that 100 of the 150 students surveyed read at least 5 books per year. Let's add this information to our Venn diagram.
From the diagram, we can conclude the following.
This corresponds to what Ramsha stated. Therefore, Ramsha is correct.
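The numbers in the Venn diagram also give a sample statistic: the proportion of surveyed students who read at least 5 books per year. Because it is computed from the 150 surveyed students rather than the whole student body, it describes the sample, not all of EJH.

$$\hat{p} = \frac{100}{150} = \frac{2}{3} \approx 0.67$$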
Read each study and identify whether it is based on a parameter or a statistic.
We are asked to find whether the given study is based on a parameter or a statistic. Recall that measures obtained from a population are called parameters, and measures obtained from a sample are statistics. In this case, the mean weight was obtained from all students of fifth grade, which represents the population.
Population
The students of fifth grade
Therefore, because the whole population was considered, the mean weight is a parameter.
In a similar fashion, we can determine whether the standard deviation of ages of residents in Alabama is a parameter or a statistic. In this situation, the population is given by all residents of Alabama.
Population
Residents in Alabama
Note that although it is not directly stated, it is almost impossible to survey every single resident of Alabama. This indicates that the measure was obtained from a sample, which makes it a statistic.
Finally, let's analyze the study about the median annual income of all employees at a company. In this study, the population, or group of interest, is all employees at the company.
Population
Employees of the Company
Therefore, because the median income was calculated from all the employees in the company, it is a parameter.
Classify each study according to the data collection method that fits best.
We need to identify which data collection method will fit the given study. We can note that the study's purpose is to compare the temperature of the computers with the new chip against the temperature of the computers without the new chip.
Study's Purpose |
---|
Determine the temperature performance of the CPU with the new chip. |
Computers with the new chip can be considered as a treatment group where the treatment is having the new chip implemented. At the same time, computers with the old chips represent the control group, because they did not receive the treatment — the new chip.
Treatment Group | Control Group |
---|---|
The group of computers that will get the new chip. | The group of computers that will not get the new chip. |
This means that in this case, the study will be performed by conducting an experiment.
For this situation, we want to know readers' preferences between electronic and printed books.
Study's Purpose |
---|
Investigate readers' preferences between electronic and printed books. |
We can ask a group of readers a series of questions to learn their preferences. This means that this study can be done using a survey.
In this study, we want to determine customers' behaviors regarding the use of the ticket counter and the automatic ticket machine.
Study's Purpose |
---|
Analyze customers' preferences regarding the ticket counter and the automatic ticket machine. |
Although the automatic machine could be thought of as a treatment, which would suggest two groups, it is not easy to divide customers into a treatment group and a control group. Instead, we can observe which ticket-buying option people entering the movie theater choose. Therefore, this is an observational study.
For each study, determine which option appropriately names the sample and the population.
A factory overseer chooses 180 televisions at random from those produced last week. Then he tests if the televisions work.
Population | Sample | |
---|---|---|
A | The population is all televisions produced at the factory. | The sample is the televisions produced last week. |
B | The population is the televisions produced at the factory last week. | The sample is the 180 televisions selected. |
C | The population is all televisions in the world. | The sample is all televisions produced at the factory. |
The owners of an ice cream parlor located in a mall wanted to predict what additional ice cream flavor options would sell well. They selected 100 mall visitors at random and surveyed them to get their opinions.
Population | Sample | |
---|---|---|
A | The population is all people who buy ice cream in the ice cream parlor. | The sample is the 100 visitors selected. |
B | The population is the 100 visitors selected. | The sample is the ice cream flavors. |
C | The population is all people who visit the mall. | The sample is the 100 visitors selected. |
We are asked to identify which option correctly describes the population and sample of the study. Let's begin by identifying the study's purpose. In this case, the factory overseer wants to know whether the televisions produced last week work properly.
Study's Purpose |
---|
Determine if the televisions produced last week are working well. |
This indicates that the population of the study is given by all televisions produced at the factory last week. Additionally, the sample will be the selected televisions from those produced last week.
Population | Sample |
---|---|
The population is the televisions produced at the factory last week. | The sample is the 180 televisions selected. |
Therefore, option B correctly defines the sample and the population of the study.
We will follow a similar process to determine which option defines the sample and the population of the study about the ice cream parlor. In this case, the owners want to predict what additional ice cream flavor options would sell well.
Study's Purpose |
---|
Determine ice cream flavors that would sell well. |
Now, because the ice cream parlor is in the mall, the population must be all visitors of the mall, since they are the potential clients of the ice cream parlor. Moreover, the sample is the 100 visitors selected.
Population | Sample |
---|---|
The population is all people who visit the mall. | The sample is the 100 visitors selected. |
This is given in option C. Some might think that the population should be the customers of the ice cream parlor. However, limiting the population to this group would ignore potential clients.
Investigate if the data was collected from a population or a sample of the population.
We want to determine whether the population or a sample of the population was used when collecting the data. Recall that the population consists of all the members of a group of interest. If it is impractical to collect data from every member of the population, a representative sample, a subset of the population, is used instead.
With this in mind, let's consider the first study.
The height of every player of Spain's La Liga — the top professional soccer division of Spain.
In this case, the population of interest is all the players in Spain's top soccer division. Because every player was considered, the data was obtained from the population.
Consider the given information.
A survey asked 100 people about the time spent watching TV on weekends.
Note that in this situation, the population of interest is the residents of the state of Virginia, and because only a few of them were surveyed, the data was collected from a sample.
Let's now take a look at the third study.
The height of every student at North Junior High.
Here, the population of interest is the student body of NJH. Since every student was considered, the whole population was used.