Explore Data Collection Methods in Statistical Studies: Sample vs. Population and Survey Design

Control Group	Treatment Group
Women with breast cancer who will not receive the vaccine.	Women with breast cancer who will receive the vaccine.

Advantages	Example
Measurements are taken in real, naturally occurring situations.	Observing lions in their environment to learn about their roles, eating habits, and social and family structures.
It is helpful for complex situations, especially those where imposing a treatment on individuals could be unethical.	Comparing the blood sugar levels of $50$ people who drink coffee and $50$ people who do not, without influencing their decision to drink coffee.

Disadvantages	Example
There is a risk of overlooking other factors that can affect the variables of the study.	A study might conclude that people who meditate have lower cholesterol levels than others, but it might overlook that they also exercise more and follow a healthier diet.
It can be time-consuming, especially if the observed events occur infrequently or last for long periods.	Studying the behavior of elephants during pregnancy and birth. An elephant pregnancy can last around two years, and females give birth roughly once every four years.

Situation	Method Used for Collecting Data
A	Experiment
B	Observational Study
C	Survey

Situation	Future Research Problem
A	The way people consume movies is changing rapidly. The movie industry wants to study which viewing options people will prefer for watching movies over the next decade.
B	More households than ever before have begun growing tomatoes. A company that makes garden soil wants to test whether its newly developed fertilizer improves the growth of tomatoes.
C	Before building a new jungle gym, a local park wants to count how many toddlers use the current jungle gym. The design of the new jungle gym will depend on this count.

Treatment Group	Control Group
A group of plants that will be exposed to the fertilizer	A group of plants that will not be exposed to the fertilizer

Situation	Appropriate Method for Collecting Data
A	Survey
B	Experiment
C	Observational Study

Biased Question	Unbiased Questions
How likely are you to recommend medieval fantasy books and movies?	How likely are you to recommend medieval fantasy books?
	How likely are you to recommend medieval fantasy movies?
How much do you enjoy playing video games and watching movies?	How much do you enjoy playing video games?
	How much do you enjoy watching movies?

Biased Order
1. What is your least favorite math subject?
2. How much do you enjoy math?

Unbiased Order
1. How much do you enjoy math?
2. What is your least favorite math subject?

Biased Question	Unbiased Question
Do you always consume fast food?	How often do you consume fast food?
How healthy are you?	How would you rate your health?

Question	Biased?
How good are the cafeteria facilities?	Yes
Do you bring your own lunch or buy it at the school's cafeteria?	No
How would you rate the taste and appearance of the cafeteria's food?	Yes
Do you prefer hot drinks or cold drinks?	No

Solution	Description
Placebo	A harmless imitation treatment given to the control group. For example, the control group receives a sugar pill instead of the prescription medication.
Blinding	Preventing the participants, the researchers, or both from knowing which group is the control group and which is the treatment group.
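Random assignment and blinding can be combined in a few lines of code. The sketch below is illustrative only: the function name `blinded_assignment` and the neutral codes `"X"` and `"Y"` are hypothetical choices, not part of any standard procedure. It randomly splits participants into two equal-sized groups and hides which code means "treatment", so participants and the researchers recording outcomes see only a code; whoever holds `key` (for example, a third party) can unblind the results at the analysis stage. In practice, the control group would also receive a placebo that looks identical to the real treatment.

```python
import random

def blinded_assignment(participant_ids, seed=None):
    """Randomly split participants into treatment and control groups,
    labelling each group only with a neutral code ("X" or "Y").

    Hypothetical helper for illustration. Returns (codes, key):
      codes -- participant -> code; can be shared with participants and
               with the researchers who record outcomes
      key   -- code -> real group name; kept sealed until analysis
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)                      # random order = random assignment
    labels = ["X", "Y"]
    rng.shuffle(labels)                   # hide which code means "treatment"
    key = {labels[0]: "treatment", labels[1]: "control"}
    half = len(ids) // 2
    codes = {pid: labels[0] for pid in ids[:half]}
    codes.update({pid: labels[1] for pid in ids[half:]})
    return codes, key

codes, key = blinded_assignment(range(1, 11), seed=7)
# Everyone sees only "X" or "Y"; whoever holds `key` can later unblind the results.
```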

	Treatment Group	Control Group
A	People who will not follow intermittent fasting.	People who will follow intermittent fasting.
B	People who will follow intermittent fasting.	People who will not follow intermittent fasting.

Treatment Group	Control Group
People who will follow intermittent fasting	People who will not follow intermittent fasting

Control Group	Treatment Group
The first $50$ people to register via social media	The last $50$ people to register via social media

Control Group	Treatment Group
$50$ people selected at random from a representative sample	$50$ people selected at random from a representative sample
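Selecting both groups at random from the same representative sample can be sketched with Python's `random.sample`, which draws items without replacement. The pool of 500 participant IDs and the helper name `random_groups` are assumptions for illustration. Because the 100 chosen people come back in random order, splitting them in half also assigns them to the control and treatment groups at random, unlike splitting by registration order, where early and late registrants might differ systematically.

```python
import random

def random_groups(pool, group_size, seed=None):
    """Hypothetical helper: draw 2 * group_size distinct people from `pool`
    at random and split them into a control group and a treatment group."""
    rng = random.Random(seed)
    chosen = rng.sample(pool, 2 * group_size)  # sampling without replacement, random order
    return chosen[:group_size], chosen[group_size:]

# Hypothetical pool: 500 participant IDs from a representative sample.
pool = list(range(1, 501))
control, treatment = random_groups(pool, 50, seed=1)
print(len(control), len(treatment))  # 50 50
```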

Control Group	Treatment Group
A group of $50$ people between $15$ and $25$ years old	A group of $50$ people between $15$ and $25$ years old

Experiment
Treatment Group	Control Group
Students who will play video games during some period after school	Students who will not play any video games during the research

East Junior High (EJH) has a few hundred students. In a survey in which $150$ of these students participated, the findings showed that $100$ of them read at least five books per year.

Tadeo and Ramsha each made a statement about the population and the sample of the study at EJH.

Who is correct?

We need to determine whether Tadeo or Ramsha correctly identified the population and the sample of the study. To decide who is correct, we will use a Venn diagram to represent the target population. Because the survey was taken only among students of EJH, the population consists of all the students at EJH.

Next, since EJH has a few hundred students and only 150 were surveyed, not every student at the school took part in the survey. We can add this information to our diagram.

Looking at the diagram, note that the surveyed students are a subset of the student body. This means that the surveyed students are a sample of the students at EJH. Moreover, we are given that 100 of the 150 surveyed students read at least five books per year. Let's add this information to our Venn diagram.

From the diagram, we can conclude the following.

The population consists of all the students at EJH.
The sample consists of the 150 surveyed students, 100 of whom read at least five books per year.

This corresponds to what Ramsha stated. Therefore, Ramsha is correct.
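The arithmetic behind the survey's result is a simple proportion. Since it is computed from the 150 surveyed students rather than from every student at EJH, it describes the sample and is only an estimate of the proportion for the whole school. A minimal Python sketch:

```python
from fractions import Fraction

surveyed = 150  # size of the sample (students who took the survey)
readers = 100   # students in the sample who read at least five books per year

proportion = Fraction(readers, surveyed)
print(proportion)                    # 2/3
print(round(float(proportion), 3))   # 0.667
```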

Read each study and identify whether it is based on a parameter or a statistic.

The mean weight of all students in the fourth grade.

The standard deviation of ages of residents in Alabama.

The median annual income of all employees at a company.

We are asked to determine whether each study is based on a parameter or a statistic. Recall that measures obtained from a population are called parameters, and measures obtained from a sample are called statistics. In this case, the mean weight was obtained from all fourth-grade students, which is the population.
\begin{gathered} \underline\textbf{Population}\\ \text{The students of fourth grade} \end{gathered}
Therefore, because the whole population was considered, the mean weight is a parameter.

In a similar fashion, we can determine whether the standard deviation of ages of residents in Alabama is a parameter or a statistic. In this situation, the population is all residents of Alabama.
\begin{gathered} \underline\textbf{Population}\\ \text{Residents of Alabama} \end{gathered}
Although it is not stated directly, it is practically impossible to survey every single resident of Alabama. This indicates that the measure was obtained from a sample, which makes it a statistic.

Finally, let's analyze the study about the median annual income of all employees at a company. In this study, the population, or group of interest, is all employees at the company.
\begin{gathered} \underline\textbf{Population}\\ \text{Employees of the Company} \end{gathered}
Therefore, because the median income was calculated from all employees at the company, it is a parameter.
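The difference between a parameter and a statistic can be illustrated with Python's standard `statistics` module. The ages below are randomly generated stand-ins, not real data about Alabama or any other population, and the population size of 10,000 and sample size of 200 are arbitrary assumptions. `statistics.pstdev` treats the data as a whole population (dividing by $N$), while `statistics.stdev` treats it as a sample (dividing by $n - 1$).

```python
import random
import statistics

# Hypothetical population: ages of 10,000 residents (made-up numbers for illustration only).
rng = random.Random(0)
population = [rng.randint(0, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population.
sigma = statistics.pstdev(population)

# Statistic: computed from a random sample of 200 members.
sample = rng.sample(population, 200)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)

print(f"parameter (population SD): {sigma:.2f}")
print(f"statistic (sample SD):     {s:.2f}")
```

Running it with different seeds shows that the statistic changes from sample to sample, while the parameter stays fixed for a given population.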

Classify each study according to the data collection method that fits best.

A tech company wants to study whether a new chip implemented in the computers they produce decreases the CPU temperature compared with the computers that use the old chips.

Readers' habits seem to have changed over the last decade. For this reason, a publishing company wants to investigate whether readers prefer printed or electronic books.

In a previous study, the movie industry found that people usually complain about the time spent at the ticket counter. Recently, a series of automatic ticket machines were installed in some movie theaters. The movie industry wants to know if people prefer using the ticket counter or the automatic machines.

We need to identify which data collection method fits the given study. Note that the study's purpose is to compare the CPU temperature of computers with the new chip against that of computers with the old chips.

Study's Purpose
Determine whether the new chip lowers the CPU temperature.

Computers with the new chip can be considered the treatment group, where the treatment is having the new chip implemented. Computers with the old chips represent the control group, because they do not receive the treatment (the new chip).

Treatment Group	Control Group
The group of computers that will get the new chip.	The group of computers that will not get the new chip.

This means that the study should be carried out as an experiment.

For this situation, we want to know readers' preferences between electronic and printed books.

Study's Purpose
Investigate readers' preferences between electronic and printed books.

We can ask a group of readers a series of questions to learn their preferences. This means that the study can be done using a survey.

In this study, we want to determine customers' behavior regarding the use of the ticket counter and the automatic ticket machines.

Study's Purpose
Analyze customers' preferences regarding the ticket counter and the automatic ticket machine.

Although we may think that two groups are needed, with the automatic machines acting as a treatment, it is not easy to divide customers into a treatment group and a control group. Instead, we can observe which ticket-buying option people choose when they enter the movie theaters. Therefore, this is an observational study.

For each study, determine which option correctly names the sample and the population.

A factory overseer chooses $180$ televisions at random from those produced last week. Then he tests if the televisions work.

	Population	Sample
A	The population is all televisions produced at the factory.	The sample is the televisions produced last week.
B	The population is the televisions produced at the factory last week.	The sample is the $180$ televisions selected.
C	The population is all televisions in the world.	The sample is all televisions produced at the factory.

The owners of an ice cream parlor located in a mall wanted to predict what additional ice cream flavor options would sell well. They selected $100$ mall visitors at random and surveyed them to get their opinions.

	Population	Sample
A	The population is all people who buy ice cream in the ice cream parlor.	The sample is the $100$ visitors selected.
B	The population is the $100$ visitors selected.	The sample is the ice cream flavors.
C	The population is all people who visit the mall.	The sample is the $100$ visitors selected.

We are asked to identify which option correctly describes the population and the sample of the study. Let's begin by identifying the study's purpose. In this case, the factory overseer wants to know whether the televisions produced last week work properly.

Study's Purpose
Determine if the televisions produced last week are working well.

This indicates that the population of the study is all televisions produced at the factory last week, and the sample is the 180 televisions selected from them.

Population	Sample
The population is the televisions produced at the factory last week.	The sample is the 180 televisions selected.

Therefore, option B correctly defines the sample and the population of the study.
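Drawing the overseer's sample can be sketched with `random.sample`. The serial-number format and the total of 1,200 televisions produced last week are hypothetical, since the problem does not give them; only the sample size of 180 comes from the study.

```python
import random

# Hypothetical serial numbers for the televisions produced last week
# (the count of 1,200 is an assumption; the problem does not state it).
produced_last_week = [f"TV-{n:05d}" for n in range(1, 1201)]

rng = random.Random(42)
sample = rng.sample(produced_last_week, 180)  # simple random sample, no repeats

assert len(sample) == 180
assert set(sample) <= set(produced_last_week)  # the sample is a subset of the population
```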

We will follow a similar process to determine which option defines the sample and the population of the study about the ice cream parlor. In this case, the owners want to predict what additional ice cream flavor options would sell well.

Study's Purpose
Determine ice cream flavors that would sell well.

Because the ice cream parlor is in the mall, the population should be all mall visitors, since each of them is a potential customer of the ice cream parlor. The sample is the 100 visitors selected.

Population	Sample
The population is all people who visit the mall.	The sample is the 100 visitors selected.

This corresponds to option C. One might think that the population should be only the customers of the ice cream parlor. However, limiting the population to this group would ignore potential new customers.

Determine whether the data was collected from a population or from a sample of the population.

The height of every player of Spain's La Liga — the top professional soccer division of Spain.

A survey asked $100$ people from Virginia about the time they spend watching TV on weekends.

The height of every student at North Junior High (NJH).

We want to determine whether the population or a sample of the population was used when collecting the data. Recall that the population consists of all the members of a group of interest. If it is impractical to collect data from every member of the population, a representative sample, a subset of the population, is used instead.

With this in mind, let's examine each study.

The height of every player of Spain's La Liga — the top professional soccer division of Spain.

In this case, the population of interest is all players in Spain's top soccer division. Because every player was considered, the data was obtained from the population.

Consider the second study.

A survey asked 100 people from Virginia about the time they spend watching TV on weekends.

Note that in this situation, the population of interest is the residents of the state of Virginia, and because only a few of them were surveyed, the data was collected from a sample.

Let's now take a look at the third study.

The height of every student at North Junior High.

Here, the population of interest is the student body of NJH. Since every student was considered, the whole population was used.
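The rule applied in these three studies (all members measured means population data, otherwise sample data) can be written as a small function. The name `data_source` and the roster sizes are hypothetical: NJH is assumed to have 400 students, and a toy population of 1,000 IDs stands in for the residents of Virginia.

```python
def data_source(measured, population):
    """Hypothetical helper: return "population" if every member of the
    population was measured, otherwise "sample". Both arguments are
    collections of member IDs."""
    measured, population = set(measured), set(population)
    if not measured <= population:
        raise ValueError("measured members must belong to the population")
    return "population" if measured == population else "sample"

# Hypothetical rosters, for illustration only.
njh_students = range(1, 401)  # assume NJH has 400 students
print(data_source(njh_students, njh_students))     # population
print(data_source(range(1, 101), range(1, 1001)))  # sample
```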
