What is Statistics

Statistics is the study of collecting, analyzing, and interpreting data. It involves asking statistical questions and gathering information through various sampling methods. Understanding how to choose between different types of samples, such as simple random, systematic, convenience, and voluntary response samples, helps in gathering reliable data. Making valid inferences is the next crucial step, ensuring that conclusions drawn from data accurately reflect the population being studied. These concepts are important in many fields, such as research, business, and government, where data-driven decisions are needed. Understanding statistics enables better decision-making by providing insight into trends, patterns, and relationships within data, making it a powerful tool for informed choices.
Data is crucial to understanding many facets of life, from people's opinions and preferences to future weather conditions. Statistics plays a vital role in every stage of data analysis, from collecting reliable data to drawing meaningful conclusions. This lesson provides an overview of statistics.

Catch-Up and Review

Here are a few recommended readings before getting started with this lesson.

Challenge

Are Five-Star Ratings Reliable?

Dylan is fascinated with statistics. He learns that data is constantly collected about people, both voluntarily and not so much. Data allows companies to understand and make predictions about customers. One widespread form of data collection is when apps post messages asking for five-star rankings.
These messages are insistent and interrupt customers while using the app. Most people who respond tend to rate the app as one star or five stars, rarely anything in between. Are conclusions gathered from such rankings reliable? Why is that the case?
Discussion

Statistics and Statistical Questions

Statistics is a set of tools and techniques used to collect, organize, and interpret information. These pieces of information are also known as data, which statisticians analyze. One of the main ways to collect data consists of asking questions. Some especially useful questions are known as statistical questions.

Concept

Statistical Question

A question is considered a statistical question when it is designed to anticipate and account for a variety of answers. Consider the differences between the following statistical and non-statistical questions.

Non-statistical Question | Statistical Question
What is Davontay's height? | What are the heights of the students in Davontay's class?
What is the total population of the United States? | In the United States, what is each individual state's population?
What was Paulina's last Pre-Algebra test score? | What are Paulina's test scores from the previous month?
Non-statistical questions are recognized as having only one true answer. Statistical questions have many answers.
Discussion

Asking Questions to Gather Data

Data is gathered with the goal of interpreting it into meaningful information. Asking a set of questions to gather this data — usually statistical questions — leads to a clearer picture when interpreting the data. Such sets of questions are called surveys.

Concept

Survey

A statistical study that collects data from participants through a questionnaire (a series of questions) is called a survey. The answers obtained in a survey provide data about the characteristics, behaviors, or opinions of the members of the population.

Survey.jpg

Questions must be clear to avoid confusing the participants. The participants answer the questions, and the researcher must never interfere. Here are a few common types of surveys.

  • Online: This is one of the most cost-effective ways to conduct a survey. Individuals are able to respond quickly because they can complete the survey on a website or through email.
  • Phone Call: This method requires interviewing participants through phone calls. It could require more time and money than online surveys because of service costs and number of calls made.
  • Face-to-face: The researcher conducts the survey in-person with the participant. This choice could be expensive and time-consuming.
When done correctly, the method chosen does not have a significant influence on the results. In any event, it is critical that the questions are clear, avoid ambiguity, and serve the purpose of the study, not the researcher's expectations.
Discussion

What is the Goal of Statistics?

Statistics are used to analyze collected data to help understand how different things work or how different people think. The goal of statistics is to understand the characteristics of a population.

Concept

Population

In statistics, a population consists of all members of a group of interest. Populations can vary in size and include people, animals, plants, or objects. Since studying every member of a population is impractical, a representative subset called a sample is used instead. The sample is used to represent or make assumptions about the population.

Population and sample
For example, suppose there is a study that examines high school students in Boston and their attitudes towards mathematics. In this case, the population consists of all high school students in Boston. Since it is impractical to study every student, a sample can be taken by using sampling methods.
Example

Identifying the Population and the Sample

Dylan is hyped to learn more about statistics but he is having some issues. He cannot determine which set represents the population and which represents a sample of the population. Dylan goes to his teacher for help.

Dylan listening to his math teacher

She gives Dylan a few example problems that will help him learn about sample versus population.

a Pair the following.
b Pair the following.
c Pair the following.

Hint

a The sample is always taken from the population.
b The sample is always taken from the population.
c The sample is always taken from the population.

Solution

a Since the sample is always part of the population, identifying the larger group determines which of the sets represents the population. In this case, Dylan's classmates are part of the students of his school.

The students of Dylan's school → Population
Dylan's classmates → Sample

b Similar to what was done in Part A, it is important to determine which set is part of the other. In this case, the workers at the local city mall are inhabitants of that city, and not the other way around. Therefore, the population is the set of the city's inhabitants.

The inhabitants of a city → Population
Workers of the local city mall → Sample

c Finally, a league is made up of several teams. Therefore, the members of a team are a sample of all the members of a league.

The members of a league → Population
The members of a team → Sample


Example

How Many Games With More Than 30 Points?

Dylan loves playing basketball. One reason for Dylan's interest in statistics is that he wants to understand the stats of his favorite players over the course of a season. He decides to try out some statistics for practice and notes the scores of his favorite player, Savant Saucey, in games selected at random during the season.

Scores
45 27
40 33
22 46
29 9
18 20
34 15
a Help Dylan find the probability that Savant Saucey scores more than 30 points during a game. Write the answer as a fraction.
b Savant Saucey will play in about 60 games during the season. Use an equivalent ratio to help Dylan predict how many of these games his favorite player will score more than 30 points. Round to the nearest whole number.

Hint

a The probability is found by dividing the number of games with more than 30 points by the total number of games in the sample.
b Write an equivalent ratio by dividing the number of estimated games with more than 30 points by the total number of games.

Solution

a The probability that Dylan's favorite player scores more than 30 points during a game is obtained by dividing the number of games in which the player scored more than 30 points by the total number of games in the sample.

p = (Games with score > 30) / (Total number of games)

The number of games can be identified by counting how many times the score is higher than 30 in the table.

Scores
45 27
40 33
22 46
29 9
18 20
34 15

Dylan's favorite player scored more than 30 points in 5 of the 12 games. With these numbers, the probability can be written.
p = 5/12

b Two ratios are equivalent if they result in the same value. Let g be the predicted number of games where Dylan's favorite player scores more than 30 points. To write an equivalent ratio for the prediction, the probability 5/12 has to be equal to g divided by the total number of games, 60.
5/12 = g/60
The fraction on the left-hand side of the equation can be rewritten to find the value of g. First, notice that multiplying 12 by 5 results in 60. For the ratios to be equivalent, 5 multiplied by 5 has to be equal to g.
5/12 = g/60
25/60 = g/60
25 = g
g = 25
Dylan's prediction is that his favorite player will score more than 30 points in 25 of the 60 games. Not bad!
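The same prediction can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the lesson, and the scores are the twelve sampled games from the table.

```python
from fractions import Fraction

# Scores from the twelve randomly selected games in the table.
scores = [45, 27, 40, 33, 22, 46, 29, 9, 18, 20, 34, 15]

# Count the sampled games with more than 30 points.
high_games = sum(1 for s in scores if s > 30)

# Probability from the sample, kept as an exact fraction.
p = Fraction(high_games, len(scores))

# Scale the ratio to a 60-game season: 5/12 = g/60.
season_games = 60
g = p * season_games

print(p)  # 5/12
print(g)  # 25
```

Using `Fraction` keeps the arithmetic exact, which mirrors the equivalent-ratio reasoning instead of relying on decimal rounding.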
Example

Getting Buckets

Most shots in basketball are worth two points. However, there are other shots with different values. For example, a free throw is worth one point. These are uncontested shot attempts that players are awarded if they are fouled while shooting.

Player getting fouled
a Dylan's favorite player, Savant Saucey, is a certified bucket getter. Saucey was successful on 92 % of their free throw attempts during the season. If Saucey attempted 300 free throws, how many did they make?
b In addition to free throws, there are shots that are worth three points called three pointers. They are taken from about 23 feet or further from the basket. Savant Saucey was successful on 42 % of 750 attempted three pointers. How many points did Savant score from three pointers? Round to the nearest integer.

Hint

a Find 92 % of 300 by multiplying.
b Find 42 % of 750 by multiplying.

Solution

a The probability that Savant Saucey makes a free throw is 92 %. Since they took 300 attempts, the number of successful shots is found by calculating 92 % of 300. This is done by multiplying 300 by 92/100 = 0.92.

0.92 * 300 = 276
This means that, of the 300 attempts Savant Saucey took, they were successful on 276. That is outstanding.

b This part is solved similarly to Part A. Since Savant Saucey took 750 three-point shot attempts, the number of successful shots is the result of finding 42 % of 750. This is done by multiplying.

0.42 * 750 = 315
This means that Savant Saucey was successful on 315 three-point shots. Then, since each shot is worth three points, the total number of points is found by multiplying 315 by 3.
315 * 3 = 945
Savant Saucey scored 945 points in the season just from three-point shots. That is very good!
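Both percent calculations can be reproduced in Python. The sketch below uses a hypothetical helper, `percent_of`, which is not part of the lesson, and rounds to the nearest integer as the exercise asks.

```python
def percent_of(percent, total):
    """Return `percent` % of `total`, rounded to the nearest whole number."""
    return round(percent * total / 100)

# Part A: 92 % of 300 free throw attempts.
free_throws_made = percent_of(92, 300)

# Part B: 42 % of 750 three-point attempts, each worth 3 points.
three_pointers_made = percent_of(42, 750)
three_point_total = three_pointers_made * 3

print(free_throws_made)     # 276
print(three_pointers_made)  # 315
print(three_point_total)    # 945
```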


Discussion

Selecting a Sample

A key point in selecting a sample is to choose one that reflects the population as closely as possible. These samples are called representative samples and help produce meaningful interpretations.

Concept

Representative Sample

A sample that accurately reflects the characteristics of the population is called a representative sample.

Selecting sample from a population

In a representative sample, if x parts of the population have a certain characteristic, approximately x parts of the sample will share the same characteristic. As an example, consider a sample of cats and dogs selected from all the cats and dogs at a pet shop.

Population and sample of cats and dogs

The figure shows cats represent 60 % of the population and 60 % of the sample. Similarly, dogs represent 40 % of the population and 40 % of the sample. In this case, the sample is then representative. Note that in a representative sample, every subgroup of the population is represented.

Extra

Relation with Bias

It is possible that samples reflect biases. In most cases, unbiased samples result in representative samples. On the other hand, biased samples usually result in samples that are not representative of the population.

Discussion

Unbiased Samples

A sample is said to be unbiased if every member of the population has the same probability of being selected. The most common type of unbiased sample is the simple random sample.

Concept

Simple Random Sample

In a simple random sample, each member of a population is equally likely to be selected as part of the sample. Consider an example where a researcher performs the following procedure.

There are 20 people, each assigned a unique number, who are surveyed. Of those 20 people, 4 need to be randomly selected as a sample. The researcher writes each unique number on a piece of paper and places the papers in a bag. The researcher then blindly selects 4 papers out of the bag.

The researcher's way of choosing a sample means that each participant is equally likely to be selected as part of the sample.
To generate a simple random sample, each member of a population can be assigned a unique ID — for example, a whole number. Then, a random number generator can be used to generate IDs until the desired sample size is satisfied.
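The ID-based procedure described above can be sketched in Python with the standard `random` module. The function name `simple_random_sample` and the ID range 1 to 20 are assumptions chosen to match the example with 20 people and a sample of 4.

```python
import random

def simple_random_sample(population_ids, sample_size, seed=None):
    """Pick `sample_size` distinct IDs so that every ID is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable; None means a fresh draw
    return rng.sample(population_ids, sample_size)

# 20 people, each assigned a unique whole-number ID.
ids = list(range(1, 21))

# Draw 4 IDs, like pulling 4 papers out of the bag without looking.
sample = simple_random_sample(ids, 4)
print(sorted(sample))
```

Because the draw is random, the printed IDs change from run to run unless a seed is supplied.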

Extra

Biased or Unbiased Sampling

A simple random sample is an unbiased sample because members of the population are selected randomly, so each member has an equal chance of being included. Such a sample is usually representative because it tends to have the same characteristics as the population.

Unbiased Sampling: Members of a population are selected randomly.
Biased Sampling: Members of a population are hand-chosen, for example.

An example of biased surveying would be a coach who asks only their favorite players about the effectiveness of the team's strategy. Just imagine if only the stars of the team were allowed to give their opinion to the coach. However, other instances might call for biased sampling, such as a study about how Olympic athletes perform under pressure.

Discussion

Systematic Samples

Another unbiased sample type is a systematic sample.

Concept

Systematic Sample

In a systematic sample, the members of a population are ordered randomly or in a random-like way. The sample size is predetermined, and members are selected at a fixed interval. The interval is determined by dividing the population size by the sample size. Consider an example.

Five customers are surveyed about the quality of a fragrance shampoo. With a population of 20 customers, the interval is 20/5 = 4, so every fourth customer is selected.

A sample of 5 people is selected from a population of 20 people using systematic sampling.
A systematic sample is often used when a complete list of the population is available. Different samples can be created by varying the starting point when using systematic sampling. This gives the surveyors the ability to check if the conclusions hold across the samples.
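A minimal Python sketch of systematic sampling follows. It assumes the population size is a multiple of the sample size, as in the example with 20 customers and a sample of 5. The function name `systematic_sample` and its `start` parameter are illustrative, not part of the lesson.

```python
import random

def systematic_sample(population, sample_size, start=None):
    """Select members at a fixed interval, beginning at index `start`."""
    interval = len(population) // sample_size  # population size / sample size
    if start is None:
        start = random.randrange(interval)     # random starting point within the first interval
    return [population[start + i * interval] for i in range(sample_size)]

# 20 customers listed in a random-like order, labeled 1 to 20.
customers = list(range(1, 21))

print(systematic_sample(customers, 5, start=0))  # [1, 5, 9, 13, 17]
print(systematic_sample(customers, 5, start=2))  # [3, 7, 11, 15, 19]
```

Changing `start` produces a different sample from the same list, which is how surveyors can check whether a conclusion holds across samples.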

Extra

Biased or Unbiased Sampling

Systematic samples can be either biased or unbiased.

Biased: If the initial order of the population is biased, the sample will also be biased.
Unbiased: If the members of the population are ordered randomly without a cyclic pattern, it is expected that the systematic sampling will also be unbiased.

An example of a bias would be if the population is arranged in a cyclical pattern that matches the sampling interval. That can lead to a group being preferred over other groups, which leads to bias in the sample. It is important to remember that unbiased samples are better suited to generate a representative sample.
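The effect of a cyclical pattern can be seen in a small Python sketch. The population here is hypothetical: 20 members arranged so that four groups, labeled A to D, repeat in the same order.

```python
# Hypothetical population: groups A, B, C, D repeat in a cycle of four.
population = ["A", "B", "C", "D"] * 5  # 20 members

# A sampling interval of 4 matches the cycle length exactly.
matching_sample = population[0::4]
print(matching_sample)  # ['A', 'A', 'A', 'A', 'A']

# An interval of 5 does not line up with the cycle.
other_sample = population[0::5]
print(other_sample)  # ['A', 'B', 'C', 'D']
```

With the matching interval, only group A is ever selected, so the other groups have no chance of appearing in the sample.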

Discussion

Biased Samples

In biased samples, one part of the population is preferred over the others in some way. These types of samples are usually not representative, but they are often more practical to collect than unbiased samples. For example, consider convenience samples.

Concept

Convenience Sample

In a convenience sample, members of a population are selected to be in a sample based on convenience or their availability to the researchers. Consider an example.

A researcher wants to study all wolf subspecies in a forest. However, they only have the funds to study those that frequently visit a certain area.

This is an example of convenience sampling because the researcher is selecting wolves that are conveniently available and in close proximity.

This type of sampling is often used when researchers have limited resources or are under certain time constraints.

Extra

Biased or Unbiased Sampling

Convenience sampling can lead to a biased sample since the researchers choose sample members that are easily available to them. Some groups of people in the population may not be represented in the sample because they are not easily accessible, causing the sample to not be representative of the population.

Discussion

Voluntary Response Sample

In a self-selected sample, the members of the sample are the people who are willing to participate. Since people participate voluntarily in these samples, they are also called voluntary response samples. Consider an example.

A survey about internet shopping is posted online. People volunteer to participate or simply ignore it.

Extra

Biased or Unbiased Sampling

A self-selected sample is a biased sample because people with strong opinions, either positive or negative, about the topic studied are more likely to volunteer. Also, people who are interested in the topic being studied may be more likely to participate, while those who are not interested may refuse to participate.

As a result, such a sample is not representative of the population because it underrepresents people with neutral opinions about the topic or who are not interested in it.

Discussion

Making Inferences With the Results of a Survey

As previously mentioned in the lesson, one of the main goals of statistics is to make inferences about the characteristics of a population using samples. It is important to keep in mind that not every inference is correct. However, valid inferences are very likely to be correct.

Concept

Valid Inference

A valid inference is a conclusion drawn from a sampling of a population that is highly likely to be true. This type of inference is based on representative samples and supported by sufficient data. Consider an example where a survey was conducted on the people of a city asking for their heights.

Here are more details of the survey and what it found.

The survey was conducted on a simple random sample of 250 inhabitants of a city. It found that the average height of the people in the sample is 5 feet 7 inches.

This is enough information to make an inference about the average height of all inhabitants of the city.

The average height of all inhabitants of the city is about 5 feet 7 inches.

This inference is valid because of the following characteristics of the study.

  1. A simple random sample is usually unbiased.
  2. The number of participants 250 is a relevant sample size.
In general, the sample size should be as big as possible. For small populations, a relevant sample size is about 10 % of the population. With big populations, however, this can be hard or impossible to achieve. In that case, a sample size of around 300 can lead to valid inferences, as long as the sample is unbiased.
Example

How Many Students at School Like Basketball?

Dylan is doing great in statistics class. Miss Jackson assigns him to take another survey of his choice. Wondering if the students in his school like basketball as much as he does, he decides to ask in a survey. He is nervous his classmates will say things like "Basketball? No way!" and "Eww."

Dylan's classmates showing disdain toward basketball.

Dylan arrives at school early and asks every fifth student entering the school what their interest level is, if any, in basketball. By the end of the morning, Dylan has surveyed 54 students. The results show that almost 30 % of the students love basketball!

a Select which type of sample Dylan selected to take the survey.
b Is the conclusion a valid inference? Consider that there are about 500 students in Dylan's school.

Hint

a Remember how each type of sample is taken.
b Is the sample biased? Did Dylan gather enough data?

Solution

a Dylan selected a sample by asking every fifth student. This method can be compared with each type of sample to verify which is the one that Dylan used.
Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.

The order in which the students arrive at school is close to random. In addition to this, Dylan selected the participants in an interval of five. Therefore, it can be determined that Dylan chose a systematic sample.

b Since the order in which students arrive is close to random, every student has about the same chance of being selected. This makes for an unbiased sample. Dylan also selected a sample of 54 students. This sample size can be compared to 10 % of the number of students at the school.

10 % of 500 = 50

Since 54 is a little more than 10 % of the population, the sample is big enough. An unbiased sample of sufficient size supports a valid inference. Therefore, the answer is yes.
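The size check in Part B can be written as a short Python sketch. It applies only the 10 % guideline from this lesson for small populations, and the variable names are illustrative.

```python
population_size = 500  # approximate number of students in Dylan's school
sample_size = 54       # students Dylan surveyed

# 10 % of the population, using whole-number arithmetic.
threshold = population_size * 10 // 100

big_enough = sample_size >= threshold
print(threshold)   # 50
print(big_enough)  # True
```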
Example

Favorite Basketball Team

Dylan is going to a game of his favorite basketball team, the Fighters.

Basketball game

There are three different teams in the state where Dylan lives. Wondering which team is the most popular in the state, Dylan conducted a survey on the day of the game. He asked several people in the Fighters' arena which team from the state was their favorite. Dylan's results can be represented with a pie chart.

a Select which type of sample Dylan selected to take the survey.
b Is the conclusion a valid inference?

Hint

a Remember how each type of sample is taken.
b Is the sample biased? Did Dylan gather enough data?

Solution

a Dylan selected a sample by asking people in the same arena as him, the Fighters' arena. This method can be compared with each type of sample to verify which is the one that Dylan used.
Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.

Dylan selected the participants of the sample in the same location as him. Therefore, it can be determined that Dylan chose a convenience sample.

b Dylan wants to know how popular the Fighters are in the state. However, most people who attend a game at the Fighters' arena will be Fighters fans. This makes the sample biased. Therefore, the answer is no.
Example

Ordering Jerseys

Dylan's uncle sells Fighters' jerseys at an apparel store. He knows that many people buy jerseys during the playoffs. He wants to know which jerseys he should order based on popularity. He selected a simple random sample of 84 customers and conducted a survey asking who their favorite player is. The results are in the table.

Player | Votes from Customers
Savant Saucey | 42
Elgin Baylor | 25
Nate Thurmond | 17

Dylan's uncle wants to order 420 new jerseys to sell during the playoffs. He already made a great table and now he asks Dylan to run some analysis that will help him make the correct order of jerseys.

a Can the results from Dylan's uncle's survey support a valid inference?
b How many Savant Saucey jerseys should be ordered based on the results from the survey?

Hint

a Is the survey biased?
b What is the probability that a customer prefers Savant Saucey?


Solution

a Every member of the population of customers has the same probability of being selected in a simple random sample. Therefore, the survey is not biased. Since the sample is not biased and was conducted on enough people, the survey can support a valid inference.
b The proportion of people who prefer Savant Saucey is needed before determining how many jerseys to order. This proportion can be found by dividing the number of customers who prefer this player by the total number of customers surveyed. The total number of customers surveyed can be obtained by adding the votes for each player.
42 + 25 + 17 = 84
The proportion needed is the same as the probability that Savant Saucey is the favorite player of a customer. Since the total number of people surveyed is 84, there is enough information to find this probability.
p = Number of favorable outcomes / Number of people surveyed
p = 42/84
p = 1/2
p = 0.5
A probability of 0.5 indicates that there is a 50 % chance that Savant Saucey is the favorite player of a customer. Knowing this, 50 % of the 420 jerseys should be Savant Saucey jerseys. The exact number can be found by multiplying 0.5 and 420.
0.5 * 420 = 210
Therefore, 210 Savant Saucey jerseys should be ordered.
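The same proportional reasoning can be applied to every player in the table. The Python sketch below goes slightly beyond what the exercise asks, since only Savant Saucey's count is required, and it assumes the whole order of 420 jerseys is split in proportion to the survey votes.

```python
from fractions import Fraction

# Survey results from the table.
votes = {"Savant Saucey": 42, "Elgin Baylor": 25, "Nate Thurmond": 17}

total_votes = sum(votes.values())  # 42 + 25 + 17 = 84
jerseys_to_order = 420

# Each player's share of the order equals their share of the votes.
order = {
    player: Fraction(count, total_votes) * jerseys_to_order
    for player, count in votes.items()
}

for player, amount in order.items():
    print(player, amount)
# Savant Saucey 210
# Elgin Baylor 125
# Nate Thurmond 85
```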
Closure

Are Five Star Ratings Biased?

Refer to the challenge given at the beginning of the lesson. Apps often ask customers to give five-star reviews. The customers are the population that the app owners want to make inferences about. The method that the app owners use to select a sample can be determined by comparing it to the following sample types.

Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.
Since customers decide whether to rank the app or not, the sample is a voluntary response sample. These kinds of samples are usually biased because people with strong opinions are more likely to respond. That would explain why the majority of participants rank the app as one star or five stars, rarely anything in between. This means that the conclusions drawn from the rankings are not valid inferences.

