Data is crucial for understanding many facets of life, from people's opinions and preferences to predictions of future weather conditions. Statistics plays a vital role in every stage of data analysis, from collecting reliable data to drawing meaningful conclusions. This lesson provides an overview of statistics.

Catch-Up and Review

Here are a few recommended readings before getting started with this lesson.

Challenge

Are Five-Star Ratings Reliable?

Dylan is fascinated with statistics. He learns that data is constantly collected about people, both voluntarily and involuntarily. Data allows companies to understand and make predictions about customers. One widespread form of data collection is when apps post messages asking for five-star ratings.
Five Star Rating Applet
These messages are insistent and interrupt customers while they use the app. Most people who respond tend to rate the app as one star or five stars, rarely anything in between. Are conclusions drawn from such ratings reliable? Why or why not?
Discussion

Statistics and Statistical Questions

Statistics is a set of tools and techniques used to collect, organize, and interpret information. These pieces of information, which statisticians analyze, are known as data. One of the main ways to collect data consists of asking questions. Some especially useful questions are known as statistical questions.

Concept

Statistical Question

A question is considered a statistical question when it is designed to anticipate and account for a variety of answers. Consider the differences between the following statistical and non-statistical questions.

Non-statistical Question | Statistical Question
What is Davontay's height? | What are the heights of the students in Davontay's class?
What is the total population of the United States? | In the United States, what is each individual state's population?
What was Paulina's last Pre-Algebra test score? | What are Paulina's test scores from the previous month?
A non-statistical question has only one true answer, whereas a statistical question has many possible answers.
Discussion

Asking Questions to Gather Data

Data is gathered with the goal of interpreting it into meaningful information. Asking a set of questions to gather this data — usually statistical questions — leads to a clearer picture when interpreting the data. Such sets of questions are called surveys.

Concept

Survey

A statistical study that collects data from participants through a questionnaire (a series of questions) is called a survey. The answers obtained in a survey provide data about the characteristics, behaviors, or opinions of the members of the population.


Questions must be clear to avoid confusing the participants. The participants answer the questions, and the researcher must never interfere with their answers. Here are a few common types of surveys.

  • Online: This is one of the most cost-effective ways to conduct a survey. Individuals can respond quickly through a website or by email.
  • Phone Call: This method requires interviewing participants through phone calls. It could require more time and money than online surveys because of service costs and the number of calls made.
  • Face-to-face: The researcher conducts the survey in person with the participant. This choice could be expensive and time-consuming.
When done correctly, the method chosen does not have a significant influence on the results. In any case, it is critical that the questions are clear, minimize ambiguity, and serve the purpose of the study, not the researcher's expectations.
Discussion

What is the Goal of Statistics?

Statistics are used to analyze collected data to help understand how different things work or how different people think. The goal of statistics is to understand the characteristics of a population.

Concept

Population

In statistics, a population consists of all members of a group of interest. Populations can vary in size and include people, animals, plants, or objects. Since studying every member of a population is often impractical, a representative subset called a sample is used instead. The sample is used to represent or make inferences about the population.

Population and sample
For example, suppose there is a study that examines high school students in Boston and their attitudes towards mathematics. In this case, the population consists of all high school students in Boston. Since it is impractical to study every student, a sample can be taken by using sampling methods.
Example

Identifying the Population and the Sample

Dylan is hyped to learn more about statistics but he is having some issues. He cannot determine which set represents the population and which represents a sample of the population. Dylan goes to his teacher for help.

Dylan listening to his math teacher

She gives Dylan a few example problems that will help him learn about sample versus population.

a Pair the following.
b Pair the following.
c Pair the following.

Hint

a The sample is always taken from the population.
b The sample is always taken from the population.
c The sample is always taken from the population.

Solution

a Since the sample is always part of the population, identifying the larger group determines which of the sets represents the population. In this case, Dylan's classroom is part of the students of his school.
b Similar to what was done in Part A, it is important to determine which set is part of the other. In this case, the workers at the local City Mall are inhabitants of that city and not the other way around. Therefore, the population is the set of the city's inhabitants.
c Finally, a league is made of several teams. Therefore, the members of a team are a sample of all the members of a league.


Example

How Many Games With More Than Points?

Dylan loves playing basketball. One reason for Dylan's interest in statistics is that he wants to understand the stats of his favorite players over the course of a season. He decides to try out some statistics for practice and notes the scores of his favorite player, Savant Saucey, in games selected at random during the season.

Scores
a Help Dylan find the probability that Savant Saucey scores or more points during a game. Write the answer as a fraction.
b Savant Saucey will play in about games during the season. Use an equivalent ratio to help Dylan predict how many of these games his favorite player will score more than points. Round to the nearest whole number.

Hint

a The probability is found by dividing the number of games with more than points by the total number of games in the sample.
b Write an equivalent ratio by dividing the number of estimated games with more than points by the total number of games.

Solution

a The probability that Dylan's favorite player scores points or more during a game is obtained by dividing the number of games in which the player scored more than points by the number of games.
This number of games can be identified by noting the number of times the score is or higher in the table.
Scores
Dylan's favorite player scored more than points on of the games. With these numbers, we can write the probability.
b Two ratios are equivalent if they result in the same value. Let be the predicted number of games where Dylan's favorite player scores more than points. The probability has to be equal to divided by the total number of games to write an equivalent ratio for the prediction.
The fraction on the left-hand side of the equation can be rewritten to find the value of First, notice that multiplying by results in For the ratios to be equivalent, then multiplied by has to be equal to
Dylan's prediction is that his favorite player will score more than points in of the games. Not bad!
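The two steps of this solution, finding a probability from the sample and then scaling it to the whole season with an equivalent ratio, can be sketched in Python. The scores, the cutoff of 30 points, and the season length of 80 games are hypothetical values chosen for illustration. They are not the numbers from the exercise.

```python
from fractions import Fraction

# Hypothetical points scored in randomly selected games
# (illustrative values, not the data from the exercise).
sample_scores = [31, 24, 28, 35, 19, 30, 27, 33, 22, 29]
threshold = 30      # assumed cutoff: 30 or more points
season_games = 80   # assumed number of games in the season


def empirical_probability(scores, cutoff):
    """Fraction of sampled games with a score at or above the cutoff."""
    favorable = sum(1 for score in scores if score >= cutoff)
    return Fraction(favorable, len(scores))


def predict_games(probability, total_games):
    """Solve probability = x / total_games for x, rounded to a whole number."""
    return round(probability * total_games)


p = empirical_probability(sample_scores, threshold)
predicted = predict_games(p, season_games)
print(p, predicted)
```

Using `Fraction` keeps the probability as an exact fraction, which matches the instruction to write the answer as a fraction.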
Example

Getting Buckets

Most shots in basketball are worth two points. However, there are other shots with different values. For example, a free throw is worth one point. These are uncontested shot attempts that players are awarded if they are fouled while shooting.

Player getting fouled
a Dylan's favorite player, Savant Saucey, is a certified bucket getter. Saucey was successful on of their free throw attempts during the season. If Saucey attempted free throws, how many did they make?
b In addition to free throws, there are shots that are worth three points called three pointers. They are taken from about feet or further from the basket. Savant Saucey was successful on of attempted three pointers. How many points did Savant score from three pointers? Round to the nearest integer.

Hint

a Find the of by multiplying.
b Find the of by multiplying.

Solution

a The probability that Savant Saucey makes a free throw is Since they took attempts, the amount of successful makes is a result of finding of This percent is obtained by multiplying by
This means that, of the attempts Savant Saucey took, they were successful on That is outstanding.
b This part is solved similarly to how Part A was solved. Since Savant Saucey took three-point shot attempts, the number of successful shots is the result of finding the of This is done by multiplying.
This means that Savant Saucey was successful on three point shots. Then, since each shot is worth three points, the total number of points is found by multiplying by
Savant Saucey scored points in the season just from three point shots. That is very good!
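The percent calculations in both parts follow the same pattern, which can be sketched as a small Python helper. The percentages and attempt counts below are hypothetical stand-ins, since they are only meant to show the arithmetic.

```python
def percent_of(percent, total):
    """Return percent% of total."""
    return percent * total / 100


# Hypothetical season numbers (not the values from the exercise).
free_throw_percent = 90
free_throw_attempts = 300
made_free_throws = round(percent_of(free_throw_percent, free_throw_attempts))

three_point_percent = 40
three_point_attempts = 250
made_threes = round(percent_of(three_point_percent, three_point_attempts))

# Each successful three pointer is worth three points.
three_point_points = 3 * made_threes

print(made_free_throws, made_threes, three_point_points)
```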


Discussion

Selecting a Sample

The key to selecting a sample is choosing one that closely reflects the population. Such samples are called representative samples and help produce meaningful interpretations.

Concept

Representative Sample

A sample that accurately reflects the characteristics of the population is called a representative sample.

Selecting sample from a population

In a representative sample, if parts of the population have a certain characteristic, approximately parts of the sample will share the same characteristic. As an example, consider a sample of cats and dogs selected from all the cats and dogs at a pet shop.

Population and sample of cats and dogs

The figure shows cats represent of the population and of the sample. Similarly, dogs represent of the population and of the sample. In this case, the sample is then representative. Note that in a representative sample, every subgroup of the population is represented.
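One way to check whether a sample is representative is to compare the share of each subgroup in the population with its share in the sample. The sketch below does this in Python. The animal counts and the tolerance of 0.05 are assumptions made for illustration, not values from the figure.

```python
from collections import Counter


def proportions(items):
    """Map each category to its share of the list."""
    counts = Counter(items)
    total = len(items)
    return {category: count / total for category, count in counts.items()}


def is_representative(population, sample, tolerance=0.05):
    """True if every population subgroup has a similar share in the sample."""
    pop_share = proportions(population)
    sample_share = proportions(sample)
    return all(
        abs(pop_share[group] - sample_share.get(group, 0.0)) <= tolerance
        for group in pop_share
    )


# Hypothetical pet shop: 60% cats and 40% dogs.
population = ["cat"] * 12 + ["dog"] * 8
balanced_sample = ["cat"] * 3 + ["dog"] * 2   # also 60% cats and 40% dogs
cats_only_sample = ["cat"] * 5                # dogs are not represented

print(is_representative(population, balanced_sample))
print(is_representative(population, cats_only_sample))
```

A subgroup missing from the sample is treated as having a share of zero, so a sample that leaves out dogs fails the check.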

Extra

Relation with Bias

It is possible that samples reflect biases. In most cases, unbiased samples result in representative samples. On the other hand, biased samples usually result in samples that are not representative of the population.

Discussion

Unbiased Samples

A sample is said to be unbiased if every member of the population has the same probability of being selected. The most common type of unbiased sample is the simple random sample.

Concept

Simple Random Sample

In a simple random sample, each member of a population is equally likely to be selected as part of the sample. Consider the following example.
The researcher then puts all of the slips of paper in a bag and draws four of them. Here, each slip of paper has an equal chance of being drawn, so every employee is equally likely to be selected as part of the sample. The applet below shows example simple random samples.
Generate a random sample
To generate a simple random sample, each member of a population can be assigned a unique ID — for example, a whole number. Then, a random number generator can be used to generate IDs until the desired sample size is satisfied.
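The ID-based procedure can be sketched with Python's standard `random` module. The population size of 20, the sample size of 4, and the function name are assumptions made for illustration.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct IDs from 1..population_size, each equally likely."""
    rng = random.Random(seed)
    ids = range(1, population_size + 1)
    # random.sample selects without replacement, so no ID repeats.
    return rng.sample(ids, sample_size)


# Hypothetical example: choose 4 of 20 members.
sample_ids = simple_random_sample(population_size=20, sample_size=4, seed=1)
print(sample_ids)
```

Passing a seed makes the draw repeatable, which is convenient for checking work. Leaving it as `None` gives a different sample on each run.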

Extra

Biased or Unbiased Sampling

A simple random sample is an unbiased sample because it involves selecting members from a population randomly. This guarantees that each member of the population has an equal chance of being included in the sample. This type of sample is usually representative because it tends to have the same characteristics as the population.

Discussion

Systematic Samples

Another sample type that is usually unbiased is the systematic sample.

Concept

Systematic Sample

In a systematic sample, the members of a population are ordered randomly or in a random-like way. The sample size is predetermined, and members are selected at a fixed interval. The interval is determined by dividing the population size by the sample size. Consider an example.
A sample of people is selected from a population of people using systematic sampling.
A systematic sample is often used when a complete list of the population is available. Different samples can be created by varying the starting point when using systematic sampling. This gives the surveyors the ability to check if the conclusions hold across the samples.
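A minimal sketch of systematic sampling in Python is shown below. It assumes the population is available as a list, uses a random starting point within the first interval, and takes the interval to be the whole-number quotient of the population size and the sample size. The population of 40 numbered members is hypothetical.

```python
import random


def systematic_sample(members, sample_size, seed=None):
    """Select every k-th member, where k = population size // sample size."""
    if not 0 < sample_size <= len(members):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(members) // sample_size
    rng = random.Random(seed)
    start = rng.randrange(interval)   # random start within the first interval
    return [members[start + i * interval] for i in range(sample_size)]


# Hypothetical list of 40 members, identified by the numbers 1 to 40.
population = list(range(1, 41))
sample = systematic_sample(population, sample_size=8, seed=0)
print(sample)
```

Calling the function with different seeds varies the starting point, which produces the different samples mentioned above.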

Extra

Biased or Unbiased Sampling

Systematic samples can be either biased or unbiased.

  • If the initial order of the population is biased, the sample will be biased. For example, if the population is arranged in a cyclical pattern that matches the sampling interval, it can lead to a group being preferred over other groups, which leads to bias in the sample.
  • On the other hand, if the members of the population are ordered randomly without a cyclic pattern, it is expected that the systematic sampling will be unbiased.

It is important to remember that unbiased samples are better suited to generate a representative sample.
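The effect of a cyclical pattern can be seen in a short Python sketch. The five groups, the population of 40 members, and the interval of 5 are hypothetical values chosen so that the cycle length matches the sampling interval.

```python
import random

# Hypothetical population listed in a repeating pattern of five groups.
groups = ["A", "B", "C", "D", "E"]
ordered_cyclically = groups * 8   # A, B, C, D, E, A, B, C, ...

interval = 5   # matches the length of the repeating pattern
start = 2

# Every fifth member lands on the same group, so the sample is biased.
cyclic_sample = ordered_cyclically[start::interval]

# Shuffling the list first removes the pattern; the sample will
# typically contain several different groups.
rng = random.Random(0)
shuffled = ordered_cyclically[:]
rng.shuffle(shuffled)
shuffled_sample = shuffled[start::interval]

print(cyclic_sample)
print(shuffled_sample)
```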

Discussion

Biased Samples

In biased samples, one part of the population is preferred over the others in some way. These types of samples are usually not representative, but they are usually more practical to collect than unbiased samples. For example, consider convenience samples.

Concept

Convenience Sample

In a convenience sample, members of a population are selected to be in a sample based on convenience or their availability to the researchers. Consider an example.
This is an example of convenience sampling because the researcher is selecting wolves who are conveniently available and in close proximity.

This type of sampling is often used when researchers have limited resources or are under certain time constraints.

Extra

Biased or Unbiased Sampling

Convenience sampling can lead to a biased sample since the researchers choose sample members that are easily available to them. Some groups of people in the population may not be represented in the sample because they are not easily accessible, causing the sample to not be representative of the population.

Discussion

Voluntary Response Sample

In a self-selected sample, the members of the sample are the people who are willing to participate. Since people participate voluntarily in these samples, the samples are also called voluntary response samples. Consider an example.
The following applet shows a number of people volunteering to participate.
choose self-selected sample

Extra

Biased or Unbiased Sampling

A self-selected sample is a biased sample because people with strong opinions, either positive or negative, about the topic studied are more likely to volunteer. Also, people who are interested in the topic being studied may be more likely to participate, while those who are not interested may refuse to participate.

As a result, such a sample is not representative of the population because it underrepresents people with neutral opinions about the topic or who are not interested in it.
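The underrepresentation of neutral opinions can be illustrated with a small calculation in Python. Every number below is an assumption made for illustration: a hypothetical population of 1000 people with star ratings, and a guess at what percent of the people holding each opinion would volunteer.

```python
# Hypothetical opinions of 1000 people, as number of people per star rating.
true_opinions = {1: 100, 2: 200, 3: 400, 4: 200, 5: 100}

# Assumed percent of people with each opinion who choose to respond.
# People with strong opinions are assumed to volunteer far more often.
response_percent = {1: 50, 2: 5, 3: 2, 4: 5, 5: 50}

# Expected number of volunteers for each rating.
expected_responses = {
    stars: true_opinions[stars] * response_percent[stars] // 100
    for stars in true_opinions
}


def extreme_share(counts):
    """Share of one-star and five-star ratings among all ratings."""
    return (counts[1] + counts[5]) / sum(counts.values())


share_in_population = extreme_share(true_opinions)
share_in_responses = extreme_share(expected_responses)
print(expected_responses)
print(share_in_population, share_in_responses)
```

Under these assumptions, extreme opinions make up a much larger share of the responses than of the population, which is the bias described above.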

Discussion

Making Inferences With the Results of a Survey

As previously mentioned in the lesson, one of the main goals of statistics is to make inferences about the characteristics of a population using samples. It is important to keep in mind that not every inference is correct. However, valid inferences are very likely to be correct.

Concept

Valid Inference

A valid inference is a conclusion drawn from a sample of a population that is highly likely to be true. This type of inference is based on representative samples and supported by sufficient data. Consider an example where a survey was conducted on the people of a city asking for their heights.

Here are more details of the survey and what it found.
This is enough information to make an inference about the average height of all inhabitants of the city.
This inference is valid because of the following characteristics of the study.
  1. A simple random sample is usually unbiased.
  2. The number of participants is a relevant sample size.
It should be noted that the sample size should be as big as possible. For small populations, a relevant sample size is about of the population. However, with big populations this can be hard or impossible to achieve, so, as long as the sample size is around the sample can lead to valid inferences if it is unbiased.
Example

How Many Students at School Like Basketball?

Dylan is doing great in statistics class. Miss Jackson assigns him to take another survey of his choice. Wondering if the students in his school like basketball as much as he does, he decides to ask in a survey. He is nervous his classmates will say things like "Basketball? No way!" and "Eww."

Dylan's classmates showing disdain toward basketball.

Dylan arrives at school early and asks every fifth student entering the school what their interest level is, if any, in basketball. By the end of the morning, Dylan surveyed students. The results showed that almost of the students love basketball!

a Select which type of sample Dylan selected to take the survey.
b Is the conclusion a valid inference? Consider that there are about students in Dylan's school.

Hint

a Remember how each type of sample is taken.
b Is the sample biased? Did Dylan gather enough data?

Solution

a Dylan selected a sample by asking every fifth student. This method can be compared with each type of sample to verify which is the one that Dylan used.
Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.

The order in which the students arrive at school is close to random. In addition to this, Dylan selected the participants in an interval of five. Therefore, it can be determined that Dylan chose a systematic sample.

b Since the order in which students arrive is close to random, every student has about the same chance of being selected. This makes for an unbiased sample. Dylan also selected a sample of students. This sample can be compared to of the number of students of the school.
Since is a little more than of the population, the sample is big enough. Because the sample is unbiased and large enough, the conclusion is a valid inference. Therefore, the answer is yes.
Example

Favorite Basketball Team

Dylan is going to a game of his favorite basketball team, the Fighters.

Basketball game

There are three different teams in the state where Dylan lives. Wondering which team is the most popular in the state, Dylan conducted a survey on the day of the game. He asked several people in the Fighters' arena which was their favorite team from the state. Dylan's results can be represented with a pie chart.

a Select which type of sample Dylan selected to take the survey.
b Is the conclusion a valid inference?

Hint

a Remember how each type of sample is taken.
b Is the sample biased? Did Dylan gather enough data?

Solution

a Dylan selected a sample by asking people in the same arena as him, the Fighters' arena. This method can be compared with each type of sample to verify which is the one that Dylan used.
Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.

Dylan selected the participants of the sample in the same location as him. Therefore, it can be determined that Dylan chose a convenience sample.

b Dylan wants to know how popular the Fighters are in the state. However, most people who attend a game at the Fighters' arena will be Fighters fans. This makes the sample biased. Therefore, the answer is no.
Example

Ordering Jerseys

Dylan's uncle sells Fighters jerseys at an apparel store. He knows that many people buy jerseys during the playoffs, and he wants to know which jerseys to order based on popularity. He selected a simple random sample of customers and conducted a survey asking who their favorite player is. The results are in the table.

Player Votes from Customers
Savant Saucey
Elgin Baylor
Nate Thurmond

Dylan's uncle wants to order new jerseys to sell during the playoffs. He already made a great table and now he asks Dylan to run some analysis that will help him make the correct order of jerseys.

a Can the results from Dylan's uncle's survey support a valid inference?
b How many jerseys from Savant Saucey should be ordered based on the results from the survey?

Hint

a Is the survey biased?
b What is the probability that a customer prefers Savant Saucey?


Solution

a Every member of the population of customers has the same probability of being selected in a simple random sample. Therefore, the survey is not biased. Since the sample is not biased and the survey was conducted on enough people, it can support a valid inference.
b The proportion of people who prefer Savant Saucey is needed before determining how many jerseys to order. This proportion can be found by dividing the customers that prefer this player by the total number of customers surveyed. The total number of customers surveyed can be obtained by adding the votes for each player.
The proportion needed is the same as the probability that Savant Saucey is the favorite player of a customer. Since the total number of people surveyed is there is enough information to find this probability.
A probability of indicates that there is a chance that Savant Saucey is the favorite player of a customer. Knowing this, of the jerseys should be Savant Saucey jerseys. The exact number can be found by multiplying and
Therefore, jerseys of Savant Saucey should be ordered.
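The proportion-based reasoning in Part B can be sketched in Python. The vote counts and the order size of 200 jerseys are hypothetical, since they are only meant to show how a survey proportion scales to an order.

```python
from fractions import Fraction

# Hypothetical survey results (not the values from the table).
votes = {"Savant Saucey": 45, "Elgin Baylor": 30, "Nate Thurmond": 25}
total_order = 200   # assumed number of jerseys to order

total_votes = sum(votes.values())


def jerseys_to_order(player):
    """Scale the player's share of votes to the size of the order."""
    share = Fraction(votes[player], total_votes)
    return round(share * total_order)


order = {player: jerseys_to_order(player) for player in votes}
print(order)
```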
Closure

Are Five Star Ratings Biased?

Refer to the challenge given at the beginning of the lesson. Apps often ask customers to give five-star ratings. The customers are the population that the app owners want to make inferences about. The method that the app owners use to select a sample can be determined by comparing it to the following sample types.

Sample | Characteristics
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected.
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected in a specified interval.
Convenience sample | Each participant is selected based on availability or convenience to the researcher.
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample.
Since customers decide whether to rate the app or not, the sample is a voluntary response sample. These kinds of samples are usually biased. That would explain why the majority of participants rate it as one star or five stars, rarely anything in between. This means that the conclusions drawn from the ratings are not valid inferences.