Statistics is a set of tools and techniques used to collect, organize, and interpret information. These pieces of information, which statisticians analyze, are known as data. One of the main ways to collect data is to ask questions. Questions that are especially useful for this purpose are known as statistical questions.
A question is considered a statistical question when it is designed to anticipate and account for a variety of answers. Consider the differences between the following statistical and non-statistical questions.
Non-statistical Question | Statistical Question |
---|---|
What is Davontay's height? | What are the heights of the students in Davontay's class? |
What is the total population of the United States? | In the United States, what is each individual state's population? |
What was Paulina's last Pre-Algebra test score? | What are Paulina's test scores from the previous month? |
Data is gathered with the goal of interpreting it into meaningful information. Asking a set of questions to gather this data — usually statistical questions — leads to a clearer picture when interpreting the data. Such sets of questions are called surveys.
A statistical study that collects data from participants through a questionnaire (a series of questions) is called a survey. The answers obtained in a survey provide data about the characteristics, behaviors, or opinions of the members of the population.
Questions must be clear so that participants are not confused. Participants answer the questions without interference from the researcher. Here are a few common types of surveys.
Statistics are used to analyze collected data to help understand how different things work or how different people think. The goal of statistics is to understand the characteristics of a population.
In statistics, a population consists of all members of a group of interest. Populations can vary in size and include people, animals, plants, or objects. Since studying every member of a population is often impractical, a representative subset called a sample is used instead. The sample is used to represent the population or to make inferences about it.
For example, suppose there is a study that examines high school students in Boston and their attitudes towards mathematics. In this case, the population consists of all high school students in Boston. Since it is impractical to study every student, a sample can be taken by using sampling methods.
Dylan is hyped to learn more about statistics, but he is having some issues. He cannot determine which set represents the population and which represents a sample of the population. Dylan goes to his teacher for help.
She gives Dylan a few example problems that will help him learn about sample versus population.
Dylan loves playing basketball. One reason for Dylan's interest in statistics is that he wants to understand the stats of his favorite players over the course of a season. He decides to try out some statistics for practice and notes the scores of his favorite player, Savant Saucey, in games selected at random during the season.
Scores | |
---|---|
45 | 27 |
40 | 33 |
22 | 46 |
29 | 9 |
18 | 20 |
34 | 15 |
$\dfrac{a}{b} = \dfrac{a \cdot 5}{b \cdot 5}$
$\text{LHS} \cdot 60 = \text{RHS} \cdot 60$
Rearrange equation
Most shots in basketball are worth two points. However, there are other shots with different values. For example, a free throw is worth one point. These are uncontested shot attempts that players are awarded if they are fouled while shooting.
A key point in selecting a sample is to choose one that reflects the population as closely as possible. Such samples are called representative samples, and they help produce meaningful interpretations.
A sample that accurately reflects the characteristics of the population is called a representative sample.
In a representative sample, if x parts of the population have a certain characteristic, approximately x parts of the sample will share the same characteristic. As an example, consider a sample of cats and dogs selected from all the cats and dogs at a pet shop.
The figure shows that cats represent 60% of the population and 60% of the sample. Similarly, dogs represent 40% of the population and 40% of the sample. Therefore, the sample is representative. Note that in a representative sample, every subgroup of the population is represented.
Samples can be biased. In most cases, unbiased samples are representative of the population. Biased samples, on the other hand, are usually not representative.
A sample is said to be unbiased if every member of the population has the same probability of being selected. The most common type of unbiased sample is the simple random sample.
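The comparison between population and sample shares can be sketched in Python. The counts below are hypothetical, chosen only to match the 60% and 40% split described above, and the function names and the 5% tolerance are assumptions made for illustration.

```python
from collections import Counter


def proportions(members):
    """Return the share of each category in a list of labels."""
    counts = Counter(members)
    total = len(members)
    return {label: count / total for label, count in counts.items()}


def is_representative(population, sample, tolerance=0.05):
    """Check that each subgroup's share in the sample is within
    `tolerance` of its share in the population."""
    pop_share = proportions(population)
    sample_share = proportions(sample)
    return all(
        abs(share - sample_share.get(label, 0.0)) <= tolerance
        for label, share in pop_share.items()
    )


# Hypothetical pet shop: 30 cats and 20 dogs (60% / 40%).
population = ["cat"] * 30 + ["dog"] * 20
# Hypothetical sample: 6 cats and 4 dogs (also 60% / 40%).
sample = ["cat"] * 6 + ["dog"] * 4

print(proportions(population))
print(proportions(sample))
print(is_representative(population, sample))
```

A sample made up only of cats would fail this check, because dogs would be missing from it entirely.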
A simple random sample is an unbiased sample because it involves selecting members from a population randomly. This guarantees that each member of the population has an equal chance of being included in the sample. This type of sample is usually representative because it tends to have the same characteristics as the population.
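One way to draw a simple random sample is sketched below using Python's standard `random` module. The list of student ID numbers, the sample size, and the helper name `simple_random_sample` are hypothetical and serve only as an illustration.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Select `size` distinct members so that every member has the
    same chance of being chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, size)


# Hypothetical population: 500 students identified by ID number.
student_ids = list(range(1, 501))

chosen = simple_random_sample(student_ids, 25, seed=1)
print(sorted(chosen))
```

Each student is identified by a number, and the random draw decides who is included, so no student is preferred over another.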
A systematic sample is another common sample type. Depending on how the population is ordered, systematic samples can be either biased or unbiased.
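A minimal sketch of systematic sampling is shown below. It assumes the members are already listed in some order, picks a random starting position, and then selects members at a fixed interval. The function name and the example list are hypothetical.

```python
import random


def systematic_sample(ordered_members, interval, seed=None):
    """Pick a random start within the first `interval` members, then
    take every `interval`-th member after it."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return ordered_members[start::interval]


# Hypothetical ordered list of 100 members, numbered 0 to 99.
members = list(range(100))

selected = systematic_sample(members, 5, seed=0)
print(selected)
```

If the order of the list follows a pattern that lines up with the interval, the same kind of member is picked every time and the sample becomes biased. When the order is close to random, the sample tends to be unbiased.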
It is important to remember that unbiased samples are better suited to generate a representative sample.
In biased samples, one part of the population is preferred over the others in some way. These types of samples are usually not representative, but they are often more practical to collect than unbiased samples. For example, consider convenience samples.
This type of sampling is often used when researchers have limited resources or are under certain time constraints.
Convenience sampling can lead to a biased sample since the researchers choose sample members that are easily available to them. Some groups of people in the population may not be represented in the sample because they are not easily accessible, causing the sample to not be representative of the population.
A self-selected sample is a biased sample because people with strong opinions, either positive or negative, about the topic studied are more likely to volunteer. Also, people who are interested in the topic being studied may be more likely to participate, while those who are not interested may refuse to participate.
As a result, such a sample is not representative of the population because it underrepresents people with neutral opinions about the topic or who are not interested in it.
As previously mentioned in the lesson, one of the main goals of statistics is to make inferences about the characteristics of a population using samples. It is important to keep in mind that not every inference is correct. However, valid inferences are very likely to be correct.
A valid inference is a conclusion drawn from a sample of a population that is highly likely to be true. This type of inference is based on representative samples and supported by sufficient data. Consider an example where a survey was conducted on the people of a city asking for their heights.
Here are more details of the survey and what it found.
Dylan is doing great in statistics class. Miss Jackson assigns him to conduct another survey of his choice. Wondering if the students in his school like basketball as much as he does, he decides to ask in a survey. He is nervous his classmates will say things like "Basketball? No way!" and "Eww."
Dylan arrives at school early and asks every fifth student entering the school about their level of interest, if any, in basketball. By the end of the morning, Dylan surveyed 54 students. The results showed that almost 30% of the students love basketball!
Sample | Characteristics |
---|---|
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected. |
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected at a specified interval. |
Convenience sample | Each participant is selected based on availability or convenience to the researcher. |
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample. |
The order in which the students arrive at school is close to random. In addition, Dylan selected the participants at an interval of five. Therefore, it can be determined that Dylan chose a systematic sample.
Dylan is going to a game of his favorite basketball team, the Fighters.
There are three different teams in the state where Dylan lives. Wondering which team is the most popular in the state, Dylan conducted a survey on the day of the game. He asked several people in the Fighters' arena which team from the state was their favorite. Dylan's results can be represented with a pie chart.
Sample | Characteristics |
---|---|
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected. |
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected at a specified interval. |
Convenience sample | Each participant is selected based on availability or convenience to the researcher. |
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample. |
Dylan selected the participants of the sample from the place where he happened to be. Therefore, it can be determined that Dylan chose a convenience sample.
Dylan's uncle sells Fighters' jerseys at an apparel store. He knows that many people buy jerseys during the playoffs. He wants to know which jerseys he should order based on popularity. He selected a simple random sample of 84 customers and conducted a survey asking who their favorite player is. The results are in the table.
Player | Votes from Customers |
---|---|
Savant Saucey | 42 |
Elgin Baylor | 25 |
Nate Thurmond | 17 |
Dylan's uncle wants to order 420 new jerseys to sell during the playoffs. He already made a great table, and now he asks Dylan to run some analysis that will help him place the correct order of jerseys.
Substitute values
$\dfrac{a}{b} = \dfrac{a/42}{b/42}$
Calculate quotient
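The reasoning for the jersey order can be sketched in Python. This is only one possible approach: it assumes the uncle splits the 420 jerseys in the same proportions as the votes in his sample of 84 customers. The function name `proportional_order` is hypothetical.

```python
# Votes from the simple random sample of 84 customers (from the table).
votes = {
    "Savant Saucey": 42,
    "Elgin Baylor": 25,
    "Nate Thurmond": 17,
}


def proportional_order(votes, total_items):
    """Split `total_items` in the same proportions as the votes.
    Assumes the sample is representative of all customers."""
    total_votes = sum(votes.values())
    return {
        player: total_items * count / total_votes
        for player, count in votes.items()
    }


order = proportional_order(votes, 420)
for player, jerseys in order.items():
    print(player, jerseys)
```

For example, Savant Saucey received 42 of the 84 votes, which is one half, so half of the 420 jerseys would carry his name under this assumption.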
Refer to the challenge given at the beginning of the lesson. Apps often ask customers to give five-star reviews. The customers are the population that the app owners want to make inferences about. The method that the app owners use to select a sample can be determined by comparing it to the following sample types.
Sample | Characteristics |
---|---|
Simple random sample | The participants are selected from the population using an identifier. Every member has the same chance of being selected. |
Systematic sample | The members of the population are ordered randomly. Then, each participant is selected at a specified interval. |
Convenience sample | Each participant is selected based on availability or convenience to the researcher. |
Voluntary response sample | Each participant voluntarily chooses to be a part of the sample. |