Bivariate data is a set of data collected on two variables, where each data value of one variable corresponds to a data value of the other. Such data can reveal a relationship between the variables. A data set listing the shoe sizes and heights of a group of people is an example of bivariate data.
Shoe Size | Height (in.) |
---|---|
8 | 63 |
7 | 59 |
6 | 60 |
5 | 55 |
7 | 58 |
8 | 60 |
6 | 56 |
7 | 64 |
5 | 54 |
A scatter plot is used to represent bivariate data.
This scatter plot shows that tall people tend to have larger shoe sizes. Keep in mind that different bivariate data sets may have different types of associations.
A scatter plot is a graph that shows each observation of a bivariate data set as an ordered pair in a coordinate plane. Consider the following example, where a scatter plot illustrates the results gathered at a local ice cream parlor. This study records the number of ice creams sold and the corresponding air temperature.
Among other insights, the graph shows that when the temperature is about 100∘F, approximately 4000 ice creams are sold. Additionally, as the temperature increases, the number of sales also increases. In this case, it can be said that there is a positive correlation between the two variables: the number of ice creams sold and the air temperature.
The table outlines the similarities and differences between association and correlation.
Similarities | Differences |
---|---|
|
|
The following applet shows various scatter plots. Choose the type of association that best describes the relationship depicted in each scatter plot.
In the small town of Mathville, the local ice cream shop plans to introduce a new flavor.
Zosia is eager to explore the factors that influence sales before the big reveal. She begins by analyzing the relationship between ice cream sales and the town's temperature. The bivariate data below comes from this first investigation.
Temperature | Number of Ice Cream Sales |
---|---|
15∘C | 20 |
18∘C | 18 |
21∘C | 42 |
24∘C | 55 |
27∘C | 70 |
30∘C | 65 |
33∘C | 120 |
36∘C | 116 |
39∘C | 130 |
42∘C | 140 |
Cluster: None
Gap: Between 70 and 116 ice cream sales.
Temperature | Number of Ice Cream Sales | Ordered Pairs |
---|---|---|
15∘C | 20 | (15,20) |
18∘C | 18 | (18,18) |
21∘C | 42 | (21,42) |
24∘C | 55 | (24,55) |
27∘C | 70 | (27,70) |
30∘C | 65 | (30,65) |
33∘C | 120 | (33,120) |
36∘C | 116 | (36,116) |
39∘C | 130 | (39,130) |
42∘C | 140 | (42,140) |
Now plot the points on a coordinate plane to construct the scatter plot.
The scatter plot shows that as the temperature increases, the number of ice creams sold also increases at a roughly constant rate.
This means that the bivariate data has a positive linear association.
Term | Definition |
---|---|
Cluster | A group of data points that are close together on a graph. |
Gap | An empty space or interval between groups of data points on a graph. |
Outlier | A data point that lies noticeably apart from the other data points on a graph. |
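As a rough illustration of the gap definition, the sketch below looks for the widest empty interval in the sales counts from Zosia's temperature investigation. The helper name `largest_gap` is hypothetical, and treating the single widest interval as "the gap" is a simplifying assumption rather than a formal rule.

```python
# Sales counts from Zosia's temperature investigation.
sales = [20, 18, 42, 55, 70, 65, 120, 116, 130, 140]

def largest_gap(values):
    """Return (low, high) for the widest empty interval between sorted values."""
    ordered = sorted(values)
    # Pair each value with its successor and keep the pair that is farthest apart.
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])

low, high = largest_gap(sales)
print(f"Widest gap in sales: between {low} and {high}")
```

For this data the widest empty interval lies between 70 and 116 sales, which matches the gap identified for the temperature data.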
With this information in mind, take a look at the scatter plot.
Draw some conclusions about the clusters, gaps, and outliers observed in the scatter plot.
When data sets have a positive or negative correlation, the trend of the data can be modeled using a line of fit, also called a trend line. This line is drawn on a scatter plot near most of the data points, which appear evenly distributed above and below the line.
The scatter plot above shows the mean weights of kittens from the same litter in relation to their age. In this case, a line of fit could be drawn quite seamlessly. When drawing such a line of fit, the following characteristics should be considered.
Given a scatter plot and a line, determine whether the line is a trend line.
Zosia believes it might be best to launch the new flavor when the town's temperature is on the rise. Her next investigation focuses on the connection between ice cream sales and the time of day. She records the number of ice creams sold at different times of the day.
Time | Number of Ice Creams Sold |
---|---|
0 | 2 |
8 | 4 |
10 | 7 |
12 | 13 |
14 | 15 |
16 | 18 |
18 | 22 |
20 | 20 |
22 | 23 |
Notice that by changing what the x- and y-axes represent, a different scatter plot can be created.
Note that different observers may draw different lines of fit, as this depends on their observations of the data points.
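The equation of the line of fit is found in steps: substitute values, subtract terms, reduce the fraction using a/b = (a/2)/(b/2), substitute x=10 and y=8, multiply using (a/c)⋅b = (a⋅b)/c, calculate the quotient, subtract 14 from both sides (LHS−14 = RHS−14), and rearrange the equation.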
There is a positive linear association between the time of the day and the number of ice creams sold. Additionally, when the time of the day is 4, it is expected that there will be no ice cream sales.
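The calculation behind this conclusion can be sketched as finding the equation of a line through two points. The point (10, 8) appears in the steps above. The second point (20, 22) is an assumption made here for illustration only, chosen so that the arithmetic agrees with the step that subtracts 14. The actual line drawn in the lesson may differ.

```python
from fractions import Fraction

# (10, 8) is the point named in the steps above; (20, 22) is an ASSUMED
# second point on the line of fit, chosen only for illustration.
x1, y1 = 10, 8
x2, y2 = 20, 22

slope = Fraction(y2 - y1, x2 - x1)   # 14/10, which reduces to 7/5
intercept = y1 - slope * x1          # solve y1 = slope * x1 + b for b
x_intercept = -intercept / slope     # time at which the line predicts zero sales

print(f"slope = {slope}, y-intercept = {intercept}")
print(f"predicted zero sales near x = {float(x_intercept):.2f}")
```

Under these assumptions the line is y = (7/5)x − 6, which crosses the x-axis at roughly x = 4.3, in line with the expectation of no sales at around time 4.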
Next, Zosia analyzes the relationship between ice cream sales and the ages of buyers over one week. The bivariate data below comes from this investigation.
Ages of Buyers | Number of Ice Cream Sales |
---|---|
5 | 10 |
10 | 6 |
15 | 16 |
20 | 18 |
25 | 10 |
30 | 20 |
35 | 25 |
40 | 6 |
45 | 9 |
50 | 3 |
55 | 13 |
60 | 8 |
Ages of Buyers | Number of Ice Cream Sales | Ordered Pair |
---|---|---|
5 | 10 | (5,10) |
10 | 6 | (10,6) |
15 | 16 | (15,16) |
20 | 18 | (20,18) |
25 | 10 | (25,10) |
30 | 20 | (30,20) |
35 | 25 | (35,25) |
40 | 6 | (40,6) |
45 | 9 | (45,9) |
50 | 3 | (50,3) |
55 | 13 | (55,13) |
60 | 8 | (60,8) |
Now plot the points on a coordinate plane to construct the scatter plot.
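Besides reading the scatter plot by eye, the strength of a linear association can be estimated numerically. The sketch below uses the Pearson correlation coefficient, a standard measure that this lesson does not introduce, so it is offered only as a supplementary check. The function name `pearson_r` is hypothetical. It compares the temperature data from the first investigation with the age data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# First investigation: temperature (in degrees Celsius) against sales.
temperatures = [15, 18, 21, 24, 27, 30, 33, 36, 39, 42]
temp_sales = [20, 18, 42, 55, 70, 65, 120, 116, 130, 140]

# This investigation: ages of buyers against sales.
ages = list(range(5, 65, 5))  # 5, 10, ..., 60
age_sales = [10, 6, 16, 18, 10, 20, 25, 6, 9, 3, 13, 8]

r_temp = pearson_r(temperatures, temp_sales)
r_age = pearson_r(ages, age_sales)
print(f"temperature vs. sales: r = {r_temp:.2f}")
print(f"age vs. sales:         r = {r_age:.2f}")
```

A value close to 1 points to a strong positive linear association, while a value close to 0 suggests little or no linear association. This is only one way to summarize the pattern and does not replace looking at the scatter plot.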
Zosia ultimately concluded that the new ice cream flavor could be introduced during warmer weather and daylight hours, based on her investigation. However, it is crucial to recognize that this decision is based on observations. The ice cream seller may encounter different outcomes when introducing the new flavor due to external factors.
Remember that the lines of fit in this lesson were drawn to help interpret the scatter plots, by judging how close the data points lie to the line. This means that different lines of fit could be drawn for the same example. However, each data set has only one line of best fit.
A line of best fit, also known as a regression line, is a line of fit that estimates the relationship between the values of a data set. Its equation is determined using a strict mathematical method.
One commonly used method to determine a line of best fit is the method of least squares. Such methods are usually tedious to carry out by hand, so a line of best fit is typically found by performing a linear regression on a graphing calculator.
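For readers without a graphing calculator, the least squares calculation can also be sketched in a few lines of code. The example below applies the standard least squares formulas for slope and intercept to Zosia's time-of-day data. The function name `least_squares_line` is hypothetical, and the result need not match any line of fit drawn by eye earlier in the lesson.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: sum of products of deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Time of day against number of ice creams sold.
times = [0, 8, 10, 12, 14, 16, 18, 20, 22]
sold = [2, 4, 7, 13, 15, 18, 22, 20, 23]

slope, intercept = least_squares_line(times, sold)
print(f"line of best fit: y = {slope:.2f}x {intercept:+.2f}")
```

The positive slope agrees with the positive linear association observed between the time of day and the number of ice creams sold.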