Bivariate data is a set of data collected on two variables, where each value of one variable is paired with a value of the other. Such data can reveal a relationship between the variables. A data set of people's shoe sizes and heights is an example of bivariate data.
Shoe Size | Height (in.) |
---|---|
8 | 63 |
7 | 59 |
6 | 60 |
5 | 55 |
7 | 58 |
8 | 60 |
6 | 56 |
7 | 64 |
5 | 54 |
A scatter plot is used to represent bivariate data.
A scatter plot is a graph that shows each observation of a bivariate data set as an ordered pair in a coordinate plane. Consider the following example, where a scatter plot illustrates the results gathered at a local ice cream parlor. This study records the number of ice creams sold and the corresponding air temperature.
The table outlines the similarities and differences between association and correlation.
Similarities | Differences |
---|---|
|
|
The following applet shows various scatter plots. Choose the type of association that best describes the relationship depicted in each scatter plot.
In the small town of Mathville, the local ice cream shop plans to introduce a new flavor.
Zosia is eager to explore the factors that influence sales before the big reveal. She begins by analyzing the relationship between ice cream sales and the town's temperature. The table shows the bivariate data from this first investigation.
Temperature | Number of Ice Cream Sales |
---|---|
15°C | 20 |
18°C | 18 |
21°C | 42 |
24°C | 55 |
27°C | 70 |
30°C | 65 |
33°C | 120 |
36°C | 116 |
39°C | 130 |
42°C | 140 |
Cluster: None
Gap: Between 70 and 116 sales.
Temperature | Number of Ice Cream Sales | Ordered Pairs |
---|---|---|
15°C | 20 | (15,20) |
18°C | 18 | (18,18) |
21°C | 42 | (21,42) |
24°C | 55 | (24,55) |
27°C | 70 | (27,70) |
30°C | 65 | (30,65) |
33°C | 120 | (33,120) |
36°C | 116 | (36,116) |
39°C | 130 | (39,130) |
42°C | 140 | (42,140) |
Now plot the points on a coordinate plane to construct the scatter plot.
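As a rough illustration of this plotting step, the sketch below pairs the table's values and places each pair on a character grid. The helper `text_scatter` and its grid size are hypothetical choices made for this example; a graphing tool or graph paper would normally be used instead.

```python
# Temperature (°C) and ice cream sales, copied from the lesson's table.
temperatures = [15, 18, 21, 24, 27, 30, 33, 36, 39, 42]
sales = [20, 18, 42, 55, 70, 65, 120, 116, 130, 140]

# Each observation becomes one ordered pair (x, y).
points = list(zip(temperatures, sales))


def text_scatter(points, width=40, height=12):
    """Return a rough character-grid scatter plot (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate to a column and row of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


print(text_scatter(points))
```

The printed grid rises from the lower left to the upper right, which is the same pattern the coordinate-plane version shows.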
The scatter plot drawn previously shows that as the temperature increases, the number of ice creams sold also increases at a nearly constant rate.
Cluster | A group of data points that are close together on a graph. |
---|---|
Gap | An empty space or interval between groups of data points on a graph. |
Outlier | A data point that lies noticeably apart from the other data points on a graph. |
With this information in mind, take a look at the scatter plot.
Draw some conclusions about the clusters, gaps, and outliers observed in the scatter plot.
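One way to check a gap numerically, sketched here under the assumption that the gap is judged on the sales values alone, is to sort the values and look for the widest jump between neighbors. The function name `largest_gap` is made up for this example.

```python
# Ice cream sales from the temperature investigation.
sales = [20, 18, 42, 55, 70, 65, 120, 116, 130, 140]


def largest_gap(values):
    """Return the pair of neighboring sorted values that are farthest apart."""
    ordered = sorted(values)
    neighbors = zip(ordered, ordered[1:])
    return max(neighbors, key=lambda pair: pair[1] - pair[0])


low, high = largest_gap(sales)
print(f"Widest empty interval in sales: between {low} and {high}")
```

For this data the widest jump lies between 70 and 116 sales, which agrees with the gap read from the scatter plot.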
When data sets have a positive or negative correlation, the trend of the data can be modeled using a line of fit, also called a trend line. This line is drawn on a scatter plot near most of the data points, which appear evenly distributed above and below the line.
The scatter plot above shows the mean weights of kittens from the same litter in relation to their age. In this case, a line of fit could be drawn quite seamlessly. When drawing such a line of fit, the following characteristics should be considered.
Given a scatter plot and a line, determine whether the line is a trend line.
Zosia believes it might be best to launch the new flavor when the town's temperature is on the rise. Her next investigation focuses on the connection between ice cream sales and the time of day. She records the number of ice creams sold at different times of the day.
Time | Number of Ice Creams Sold |
---|---|
0 | 2 |
8 | 4 |
10 | 7 |
12 | 13 |
14 | 15 |
16 | 18 |
18 | 22 |
20 | 20 |
22 | 23 |
Notice that by changing what the x- and y-axes represent, a different scatter plot can be created.
Note that different observers may draw different lines of fit, as this depends on their observations of the data points.
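A quick way to judge a candidate line is to count how many points fall above and below it. The sketch below does this for the time-of-day data with a hypothetical line y = 1.4x − 6; that line is an assumption chosen for illustration and is not necessarily the one drawn in the lesson.

```python
# (time of day, ice creams sold) pairs from the lesson's table.
points = [(0, 2), (8, 4), (10, 7), (12, 13), (14, 15),
          (16, 18), (18, 22), (20, 20), (22, 23)]


def candidate_line(x):
    """Hypothetical line of fit, assumed for this example only."""
    return 1.4 * x - 6


def count_above_below(points, line):
    """Count the points lying strictly above and strictly below the line."""
    above = sum(1 for x, y in points if y > line(x))
    below = sum(1 for x, y in points if y < line(x))
    return above, below


above, below = count_above_below(points, candidate_line)
print(f"{above} points above, {below} points below")
```

A roughly even split suggests the line runs through the middle of the data. Another observer's line could give a similar split, which is why more than one line of fit is acceptable.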
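The equation of the line of fit is found through the following steps: substitute values, subtract terms, reduce the fraction using $\frac{a}{b} = \frac{a/2}{b/2}$, substitute $x = 10$ and $y = 8$, multiply using $\frac{a}{c} \cdot b = \frac{a \cdot b}{c}$, calculate the quotient, subtract $14$ from both sides, and rearrange the equation.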
There is a positive linear association between the time of day and the number of ice creams sold. Additionally, the line of fit predicts no ice cream sales when the time of day is 4.
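The arithmetic behind such a conclusion can be sketched with two points on a line of fit. The point (10, 8) appears in the lesson's steps; the second point (20, 22) is an assumption made here so that the example is complete, and a different choice would give a different line.

```python
from fractions import Fraction

x1, y1 = 10, 8    # point named in the lesson's steps
x2, y2 = 20, 22   # ASSUMED second point, for illustration only

# Slope: change in y over change in x (14/10 reduces to 7/5).
slope = Fraction(y2 - y1, x2 - x1)

# Substitute (x1, y1) into y = slope * x + b and solve for b.
intercept = y1 - slope * x1

# The line predicts zero sales where slope * x + b = 0.
x_intercept = -intercept / slope

print(f"y = {slope}x + ({intercept})")
print(f"Zero sales predicted at x = {x_intercept} (about {float(x_intercept):.2f})")
```

Under these assumptions the line is y = (7/5)x − 6 and it crosses the x-axis at 30/7, which is close to 4.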
Over the course of a week, she analyzes the relationship between ice cream sales and the ages of the buyers. The table shows the bivariate data from this investigation.
Ages of Buyers | Number of Ice Cream Sales |
---|---|
5 | 10 |
10 | 6 |
15 | 16 |
20 | 18 |
25 | 10 |
30 | 20 |
35 | 25 |
40 | 6 |
45 | 9 |
50 | 3 |
55 | 13 |
60 | 8 |
Ages of Buyers | Number of Ice Cream Sales | Ordered Pair |
---|---|---|
5 | 10 | (5,10) |
10 | 6 | (10,6) |
15 | 16 | (15,16) |
20 | 18 | (20,18) |
25 | 10 | (25,10) |
30 | 20 | (30,20) |
35 | 25 | (35,25) |
40 | 6 | (40,6) |
45 | 9 | (45,9) |
50 | 3 | (50,3) |
55 | 13 | (55,13) |
60 | 8 | (60,8) |
Now plot the points on a coordinate plane to construct the scatter plot.
Zosia ultimately concluded that the new ice cream flavor could be introduced during warmer weather and daylight hours, based on her investigation. However, it is crucial to recognize that this decision is based on observations. The ice cream seller may encounter different outcomes when introducing the new flavor due to external factors.
Remember that the lines of fit drawn during the lesson help in interpreting the scatter plots by showing how close the data points lie to a line. Different lines of fit can be drawn for each example. However, there is only one line of best fit for the association.
A line of best fit, also known as a regression line, is a line of fit that estimates the relationship between the values of a data set. Its equation is determined using a strict mathematical method rather than by eye.
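The lesson does not name the method. One common choice is least squares, which picks the line that minimizes the sum of the squared vertical distances from the points to the line. The sketch below applies it to the temperature data under that assumption; the function name `least_squares_line` is made up for this example.

```python
# (temperature in °C, ice cream sales) pairs from the first investigation.
points = [(15, 20), (18, 18), (21, 42), (24, 55), (27, 70),
          (30, 65), (33, 120), (36, 116), (39, 130), (42, 140)]


def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(points)
print(f"y = {slope:.2f}x + ({intercept:.2f})")
```

Because the formulas depend only on the data, everyone who applies them gets the same line, unlike lines of fit drawn by hand.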
Construct a scatter plot of the number of books donated over time to the public library.
Year | Number of Books |
---|---|
1 | 32 |
2 | 27 |
3 | 35 |
4 | 42 |
5 | 58 |
6 | 67 |
7 | 63 |
8 | 71 |
Remember that a scatter plot is a graph that shows the associations between two data sets. The given table shows the number of books donated over 8 years. First, we need to represent the data as ordered pairs (x,y), where x is the year and y is the number of books.
Year | Number of Books | Ordered Pair |
---|---|---|
1 | 32 | (1,32) |
2 | 27 | (2,27) |
3 | 35 | (3,35) |
4 | 42 | (4,42) |
5 | 58 | (5,58) |
6 | 67 | (6,67) |
7 | 63 | (7,63) |
8 | 71 | (8,71) |
Now, we can graph the ordered pairs on a coordinate plane to create the scatter plot of the number of books donated over time.
The scatter plot shows the association between the price of the games in dollars and the number of games sold.
We want to identify the association between two data sets given in a scatter plot. First, let's remember the types of associations.
Positive Linear Association | As the value of one quantity increases, the value of the other quantity also increases at a constant rate. |
---|---|
Negative Linear Association | As the value of one quantity increases, the value of the other quantity decreases at a constant rate. |
No Association | As the value of one quantity changes, the value of the other quantity shows no consistent increase or decrease. |
Non-linear Association | The relationship between two quantities doesn't follow a straight line but instead forms a curve or a pattern that's not straight. |
Let's take a look at the given scatter plot that shows how the number of games sold changes as the price of the game increases.
Notice that as the price of the game increases, the number of games sold decreases consistently, indicating a negative linear association between price and sales.
The data set consists of ordered pairs representing the prices of books and the corresponding number of books sold in a bookstore.
(Price, Books Sold) | |
---|---|
(6,11) | (6.75,39) |
(8.25,15) | (8.75,10) |
(9.95,18) | (16,10) |
(17.50,9) | (18.25,11) |
(18,18) | (20,10) |
(20.50,16) | (21.25,15) |
(22.25,14) | (22.50,19) |
(24,20) | (24.25,44) |
(24.50,14) | (25,14) |
The scatter plot represents the prices and the number of books sold in a bookstore.
Answer the following multiple-choice exercises.
Let's remember the definition of an outlier.
Outlier: A data point that lies far away from the other points.
First, look at the scatter plot of the prices of the books and the number of books sold in a bookstore to identify whether any outliers exist.
We can see that there are two data points that are set off from the others. These points represent two books: one priced at $6.75 with 39 sales, and another priced at $24.25 with 44 sales. These are the outliers of the data set.
Outliers: (6.75,39) and (24.25,44)
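The outliers here are identified by eye. As a cross-check, the sketch below applies the 1.5 × IQR rule of thumb to the sales counts only. That rule is an assumption brought in for this example and is not the lesson's method; `iqr_outliers` is a hypothetical helper name.

```python
from statistics import quantiles

# (price, books sold) pairs from the exercise.
book_sales = [
    (6, 11), (6.75, 39), (8.25, 15), (8.75, 10), (9.95, 18), (16, 10),
    (17.50, 9), (18.25, 11), (18, 18), (20, 10), (20.50, 16), (21.25, 15),
    (22.25, 14), (22.50, 19), (24, 20), (24.25, 44), (24.50, 14), (25, 14),
]


def iqr_outliers(points):
    """Return the points whose sales count lies outside the 1.5 * IQR fences."""
    sold = [count for _, count in points]
    q1, _, q3 = quantiles(sold, n=4)  # quartiles of the sales counts
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [p for p in points if not low <= p[1] <= high]


print(iqr_outliers(book_sales))
```

For this data the rule flags the same two books that stand apart in the scatter plot.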
Now remember the definition of a gap.
Gap: An area of a graph that does not contain any data.
With this information, we notice that there is no data for prices between $10 and $15.
This means that there is a gap in the prices between $10 and $15.
Gap: Between $10 and $15
Take a look at the definition of a cluster.
Cluster: A group of points that lie close together.
We observe that the majority of data points represent books priced between $15 and $25, with the number of sales ranging from 9 to 20.
This grouping of points can be described as a cluster.
Cluster: Between $15 and $25
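The claim that most points fall in this region can be checked by counting. The sketch below filters the data with the price and sales ranges stated above; the helper `in_cluster` is hypothetical and simply encodes those two ranges.

```python
# (price, books sold) pairs from the exercise.
book_sales = [
    (6, 11), (6.75, 39), (8.25, 15), (8.75, 10), (9.95, 18), (16, 10),
    (17.50, 9), (18.25, 11), (18, 18), (20, 10), (20.50, 16), (21.25, 15),
    (22.25, 14), (22.50, 19), (24, 20), (24.25, 44), (24.50, 14), (25, 14),
]


def in_cluster(point, price_range=(15, 25), sales_range=(9, 20)):
    """Check whether a point lies inside the price and sales ranges of the cluster."""
    price, sold = point
    return (price_range[0] <= price <= price_range[1]
            and sales_range[0] <= sold <= sales_range[1])


cluster = [p for p in book_sales if in_cluster(p)]
print(f"{len(cluster)} of {len(book_sales)} points lie in the cluster")
```

Two thirds of the points land inside the region, and the outlier (24.25,44) is excluded because its sales count is far above the range.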
A scatter plot is given.
Which of the given lines is the line of fit that best captures the trend observed in the given data set?
We need to pick the line on the scatter plot that best fits the data set. Remember, the line of fit should go near most of the data points. Additionally, there should be about the same number of points above and below the line. Let's take a closer look at the line in option A.
All the data points are close to the provided line, and there are an equal number of points above and below it. This suggests that this option could be a line of fit for the data set. The correct option is A. However, remember that there can be more than one line of fit, so it is possible to draw another line that also fits the data.