
# Scatter Plots and Trend Lines

When exploring patterns, predicting outcomes, and understanding relationships within data sets, visual representation is crucial. This lesson explores two essential visual representation tools: scatter plots and trend lines. These tools reveal relationships between data points and uncover underlying trends and patterns.

### Catch-Up and Review

Here are a few recommended readings before getting started with this lesson.

Explore

## Relationship Between a Line and Data Points

The following applet displays various graphs, each representing data as a set of points distributed on a coordinate plane, accompanied by a movable line.
Adjust the line by changing its slope and position to closely match the majority of the points. How can one assess whether this line effectively models the data?
Discussion

## Bivariate Data

Bivariate data is a data set in which each observation consists of values of two variables. Such data can show a relationship between the variables. Each data value of one variable corresponds to a data value of the other variable. A data set with the shoe sizes and heights of people is an example of bivariate data.

Table: Shoe Size and Height (in.)

A scatter plot is used to represent bivariate data.

This scatter plot shows that tall people tend to have larger shoe sizes. Keep in mind that different bivariate data sets may have different types of associations.
Discussion

## Scatter Plot

A scatter plot is a graph that shows each observation of a bivariate data set as an ordered pair in a coordinate plane. Consider the following example, where a scatter plot illustrates the results gathered at a local ice cream parlor. This study records the number of ice creams sold and the corresponding air temperature.

Among other insights, the graph shows that when the temperature is about …, approximately … ice creams are sold. Additionally, as the temperature increases, the number of sales also increases. In this case, there is a positive correlation between the two variables: the number of ice creams sold and the air temperature.
Discussion

## Association

The association of bivariate data is the relationship between the two variables in a data set. Analyzing bivariate data involves considering how changes in one variable relate to changes in the other. The association can be positive linear, negative linear, or non-linear, or the data may show no association.
Identifying the association in bivariate data helps in recognizing patterns, making predictions, and drawing conclusions about the relationship between the two variables.

### Extra

Similarities and Differences Between Association and Correlation

The table outlines the similarities and differences between association and correlation.

| Similarities | Differences |
|---|---|
| Both describe the relationship between two random variables. | Correlation detects linear relationships; association detects both linear and non-linear relationships. |
| Both use scatter plots to analyze relationships between variables. | Correlation quantifies a relationship with a number between $-1$ and $1$; association does not quantify it. |
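Correlation is commonly quantified with Pearson's correlation coefficient $r$. The Python sketch below computes $r$ for a small data set; the function name `pearson_r` and the temperature and sales values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical temperatures and ice cream sales (not the lesson's data)
temps = [60, 65, 70, 75, 80, 85]
sales = [20, 26, 31, 37, 42, 49]
print(round(pearson_r(temps, sales), 3))
```

A value of $r$ close to $1$ indicates a strong positive linear relationship, close to $-1$ a strong negative one, and close to $0$ little linear relationship. Because $r$ only measures linear relationships, a strong non-linear association can still produce an $r$ near $0$.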
Pop Quiz

## Identifying Associations from Scatter Plots

The following applet shows various scatter plots. Choose the type of association that best describes the relationship depicted in each scatter plot.

Example

## Exploring Ice Cream Sales and Town Temperature

In the small town of Mathville, the local ice cream shop plans to introduce a new flavor.

External credits: macrovector_official

Zosia is eager to explore factors influencing sales before the big reveal. She begins by analyzing the relationship between ice cream sales and town temperature. The following bivariate data come from her first investigation.

Table: Temperature and Number of Ice Cream Sales
a Construct a scatter plot for the given bivariate data set.
b What type of association does the data have? Justify the answer.
c Identify clusters, gaps, and outliers, if any.

b Positive linear association, see solution.

c Cluster: None
Gap: Falls between … and … sales.
Outlier: None

### Hint

a Let the $x$-axis show the town temperature and the $y$-axis show the number of ice creams sold.
b How does $y$ change as $x$ increases?
c Recall the definitions of cluster, gap, and outlier.

### Solution

a Recall that a scatter plot represents each observation of a bivariate data set as an ordered pair on a coordinate plane. In this case, let $x$ represent the town temperature and $y$ represent the number of ice creams sold. Begin by expressing the bivariate data as ordered pairs.
Table: Temperature, Number of Ice Cream Sales, and Ordered Pairs

Now plot the points on a coordinate plane to construct the scatter plot.

b Recall the types of association between two variables.

From the scatter plot, it can be seen that as the temperature increases, the number of ice creams sold also increases at a roughly constant rate.

This means that the bivariate data has a positive linear association.
c Before identifying any clusters, gaps, or outliers, recall what these terms mean.
• Cluster: A group of data points that are close together on a graph.
• Gap: An empty space or interval between groups of data points on a graph.
• Outlier: A data point that lies noticeably far from the other data points on a graph.

With this information in mind, take a look at the scatter plot.

Draw some conclusions about the clusters, gaps, and outliers observed in the scatter plot.

• There are no clusters.
• There is a gap between … and … sales.
• There are no outliers in the data.
These conclusions are based on observations of the scatter plot, and interpretations may vary among observers. This solution is therefore one illustrative example.
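Judging gaps is visual, but as a rough numeric check (an assumption of this sketch, not a rule from the lesson), one can sort the values of one variable and find the widest spacing between neighboring values. The function `largest_gap` and the sales counts below are hypothetical.

```python
def largest_gap(values):
    """Return (low, high) for the widest interval between consecutive sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    # Pair each value with the next one and keep the pair with the widest spacing
    pairs = zip(ordered, ordered[1:])
    return max(pairs, key=lambda pair: pair[1] - pair[0])


# Hypothetical sales counts with one visible gap (not the lesson's data)
sales = [12, 15, 18, 20, 41, 44, 47]
print(largest_gap(sales))  # → (20, 41)
```

A wide spacing only flags a candidate gap; whether it is meaningful still depends on the scale of the graph and the judgment of the observer.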
Discussion

## Line of Fit

When data sets have a positive or negative correlation, the trend of the data can be modeled using a line of fit, also called a trend line. This line is drawn on a scatter plot close to most of the data points, with the points roughly evenly distributed above and below it.

The scatter plot above shows the mean weights of kittens from the same litter in relation to their age. In this case, a line of fit can be drawn easily. When drawing a line of fit, consider the following characteristics.

• The data needs to have either a positive or negative correlation.
• A line of fit is not unique and does not need to pass through every data point. Ideally, about half of the points lie above the line and about half below it.
• An equation of the line can be found using two of its points. These points do not necessarily belong to the bivariate data set.
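The "about half above, about half below" guideline can be checked by counting. In this sketch, the kitten data and the candidate line $y = 95x + 60$ are hypothetical, and `count_above_below` is an illustrative helper name.

```python
def count_above_below(points, slope, intercept):
    """Count points above, below, and exactly on the line y = slope * x + intercept."""
    above = below = on_line = 0
    for x, y in points:
        predicted = slope * x + intercept  # y-value of the line at this x
        if y > predicted:
            above += 1
        elif y < predicted:
            below += 1
        else:
            on_line += 1
    return above, below, on_line


# Hypothetical (age in weeks, mean weight in grams) points and a candidate line
points = [(1, 150), (2, 260), (3, 330), (4, 450), (5, 520), (6, 640)]
print(count_above_below(points, 95, 60))  # → (3, 3, 0)
```

An even split alone does not make a line a good fit; the line should also stay close to most of the points.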
Ultimately, a line of fit can be used to make predictions and generalize the trends of data sets. Additionally, when a line of fit is determined using strict mathematical methods, it is commonly referred to as a line of best fit.
Pop Quiz

## Determine the Trend Line

Given a scatter plot and a line, determine whether the line is a trend line.

Example

## Impact of Time of the Day on Ice Cream Sales

Zosia believes it might be best to launch the new flavor when the town's temperature is on the rise. Her next investigation focuses on the connection between ice cream sales and the time of day, so she records the number of ice creams sold at different times of the day.

Table: Time and Number of Ice Creams Sold
a Construct a scatter plot of the bivariate data.
b Draw a line of fit for the points.
c Write the equation of the line of fit in slope-intercept form.
d Use the line of fit to make predictions.

c
d There is a positive association between the time of day and the number of ice creams sold.

### Hint

a Let the $x$-axis show the time of day and the $y$-axis show the number of ice creams sold.
b Points should be evenly distributed above and below the line of fit.
c Use two points on the line.
d How do the $y$-values change as the $x$-values increase?

### Solution

a To draw the scatter plot, let $x$ be the time of day and $y$ be the number of ice creams sold. With this in mind, the information from the table can be shown on a scatter plot.

Notice that by changing what the $x$- and $y$-axes represent, a different scatter plot can be created.

b To draw the line of fit, the data points should be roughly evenly distributed above and below it, and the line should be close to most of the points. With this information, draw an example line of fit.

Note that different observers may draw different lines of fit, as this depends on their observations of the data points.

c Recall the slope-intercept form of an equation of a line, where $m$ is the slope and $b$ is the $y$-intercept.
$$y = mx + b$$
Select two points that are on the line of fit, $(x_1, y_1)$ and $(x_2, y_2)$. Now, calculate the slope by dividing the difference in the $y$-coordinates by the difference in the $x$-coordinates.
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
Substitute the value of the slope into the slope-intercept form. Then substitute the coordinates of one of the selected points for $x$ and $y$ and solve for $b$.
$$b = y_1 - m x_1$$
Finally, substitute the values of $m$ and $b$ into $y = mx + b$ to write the equation of the line of fit.
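The two-point procedure above can be sketched in Python. The helper `line_through` and the points $(10, 8)$ and $(16, 32)$ are hypothetical, standing in for two points read off a drawn line of fit.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points with different x-values."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the points must have different x-coordinates")
    # Slope: difference in y-coordinates divided by difference in x-coordinates
    m = (y2 - y1) / (x2 - x1)
    # Substitute one point into y = m*x + b and solve for b
    b = y1 - m * x1
    return m, b


# Hypothetical points read off a line of fit (hour of day, ice creams sold)
m, b = line_through((10, 8), (16, 32))
print(m, b)  # → 4.0 -32.0
```

Here the sketch gives the line $y = 4x - 32$. Any two points on the drawn line give the same equation, so points with easy-to-read coordinates are the practical choice.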
d Notice that according to the line of fit, as the time of day increases, the number of ice cream sales also increases at a roughly constant rate.

There is a positive linear association between the time of day and the number of ice creams sold. Additionally, at the time of day where the line of fit crosses the $x$-axis, the model predicts no ice cream sales.
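Once a line of fit is written in slope-intercept form, predictions come from substituting an $x$-value, and the time with no predicted sales is the $x$-intercept, where $y = 0$. The line $y = 4x - 32$ and the helper names `predict` and `zero_crossing` in this sketch are hypothetical.

```python
def predict(slope, intercept, x):
    """Predicted y-value on the line of fit y = slope * x + intercept."""
    return slope * x + intercept


def zero_crossing(slope, intercept):
    """x-value where the line predicts y = 0 (the x-intercept); slope must be nonzero."""
    if slope == 0:
        raise ValueError("a horizontal line has no single x-intercept")
    return -intercept / slope


# Hypothetical line of fit: y = 4x - 32 (hour of day -> ice creams sold)
slope, intercept = 4, -32
print(predict(slope, intercept, 14))    # predicted sales at 14:00 → 24
print(zero_crossing(slope, intercept))  # hour at which predicted sales reach 0 → 8.0
```

Predictions far outside the range of the observed data (extrapolation) are less reliable than predictions inside it.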

Example

## Impact of Buyer Age on Ice Cream Sales

Next, Zosia analyzes the relationship between ice cream sales and the ages of buyers over one week. The following bivariate data come from this investigation.

Table: Ages of Buyers and Number of Ice Cream Sales
a Construct a scatter plot for the given bivariate data set.
b What type of association does the data have? Justify the answer.

a
b No association, see solution.

### Hint

a Let the $x$-axis show the age of buyers and the $y$-axis show the number of ice creams sold.
b How does $y$ change as $x$ increases?

### Solution

a Let $x$ represent the age of buyers and $y$ represent the number of ice creams sold. First, express the bivariate data as ordered pairs.
Table: Ages of Buyers, Number of Ice Cream Sales, and Ordered Pairs

Now plot the points on a coordinate plane to construct the scatter plot.

b Recall the types of association between two variables: positive linear, negative linear, non-linear, and no association.
After examining the previously drawn scatter plot, it is observed that as the age of the buyer increases, the number of ice creams sold sometimes increases and sometimes decreases.
This indicates that there is no association between the age of the buyer and the number of ice creams sold.
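The reasoning "sometimes increases and sometimes decreases" can be made concrete by sorting the points by $x$ and counting how often $y$ rises or falls from one point to the next. This informal check, the helper `direction_counts`, and the data are all assumptions of this sketch, not a method from the lesson.

```python
def direction_counts(points):
    """Sort points by x and count how often y rises, falls, or stays the same."""
    ordered = sorted(points)
    rises = falls = flat = 0
    # Compare each point's y-value with the next point's y-value
    for (_, y_prev), (_, y_next) in zip(ordered, ordered[1:]):
        if y_next > y_prev:
            rises += 1
        elif y_next < y_prev:
            falls += 1
        else:
            flat += 1
    return rises, falls, flat


# Hypothetical (buyer age, ice creams sold) pairs, not the lesson's data
pairs = [(8, 14), (12, 6), (16, 15), (20, 9), (25, 13), (30, 5), (40, 11)]
print(direction_counts(pairs))  # → (3, 3, 0)
```

A roughly even split between rises and falls is consistent with no association, while a strong imbalance suggests a positive or negative trend. The count cannot distinguish no association from a non-linear pattern such as a U shape, so the scatter plot should still be inspected.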
Closure

## Line of Best Fit

Zosia ultimately concluded that the new ice cream flavor could be introduced during warmer weather and daylight hours, based on her investigation. However, it is crucial to recognize that this decision is based on observations. The ice cream seller may encounter different outcomes when introducing the new flavor due to external factors.

External credits: macrovector_official

The lines of fit drawn in this lesson help interpret the scatter plots by showing how close the data points are to the line. Because they are drawn by observation, different lines of fit could be drawn for each example. However, a data set has only one line of best fit.

Concept

## Line of Best Fit

A line of best fit, also known as a regression line, is a line of fit that estimates the relationship between the two variables of a data set. Its equation is determined using a strict mathematical method.

One commonly used method to determine a line of best fit is the method of least squares. These methods are usually tedious to carry out by hand, so a line of best fit is typically found by performing a linear regression on a graphing calculator.
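For the method of least squares, the slope and intercept have closed-form formulas: $m = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of the two variables. The Python sketch below applies them to a small hypothetical data set; `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Return (m, b) minimizing the sum of squared vertical distances to y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator and denominator of the least squares slope formula
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, b


# Hypothetical data set
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares_line(xs, ys)
print(m, b)
```

For these values the line is approximately $y = 0.6x + 2.2$. In practice, a library routine such as NumPy's `numpy.polyfit(xs, ys, 1)` returns the same slope and intercept, much as a graphing calculator's linear regression does.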