This lesson shows how to find the linear function that best models a scatter plot or data set, and how to use such linear functions to make predictions.

Given a scatter plot, a line of fit can be used to predict values that are not known. Since many lines of fit are possible, an important goal is to find the one that most accurately represents the given data points. This is the line of fit for which a chosen sum of the residuals is as close to $0$ as possible.

Examine how the different sums of the residuals change when the line is moved. The most commonly used measure is the sum of the squared residuals.
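
The sums of residuals described above can also be computed directly. The following is a minimal Python sketch, assuming the data from the first table later in this lesson and the line $y=1.55x+1.14$ quoted for it. The second line, $y=x+3,$ is a hypothetical choice made only to show a worse fit, and the function name `residual_sums` is likewise an assumption.

```python
# Data from the first table later in this lesson.
xs = [0.6, 1.2, 2.6, 3.6, 4.5, 6, 6.6, 7.1]
ys = [1.5, 3.6, 5.2, 6.3, 8.7, 10.3, 11.8, 11.7]


def residual_sums(slope, intercept):
    """Return the plain, absolute, and squared sums of the residuals."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    plain = sum(residuals)
    absolute = sum(abs(r) for r in residuals)
    squared = sum(r * r for r in residuals)
    return plain, absolute, squared


close_line = residual_sums(1.55, 1.14)  # line quoted in the lesson for this data
far_line = residual_sums(1.0, 3.0)      # hypothetical line, chosen to fit worse

print("y = 1.55x + 1.14 ->", [round(v, 2) for v in close_line])
print("y = 1.00x + 3.00 ->", [round(v, 2) for v in far_line])
```

In the plain sum, positive and negative residuals can cancel each other, which is one reason the squared version is usually preferred.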

A line of best fit, also known as a regression line, is a line of fit whose equation has been determined using a strict mathematical method that estimates the relationship between the values of a data set.

One commonly used method to determine a line of best fit is the *method of least squares*. Because the methods used to find the line of best fit are usually tedious to carry out by hand, a line of best fit is typically found by performing a linear regression on a graphing calculator. As an example, consider the data set graphed above.

$x$ | $0.6$ | $1.2$ | $2.6$ | $3.6$ | $4.5$ | $6$ | $6.6$ | $7.1$ |
---|---|---|---|---|---|---|---|---|
$y$ | $1.5$ | $3.6$ | $5.2$ | $6.3$ | $8.7$ | $10.3$ | $11.8$ | $11.7$ |
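
Although a calculator is the usual tool, the method of least squares has a closed form for a single line: the slope is the sum of the products of the deviations of $x$ and $y$ from their means divided by the sum of the squared deviations of $x,$ and the intercept follows from the two means. The sketch below applies these standard formulas to the table above. The variable names are assumptions made for illustration.

```python
# Data from the table above.
xs = [0.6, 1.2, 2.6, 3.6, 4.5, 6, 6.6, 7.1]
ys = [1.5, 3.6, 5.2, 6.3, 8.7, 10.3, 11.8, 11.7]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sum of products of deviations, and sum of squared deviations of x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

slope = s_xy / s_xx                    # least-squares slope
intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)

print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Rounded to two decimal places, the result should agree with the line quoted for this data set in the lesson.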

In the graph, the data points lie close to the line $y=1.55x+1.14.$ Consequently, even though the data points do not all lie on one particular line, a linear model describes the data well. In contrast, consider the following data set.

$x$ | $0.6$ | $1.2$ | $2.6$ | $3.6$ | $4.5$ | $6$ | $6.6$ | $7.1$ |
---|---|---|---|---|---|---|---|---|
$y$ | $1.5$ | $8.1$ | $9.5$ | $12$ | $7.1$ | $2.5$ | $11.6$ | $1.5$ |
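
The difference between the two tables can also be checked numerically with the correlation coefficient $r,$ the same value that the calculator steps later in this lesson report. The following sketch computes $r$ from its standard definition for both data sets. The helper name `pearson_r` is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [0.6, 1.2, 2.6, 3.6, 4.5, 6, 6.6, 7.1]
ys_first = [1.5, 3.6, 5.2, 6.3, 8.7, 10.3, 11.8, 11.7]   # first table
ys_second = [1.5, 8.1, 9.5, 12, 7.1, 2.5, 11.6, 1.5]     # second table

r_first = pearson_r(xs, ys_first)
r_second = pearson_r(xs, ys_second)

print(f"first table:  r = {r_first:.2f}")
print(f"second table: r = {r_second:.2f}")
```

A value of $r$ near $1$ or $-1$ supports a linear model, while a value near $0$ does not.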

Look at the data points graphed onto a coordinate plane.

Looking at the graph, it can be seen that the points are not close to any line. Therefore, the data set is not well described by a linear model. Any line of fit used to estimate a relationship between the values will not be useful.

Most graphing calculators have a function called linear regression, which can be used to find a precise line of fit using strict rules. This line of fit is then called the line of best fit. For example, consider the following data set.

$x$ | $y$ |
---|---|
$1$ | $33.12$ |
$2$ | $24.4$ |
$3$ | $16.6$ |
$4$ | $9.3$ |
$5$ | $3.9$ |

The line of best fit can be calculated following these $3$ steps.

1

Enter Values

On a graphing calculator, begin by entering the data points. To do so, press the $STAT $ button and select the option Edit.

This gives a number of columns, labeled $L_{1},$ $L_{2},$ $L_{3},$ and so on.

Use the arrow keys to move to the position in the lists where a value should be entered. Enter the $x-$values of the data points in $L_{1}$ and press $ENTER $ after each value. The same can be done for the corresponding $y-$values in column $L_{2}.$

2

Fitting the Line

After entering the values, press the $STAT $ button and select the menu item Calc.

The option LinReg(ax+b) gives the line of best fit, expressed as a linear function in slope-intercept form. Press the $ENTER $ button until the parameters are given.

In this case, the line of best fit is described by the function $y=-7.354x+39.526.$ The correlation coefficient $r$ is less than $-0.99,$ indicating a strong negative correlation. If $r$ does not appear, press $2ND $ and $0$ to get to the CATALOG, then select DiagnosticOn and enable it by pressing $ENTER $ twice. Then find the line of best fit once more.

3

Graphing the Line of Best Fit

A graphing calculator can also be used to graph the line of best fit. After selecting the option LinReg(ax+b), choose the option Store RegEQ.

Press $VARS $ and move to the **Y-VARS** menu. Then, select the option FUNCTION and press $ENTER .$

Press the $ENTER $ button until the parameters are given. To graph the scatter plot, first press $2ND $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as Xlist and Ylist, respectively.

Then, the plot can be made by pressing the button $GRAPH .$ It is possible that after drawing the plot the window-size is not large enough to see all of the information.

To fix this, press $ZOOM $ and select the option ZoomStat.

After doing that, the window will resize to show the important information.

For a school project, Ramsha wants to investigate if there is a correlation between the width of a tree and its height. To do so, she measured the diameter at chest height and the height of some trees in a local park. Her findings are shown in the following table.

Diameter at chest height (cm) | Height (m) |
---|---|
$8$ | $7$ |
$10$ | $10$ |
$15$ | $14$ |
$18$ | $15$ |
$20$ | $18$ |
$22$ | $21$ |
$25$ | $15$ |
$30$ | $20$ |

a What is the equation of the line of best fit using linear regression? Round the values in the equation to two decimal places.

b What is the correlation coefficient? Round the value to two decimal places. Are the data correlated?

c Graph the data points and the line of best fit.

d Write an interpretation of the $y-$intercept and the slope.

a $y=0.55x+4.74$

b **Correlation Coefficient:** $r≈0.86$

**Are the Data Correlated?** Yes, see solution.

c **Graph:**

d See solution.

a Use the linear regression feature on a graphing calculator.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and then selecting the option Edit.

Then the data values are written in the columns.

By pressing the $STAT $ button and then selecting the **CALC** menu, the option LinReg($ax+b$) can be found. This option gives the line of best fit, expressed as a linear function in slope-intercept form.

$y=0.55x+4.74 $

b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r≈0.86 $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Since the correlation coefficient is close to $1,$ the data have a strong positive correlation.

c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

Then, to graph the scatter plot, press $2ND $ and $Y= .$ Choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as Xlist and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ After drawing the plot, the window size may not be adequate for seeing all of the information.

To fix this, press $ZOOM $ and select the option ZoomStat.

After doing so, the window will resize to show the important information.

d In Part A the equation for the line of best fit was found.

$y=0.55x+4.74 $

In this equation, the slope is $0.55$ and the $y-$intercept is $4.74.$

- The slope indicates that for every centimeter that the tree grows in diameter, about $0.55$ meters are gained in height.
- The $y-$intercept indicates that the minimum height for which this equation is valid is about $4.74$ meters. This can be because the diameter at chest height is not a good measure for smaller trees.

Use the linear regression feature of a graphing calculator to find the equation of the line of best fit for the given data set. Compare the obtained equation with the equations shown in the applet, and choose the closest one.

The following table displays some values of atmospheric pressures at different altitudes.

Altitude (thousand feet) | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Pressure (PSI) | $14.71$ | $14.18$ | $13.75$ | $13.21$ | $12.69$ | $12.20$ |

a Use linear regression to determine the equation of the line of best fit. Round the values in the equation to two decimal places.

b What is the correlation coefficient? Round the answer to two decimal places. Is the data correlated?

c Draw a graph of the data points and the line of best fit in the same viewing window.

d Interpret the slope and the $y-$intercept of the equation of the line of best fit.

e Make a prediction for the pressure at $6000$ feet. Is this a good prediction?

a $y=-0.50x+14.71$

b **Correlation Coefficient:** $r≈-1$

**Is the Data Correlated?** Yes, because the correlation coefficient is really close to $-1.$

c **Graph**

d See solution.

e **Prediction:** About $11.7$ PSI

**Is This a Good Prediction?** Yes, see solution.

a Use the linear regression feature on a graphing calculator.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

e Are the data values close to the line of best fit? Does this indicate something?

a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and selecting the option Edit.

Then the data values can be written in the columns.

Finally, by pressing the $STAT $ button and then selecting the menu item **CALC**, the option LinReg($ax+b$) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.

$y=-0.50x+14.71 $

b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r=-0.999673… $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Since the value is almost $-1,$ the data have a very strong negative correlation.

c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

To graph the scatter plot, first press $2ND $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as Xlist and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ After drawing the plot, the window size may not be large enough to see all of the information.

To fix this, press $ZOOM $ and select the option ZoomStat.

After doing that, the window will resize to show the important information.

d In Part A the equation for the line of best fit was found.

$y=-0.50x+14.71 $

In this equation, the slope is $-0.50$ and the $y-$intercept is $14.71.$

- The slope indicates that for every one thousand feet of altitude gained, the pressure decreases by about $0.50$ PSI.
- The $y-$intercept indicates that the pressure at sea level is about $14.71$ PSI.

e A graphing calculator can also be used to make predictions. To do so, first the window size should be changed to fit the prediction. Since the $x-$value is given in thousands of feet, the value that should be included for $6000$ feet is $x=6.$ To change the window size, press $WINDOW .$

To find the value of $y$ when $x=6,$ press $CALC $ ($2ND $ and $TRACE ).$ Then press $ENTER $ to insert the value of $6$ for $x.$ Finally, press $ENTER $ again.

The value of the pressure at $6000$ feet is about $11.7$ PSI. Since all the data values are close to the line of best fit and the data is strongly correlated, it can be said that this is a good approximation of the actual value.

$y=-0.50x+14.71$

Substitute $x=6$ and solve for $y$

$y=-0.50(6)+14.71=11.71$

Davontay has a math assignment that consists of eight different exercises. He recorded the time (in minutes) it took him to complete each of the first seven exercises.

Exercise | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ |
---|---|---|---|---|---|---|---|
Time (minutes) | $4$ | $15$ | $7$ | $16$ | $8$ | $15$ | $5$ |

a What is the equation for the line of best fit using linear regression? Round the values to two decimal places.

b What is the correlation coefficient? Round the answer to two decimal places. Are the data correlated?

c Draw a graph of the data points and the line of best fit in the same viewing window.

d Interpret the slope and the $y-$intercept of the equation of the line of best fit.

e Find a prediction for the time it will take Davontay to complete the eighth exercise. Is it a good prediction?

a $y=0.14x+9.43$

b **Correlation Coefficient:** $r≈0.06$

**Are the Data Correlated?** No, see solution.

c **Graph:**

d See solution.

e **Prediction:** About $10.57$ minutes

**Is It a Good Prediction?** No, see solution.

a Use the linear regression feature on a graphing calculator or computer.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

e Are the data values close to the line of best fit? Does this indicate something?

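a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and selecting the option Edit.

The data values can be written in the columns.

Finally, by pressing the $STAT $ button and then selecting the menu item **CALC**, the option LinReg($ax+b$) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.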

$y=0.14x+9.43 $

It should be noted that in this equation $x$ is the number of the exercise and $y$ is the time in minutes in which Davontay completed that exercise.

b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r=0.059761… $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Here, however, the value of $r$ is close to $0,$ which means that the data have essentially no linear correlation.

c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

To graph the scatter plot, first press $2ND $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as Xlist and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ After drawing the plot, the window size may not be large enough to see all of the information.

To fix this, press $ZOOM $ and select the option ZoomStat.

After doing that, the window will resize to show the important information.

Looking at the graph, it can be seen that the line of best fit is not close to any of the provided data points.

d In Part A the equation for the line of best fit was found.

$y=0.14x+9.43 $

In this equation, the slope is $0.14$ and the $y-$intercept is $9.43.$

- The slope indicates that after every exercise, the next one takes about an additional $0.14$ minutes to complete.
- The $y-$intercept indicates that the minimum time to complete an exercise is about $9.43$ minutes.

From Part C it can be noted that the line is not representative of the given data points. This means that these measures do not reflect the reality of the exercises.

e A graphing calculator can also be used to make predictions. To do so, first the window size should be changed to fit the prediction value of $x=8.$ To change the window size, press $WINDOW .$

Then, to find the value of $y$ when $x=8,$ press $CALC $ ($2ND $ and $TRACE ).$ Then press $ENTER $ to insert the value of $8$ for $x.$ Finally, press $ENTER $ again.

The value of $y$ when $x=8$ is about $10.57.$ According to the model, Davontay will finish the eighth exercise in a little under $11$ minutes. However, since none of the given data values are close to the line of best fit and the data are not correlated, this is not a good approximation of the actual value.

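$y=0.14x+9.43$

Substitute $x=8$ and solve for $y$

$y=0.14(8)+9.43=10.55$

This differs slightly from the calculator's value of about $10.57$ because the coefficients in the equation were rounded to two decimal places.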

In this lesson it was shown how to find the line of best fit for data sets and how to make predictions using these lines. Considering the examples discussed throughout the lesson, two conclusions can be drawn.

- Lines of best fit are good for making predictions *only if* the data have a linear correlation.
- The stronger the correlation is, the more accurate the predictions are expected to be.