This lesson shows how to find the linear function that best models a scatter plot or data set. It also shows how to use such linear functions to make predictions.

Explore

Given a scatter plot, a line of fit can be used to predict values that are not known. Since many lines of fit are possible, an important goal is to find the one that represents the given data points most accurately. One way to do this is to look for the line of fit for which the different sums of the residuals are as close to $0$ as possible.

Examine how the different sums of the residuals change when the line is moved. The most commonly used measure is the sum of the squared residuals.
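
To make the phrase "different sums of the residuals" concrete, the following Python sketch computes three of them for a candidate line: the plain sum, the sum of absolute values, and the sum of squares. The points and the two candidate lines are invented purely for illustration and are not part of the lesson's data; `residual_sums` is a hypothetical helper name.

```python
def residual_sums(points, slope, intercept):
    """Return the plain, absolute, and squared sums of the residuals."""
    # A residual is the observed y minus the y predicted by the line.
    residuals = [y - (slope * x + intercept) for x, y in points]
    plain = sum(residuals)
    absolute = sum(abs(r) for r in residuals)
    squared = sum(r * r for r in residuals)
    return plain, absolute, squared


# Hypothetical points, invented for this illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

# Two hypothetical candidate lines, given as (slope, intercept).
for slope, intercept in [(1.0, 1.0), (0.8, 1.5)]:
    plain, absolute, squared = residual_sums(points, slope, intercept)
    print(f"y = {slope}x + {intercept}: plain = {plain:.2f}, "
          f"absolute = {absolute:.2f}, squared = {squared:.2f}")
```

For these made-up points, both lines have a plain sum of residuals of about $0,$ yet their sums of squares differ ($2.0$ versus $1.8$). This suggests why the plain sum alone cannot single out one line and why the sum of squares is the usual choice.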

Discussion

A line of best fit, also known as a regression line, is a line of fit that estimates the relationship between the values of a data set. The equation of the line of best fit is determined using a strict mathematical method.

One commonly used method to determine a line of best fit is the *method of least squares*. Because the methods used to find the line of best fit are usually hard to carry out by hand, a line of best fit is typically found by performing a linear regression on a graphing calculator. As an example, consider the data set graphed above.

$x$ | $y$ |
---|---|
$0.6$ | $1.5$ |
$1.2$ | $3.6$ |
$2.6$ | $5.2$ |
$3.6$ | $6.3$ |
$4.5$ | $8.7$ |
$6$ | $10.3$ |
$6.6$ | $11.8$ |
$7.1$ | $11.7$ |
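
For a single explanatory variable, the method of least squares has a well-known closed form: the slope is the sum of the products of the deviations from the means divided by the sum of the squared $x$-deviations, and the line passes through the point of means. The sketch below applies these standard formulas to the data set in the table. The helper name `least_squares_line` is hypothetical, and a graphing calculator may compute the result differently internally.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data set from the table above.
xs = [0.6, 1.2, 2.6, 3.6, 4.5, 6, 6.6, 7.1]
ys = [1.5, 3.6, 5.2, 6.3, 8.7, 10.3, 11.8, 11.7]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

With the values from the table, the result is approximately $y=1.55x+1.14.$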

Example

For a school project, Ramsha wants to investigate if there is a correlation between the width of a tree and its height. To do so, she measured the diameter at chest height and the height of some trees in a local park. Her findings are shown in the following table.

Diameter at chest height (cm) | Height (m) |
---|---|
$8$ | $7$ |
$10$ | $10$ |
$15$ | $14$ |
$18$ | $15$ |
$20$ | $18$ |
$22$ | $21$ |
$25$ | $15$ |
$30$ | $20$ |

a What is the equation of the line of best fit using linear regression? Round the values in the equation to two decimal places.

b What is the correlation coefficient? Round the value to two decimal places. Are the data correlated?

c Graph the data points and the line of best fit.

d Write an interpretation of the $y$-intercept and the slope.

a $y=0.55x+4.74$

b **Correlation Coefficient:** $r≈0.86$

**Are the Data Correlated?** Yes, see solution.

c **Graph:**

d See solution.

a Use the linear regression feature on a graphing calculator.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and then selecting the option Edit. Then the data values are written in the columns.

By pressing the $STAT $ button and then selecting the **CALC** menu, the option LinReg($ax+b$) can be found. This option gives the line of best fit, expressed as a linear function in slope-intercept form.

$y=0.55x+4.74 $

b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r≈0.86 $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Since the correlation coefficient is close to $1,$ the data has a strong positive correlation.
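
The value of $r$ reported by the calculator is the Pearson correlation coefficient. A minimal sketch of its standard formula, applied to the tree data, is shown below; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


diameters_cm = [8, 10, 15, 18, 20, 22, 25, 30]
heights_m = [7, 10, 14, 15, 18, 21, 15, 20]

r = pearson_r(diameters_cm, heights_m)
print(f"r = {r:.2f}")  # r = 0.86
```
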
c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

Then, to graph the scatter plot, push the buttons $2nd $ and $Y= .$ Choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as XList and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ It is possible that, after drawing the plot, the window size is not adequate for seeing all the information. To fix this, press $ZOOM $ and select the option ZoomStat. After doing so, the window will resize to show the important information.

d In Part A the equation for the line of best fit was found.

$y=0.55x+4.74 $

In this equation, the slope is $0.55$ and the $y$-intercept is $4.74.$

- The slope indicates that for every centimeter that the tree grows in diameter, about $0.55$ meters are gained in height.
- The $y$-intercept indicates that the minimum height for which this equation is valid is about $4.74$ meters. This may be because the diameter at chest height is not a good measure for smaller trees.

Pop Quiz

Use the linear regression feature of a graphing calculator to find the equation of the line of best fit for the given data set. Compare the obtained equation with the equations shown in the applet, and choose the closest one.

Example

The following table displays some values of atmospheric pressures at different altitudes.

Altitude (thousand feet) | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Pressure (PSI) | $14.71$ | $14.18$ | $13.75$ | $13.21$ | $12.69$ | $12.20$ |

a Use linear regression to determine the equation of the line of best fit. Round the values in the equation to two decimal places.

b What is the correlation coefficient? Round the answer to two decimal places. Is the data correlated?

c Draw a graph of the data points and the line of best fit in the same viewing window.

d Interpret the slope and the $y$-intercept of the equation of the line of best fit.

e Make a prediction for the pressure at $6000$ feet. Is this a good prediction?

a $y=-0.50x+14.71$

b **Correlation Coefficient:** $r≈-1$

**Is the Data Correlated?** Yes, because the correlation coefficient is really close to $-1.$

c **Graph**

d See solution.

e **Prediction:** About $11.7$ PSI

**Is This a Good Prediction?** Yes, see solution.

a Use the linear regression feature on a graphing calculator.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

e Are the data values close to the line of best fit? Does this indicate something?

a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and selecting the option Edit. Then the data values can be written in the columns.

Finally, by pressing the $STAT $ button and then selecting the menu item **CALC**, the option LinReg($ax+b$) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.

$y=-0.50x+14.71 $

b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r=-0.999673… $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Since the value is almost $-1,$ the data have a very strong negative correlation.
c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

To graph the scatter plot, first push the buttons $2nd $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as XList and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ It is possible that, after drawing the plot, the window size is not large enough to see all of the information. To fix this, press $ZOOM $ and select the option ZoomStat. After doing that, the window will resize to show the important information.

d In Part A the equation for the line of best fit was found.

$y=-0.50x+14.71 $

In this equation, the slope is $-0.50$ and the $y$-intercept is $14.71.$

- The slope indicates that for every one thousand feet of altitude gained, the pressure decreases by about $0.50$ PSI.
- The $y$-intercept indicates that the pressure at sea level is about $14.71$ PSI.

e A graphing calculator can also be used to make predictions. To do so, the window size should first be changed to include the prediction. Since the $x$-value is given in thousands of feet, the value that should be included for $6000$ feet is $x=6.$ To change the window size, press $WINDOW .$

To find the value of $y$ when $x=6,$ press $CALC $ ($2ND $ and $TRACE ).$ Then press $ENTER $ to insert the value of $6$ for $x.$ Finally, press $ENTER $ again.

The value of the pressure at $6000$ feet is about $11.7$ PSI. Since all the data values are close to the line of best fit and the data is strongly correlated, it can be said that this is a good approximation of the actual value.

Substituting $x=6$ into the equation of the line of best fit confirms this value.

$y=-0.50(6)+14.71=11.71$
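
The prediction step can also be sketched in Python. The function below evaluates the rounded line from Part A and additionally reports whether the requested altitude lies outside the measured range, since such a prediction relies on the linear trend continuing. The function name `predict_pressure` is hypothetical.

```python
# Altitudes covered by the table, in thousands of feet.
altitudes_kft = [0, 1, 2, 3, 4, 5]

# Rounded coefficients of the line of best fit from Part A.
slope, intercept = -0.50, 14.71


def predict_pressure(altitude_kft):
    """Pressure in PSI predicted by the line of best fit."""
    return slope * altitude_kft + intercept


x_new = 6  # 6000 feet
predicted = predict_pressure(x_new)

# True when x_new lies outside the range of the measured altitudes.
is_extrapolation = not (min(altitudes_kft) <= x_new <= max(altitudes_kft))

print(f"{predicted:.2f} PSI, extrapolation: {is_extrapolation}")
```

Here $x=6$ lies just beyond the last measured altitude, so the estimate is only as good as the assumption that the same trend holds slightly past the data.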

Example

Davontay has a math assignment that consists of eight different exercises. He recorded the time (in minutes) it took him to complete each of the first seven exercises.

Exercise | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ |
---|---|---|---|---|---|---|---|
Time (minutes) | $4$ | $15$ | $7$ | $16$ | $8$ | $15$ | $5$ |

a What is the equation for the line of best fit using linear regression? Round the values to two decimal places.

b What is the correlation coefficient? Round the answer to two decimal places. Are the data correlated?

c Draw a graph of the data points and the line of best fit in the same viewing window.

d Interpret the slope and the $y$-intercept of the equation of the line of best fit.

e Find a prediction for the time it will take Davontay to complete the eighth exercise. Is it a good prediction?

a $y=0.14x+9.43$

b **Correlation Coefficient:** $r≈0.06$

**Are the Data Correlated?** No, see solution.

c **Graph:**

d See solution.

e **Prediction:** About $10.57$ minutes

**Is It a Good Prediction?** No, see solution.

a Use the linear regression feature on a graphing calculator or computer.

b The correlation coefficient is $r$ in the linear regression output on a graphing calculator.

c Use the graphing features on a graphing calculator.

d What do these measures indicate in a line?

e Are the data values close to the line of best fit? Does this indicate something?

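a The line of best fit can be found using a graphing calculator. First, the data values need to be entered into the calculator. This is done by pressing the $STAT $ button and selecting the option Edit. The data values can be written in the columns.

Finally, by pressing the $STAT $ button and then selecting the menu item **CALC**, the option LinReg($ax+b$) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.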

$y=0.14x+9.43 $

It should be noted that in this equation $x$ is the number of the exercise and $y$ is the time in minutes in which Davontay completed that exercise.
b The correlation coefficient can be found on the linear regression results screen from Part A.

The correlation coefficient is the value of $r$ on the screen.

$r=0.059761… $

The value of $r$ varies from $-1$ to $1.$ A value close to $-1$ indicates a negative correlation, while a value close to $1$ indicates a positive correlation. Here, the value of $r$ is close to $0,$ which means that the data have essentially no linear correlation.
c To graph the line of best fit, first press $Y= $ and write the equation of the line of best fit.

To graph the scatter plot, first push the buttons $2nd $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign $L_{1}$ and $L_{2}$ as XList and Ylist, respectively.

The plot can be made by pressing the button $GRAPH .$ It is possible that, after drawing the plot, the window size is not large enough to see all of the information. To fix this, press $ZOOM $ and select the option ZoomStat. After doing that, the window will resize to show the important information.

Looking at the graph, it can be seen that the line of best fit is not close to any of the provided data points.
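
The distance between each data point and the line can be quantified by its residual. The sketch below lists the residuals for Davontay's times using the rounded line from Part A; the variable names are illustrative.

```python
exercises = [1, 2, 3, 4, 5, 6, 7]
times_min = [4, 15, 7, 16, 8, 15, 5]

# Rounded coefficients of the line of best fit from Part A.
slope, intercept = 0.14, 9.43

# Residual = recorded time minus the time predicted by the line.
residuals = [t - (slope * x + intercept) for x, t in zip(exercises, times_min)]

for x, res in zip(exercises, residuals):
    print(f"exercise {x}: residual {res:+.2f} min")

# Distance from the line to the nearest data point.
closest = min(abs(res) for res in residuals)
print(f"smallest distance to the line: {closest:.2f} min")
```

Even the closest point, the fifth exercise, is a little more than $2$ minutes away from the line.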

d In Part A the equation for the line of best fit was found.

$y=0.14x+9.43 $

In this equation, the slope is $0.14$ and the $y$-intercept is $9.43.$

- The slope indicates that each exercise takes about $0.14$ minutes longer to complete than the previous one.
- The $y$-intercept indicates that the minimum time to complete an exercise is about $9.43$ minutes.

From Part C it can be noted that the line is not representative of the given data points. This means that these measures do not reflect the reality of the exercises.

e A graphing calculator can also be used to make predictions. To do so, first the window size should be changed to fit the prediction value of $x=8.$ To change the window size, press $WINDOW .$

Then, to find the value of $y$ when $x=8,$ press $CALC $ ($2ND $ and $TRACE ).$ Then press $ENTER $ to insert the value of $8$ for $x.$ Finally, press $ENTER $ again.

The value of $y$ when $x=8$ is about $10.57.$ This suggests that Davontay would finish the eighth exercise in less than $11$ minutes. However, since none of the given data values are close to the line of best fit and the data are not correlated, this is not a good approximation of the actual value.

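Substituting $x=8$ into the rounded equation gives a slightly different value, because the calculator works with unrounded coefficients.

$y=0.14(8)+9.43=10.55$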

Closure

In this lesson, it was shown how to find the line of best fit for data sets and how to make predictions using these lines. Considering the examples discussed throughout the lesson, it is possible to draw two conclusions.

- The lines of best fit are good to make predictions *only if* the data have a linear correlation.
- The stronger the correlation is, the more accurate the predictions are expected to be.
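
These conclusions can be illustrated by comparing the three examples of the lesson side by side. The sketch below computes, for each data set, the correlation coefficient and the root-mean-square residual of its least-squares line, used here as one possible measure of how far the points typically lie from the line. The helper name `fit_summary` is hypothetical, and the choice of this particular measure is an assumption made for the illustration.

```python
from math import sqrt


def fit_summary(xs, ys):
    """Return (r, rms_residual) for the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / sqrt(s_xx * s_yy)
    # Root-mean-square distance (in y-units) between points and line.
    squared_error = sum((y - (slope * x + intercept)) ** 2
                        for x, y in zip(xs, ys))
    return r, sqrt(squared_error / n)


# The three data sets used in the examples of this lesson.
examples = {
    "tree height (m)": ([8, 10, 15, 18, 20, 22, 25, 30],
                        [7, 10, 14, 15, 18, 21, 15, 20]),
    "pressure (PSI)": ([0, 1, 2, 3, 4, 5],
                       [14.71, 14.18, 13.75, 13.21, 12.69, 12.20]),
    "exercise time (min)": ([1, 2, 3, 4, 5, 6, 7],
                            [4, 15, 7, 16, 8, 15, 5]),
}

summaries = {name: fit_summary(xs, ys) for name, (xs, ys) in examples.items()}
for name, (r, rms) in summaries.items():
    print(f"{name}: r = {r:.2f}, typical residual = {rms:.2f}")
```

The residual sizes are expressed in each data set's own units, so each should be judged against the spread of its own data rather than against the other data sets.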
