| 8 Theory slides |
| 10 Exercises - Grade E - A |
| Each lesson is meant to take 1-2 classroom sessions |
A line of best fit, also known as a regression line, is a line that models the relationship between the two variables in a data set. Its equation is determined by a precise mathematical method.
One commonly used method to determine a line of best fit is the method of least squares. These methods are usually tedious to carry out by hand, so a line of best fit is often found by performing a linear regression on a graphing calculator. As an example, consider the following data set.
x | y |
---|---|
0.6 | 1.5 |
1.2 | 3.6 |
2.6 | 5.2 |
3.6 | 6.3 |
4.5 | 8.7 |
6 | 10.3 |
6.6 | 11.8 |
7.1 | 11.7 |
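The least-squares line is the line that minimizes the sum of the squared vertical distances between the data points and the line. Its slope is a = S_xy / S_xx and its y-intercept is b = ȳ - a·x̄, where S_xy = Σ(x - x̄)(y - ȳ) and S_xx = Σ(x - x̄)². The Python sketch below applies these formulas to the data set above to show roughly what a calculator's linear regression computes; the function name `least_squares` is our own and not part of any calculator or library.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # S_xy: sum of products of the deviations from the means
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # S_xx: sum of squared x-deviations (zero only if all x-values are equal)
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx          # slope
    b = y_mean - a * x_mean  # intercept: the line passes through (x_mean, y_mean)
    return a, b


xs = [0.6, 1.2, 2.6, 3.6, 4.5, 6, 6.6, 7.1]
ys = [1.5, 3.6, 5.2, 6.3, 8.7, 10.3, 11.8, 11.7]
a, b = least_squares(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")  # prints: y = 1.55x + 1.14
```

The positive slope matches the upward trend in the table.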
For a school project, Ramsha wants to investigate whether there is a correlation between the width of a tree and its height. To do so, she measured the diameter at chest height and the height of some trees in a local park. Her findings are shown in the following table.
Diameter at chest height (cm) | Height (m) |
---|---|
8 | 7 |
10 | 10 |
15 | 14 |
18 | 15 |
20 | 18 |
22 | 21 |
25 | 15 |
30 | 20 |
To enter the data, press STAT and select the first option, Edit. Then type the data values into lists L1 and L2.
Next, press STAT again and use the right-arrow key to select the CALC menu, where the option LinReg(ax+b) can be found. This option gives the line of best fit, expressed as a linear function in slope-intercept form.
To graph the scatter plot, press 2nd and then Y= (STAT PLOT). Choose one of the plots in the list, select ON, choose the scatter plot type, and assign L1 and L2 as Xlist and Ylist, respectively.
Press GRAPH to draw the plot. The default window may not be large enough to show all of the data.
To fix this, press ZOOM and select the option ZoomStat.
The window then resizes so that all of the data points are visible.
Use the linear regression feature of a graphing calculator to find the equation of the line of best fit for the given data set. Compare the obtained equation with the given answer options, and choose the closest one.
The following table displays some values of atmospheric pressures at different altitudes.
Altitude (thousand feet) | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Pressure (PSI) | 14.71 | 14.18 | 13.75 | 13.21 | 12.69 | 12.20 |
To enter the data, press STAT and select the first option, Edit. Then type the data values into lists L1 and L2.
Next, press STAT again and use the right-arrow key to select the CALC menu, where the option LinReg(ax+b) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.
To graph the scatter plot, press 2nd and then Y= (STAT PLOT). Choose one of the plots in the list, select ON, choose the scatter plot type, and assign L1 and L2 as Xlist and Ylist, respectively.
Press GRAPH to draw the plot. The default window may not be large enough to show all of the data.
To fix this, press ZOOM and select the option ZoomStat.
The window then resizes so that all of the data points are visible.
To find the value of y when x = 6, press CALC (2nd and TRACE). Then press ENTER to choose the value option, type 6 for x, and press ENTER again.
The value of the pressure at 6000 feet is about 11.7 PSI. Since all the data values are close to the line of best fit and the data is strongly correlated, it can be said that this is a good approximation of the actual value.
Davontay has a math assignment that consists of eight different exercises. He recorded the time (in minutes) it took him to complete each of the first seven exercises.
Exercise | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
Time (minutes) | 4 | 15 | 7 | 16 | 8 | 15 | 5 |
To enter the data, press STAT and select the first option, Edit. Then type the data values into lists L1 and L2.
Next, press STAT again and use the right-arrow key to select the CALC menu, where the option LinReg(ax+b) can be found. This option gives a line of best fit, expressed as a linear function in slope-intercept form.
To graph the scatter plot, press 2nd and then Y= (STAT PLOT). Choose one of the plots in the list, select ON, choose the scatter plot type, and assign L1 and L2 as Xlist and Ylist, respectively.
Press GRAPH to draw the plot. The default window may not be large enough to show all of the data.
To fix this, press ZOOM and select the option ZoomStat.
The window then resizes so that all of the data points are visible.
Looking at the graph, it can be seen that the line of best fit is not close to any of the provided data points.
From Part C, we can see that the line does not represent the given data points well. This means that the line of best fit does not describe how long Davontay actually takes to complete the exercises.
Then, to find the value of y when x = 8, press CALC (2nd and TRACE). Then press ENTER to choose the value option, type 8 for x, and press ENTER again.
The value of y when x = 8 is about 10.57. According to the line of best fit, Davontay would finish the eighth exercise in about 10.6 minutes. However, since none of the given data values are close to the line of best fit and the data show almost no correlation, this is not a good approximation of the actual time.
In this lesson it was shown how to find the line of best fit for data sets and how to make predictions using these lines. Considering the examples discussed throughout the lesson, two conclusions can be drawn.
Consider the following graph of a data set and line of best fit.
To solve this exercise, we will begin by analyzing the given graph. After identifying its characteristics, we can decide which of the given answer options fits best. Remember that the sign of the correlation coefficient gives the direction of the relationship, and the closer r is to -1 or 1, the closer the data points are to the line of best fit.
Looking at the graph, we can see that the line of best fit has a positive slope. This indicates that the correlation coefficient is positive, which rules out r = -0.92 and r = 0 and limits our options to r = 0.64 and r = 0.95.
Since the points are really close to the line of best fit, we can safely say that 0.95 is the better option of the remaining two choices for r.
For clarity, we will look at graphs that fit the other options. First, let's look at a graph with a correlation coefficient of r = 0.64.
Looking at the graph, we can notice that even though the slope of the line is very similar, the data points are considerably farther from the line of best fit than in the original graph. Now, let's take a look at a graph with a correlation coefficient of 0.
Such graphs have a line of best fit with a slope of 0, and the points do not necessarily follow a pattern. Finally, let's look at a graph with a correlation coefficient of r = -0.92.
Looking carefully, we can notice that the data points are close to the line, but some points are a little farther from it than in the graph with r = 0.95. More importantly, the line of best fit has a negative slope, unlike the line of best fit for the given data set.
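The correlation coefficient used here is Pearson's r: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The Python sketch below computes it from that definition on small made-up data sets (hypothetical, chosen only to show the extremes). It illustrates that the sign of r follows the direction of the trend, and that r = 0 means there is no linear trend, even when the points form some other shape.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data sets
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # rising line: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # falling line: -1.0
print(pearson_r([-1, 0, 1], [1, 0, 1]))       # V shape, no linear trend: 0.0
```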
For each of the following data sets, its line of best fit is shown. Which data set has a correlation coefficient of 0?
We are asked which of the data sets has a correlation coefficient of r = 0. We know that a correlation coefficient r lies between -1 and 1, and its sign tells us the type of correlation.
Value of r | Correlation |
---|---|
r < 0 | Negative correlation |
r > 0 | Positive correlation |
r = 0 | No correlation |
The correlation also gives information about the slope of the regression line.
Correlation | Slope of the Regression Line |
---|---|
Negative correlation | Negative |
Positive correlation | Positive |
No correlation | Zero (horizontal line) |
Looking at the table, we can see that we need to look for a data set with no correlation. Let's examine the graphs!
Looking at the graphs, we can see that graph D is the only one that displays no correlation, since its line of best fit is horizontal.
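The link between r and the slope of the regression line can be made precise. For the least-squares line, the slope a equals r multiplied by the ratio of the sample standard deviations of y and x. Since these standard deviations are positive whenever the x-values and the y-values are not all equal, the slope always has the same sign as r, and r = 0 gives the horizontal line y = ȳ:

```latex
a = r\,\frac{s_y}{s_x}, \qquad s_x > 0,\; s_y > 0
\quad\Longrightarrow\quad
\operatorname{sign}(a) = \operatorname{sign}(r),
\qquad
r = 0 \;\Longrightarrow\; a = 0,\; y = \bar{y}
```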
Juno ran a survey at school to investigate whether the habit of streaming shows is related to student Grade Point Average, also known as GPA. The table shows the number of hours x per week that some students spend streaming online shows and their GPA y.
Hours x | GPA y |
---|---|
11 | 3.1 |
4 | 3.5 |
3 | 3.6 |
12 | 2.8 |
20 | 2.2 |
14 | 2.9 |
8 | 3.1 |
4 | 3.7 |
15 | 2.3 |
We have been given a table with data for hours x spent streaming and student GPA y.
In order to find a line of best fit using our calculator, we first need to enter the values. Let's press the STAT button.
Next, we will choose the first option in the menu, Edit, and we can fill in the values in lists L1 and L2.
We can perform a regression analysis on the data by pressing the STAT button again, followed by using the right-arrow key to select the CALC menu.
This menu lists the various regressions that are available. If we choose the fourth option in the menu, LinReg(ax+b), and press ENTER, the calculator performs a linear regression using the data that was entered.
We can round the values of a and b and substitute them into the equation y = ax + b. The calculator gives a ≈ -0.088 and b ≈ 3.911, so rounding to two decimal places gives the equation for the line of best fit, y = -0.09x + 3.91. The correlation coefficient is r ≈ -0.955. Since r is very close to -1, we know that there is a strong, negative correlation. Therefore, the true statement is that the equation for the line of best fit is approximately y = -0.09x + 3.91.
In Part A we found that the equation of the line of best fit is y = -0.09x + 3.91. Its slope is m = -0.09 and its y-intercept is b = 3.91. The slope tells us that each additional hour of streaming per week is associated with a GPA about 0.09 points lower. The y-intercept of 3.91 tells us that students who stream for 0 hours each week are predicted to have a GPA of about 3.9.
To predict the GPA of a student who streams 15 hours a week, we can substitute x = 15 into our line of best fit: y = -0.09(15) + 3.91 = 2.56.
According to the line of best fit, the GPA of the student is expected to be about 2.6.
Let's recall what it means to have causation.
There is causation between two variables when a change in one variable causes a change in the other.
A student's GPA depends on many factors, such as how much time is spent studying. Streaming fewer shows could raise a GPA only if the freed-up time were spent studying. However, the time saved by watching fewer shows does not always lead to more studying. Therefore, the data do not show a direct causal relationship between streaming and GPA.