| | 8 Theory slides |
| | 8 Exercises - Grade E - A |
| | Each lesson is meant to take 1-2 classroom sessions |
Consider the scatter plot of bivariate data. Move the line in such a way that it models the data.
One way to assess how well a line of fit describes the data is to analyze residuals.
A residual is the vertical distance between a data point and the line of fit. When a line of fit has been drawn on a scatter plot, not all of the data points lie exactly on the line — some of them are above the line and some below. Therefore, each data point has one residual, which can be positive, negative, or zero.
A residual can also be defined as the observed y-value of a data point minus its predicted y-value, found using the line of fit.
$$\text{Residual} = \text{Observed } y\text{-value} - \text{Predicted } y\text{-value}$$
Generally, the smaller the absolute values of the residuals, the more reliable the line of fit is. A scatter plot of the residuals can be used to determine how well a model fits a data set. The independent variable and the residuals are graphed as ordered pairs (x,residual).
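As a rough illustration of the definition above, the following Python sketch computes the residuals for a line of fit and pairs each one with its x-value, as would be done for a residual plot. The data points, the line y = 2x + 1, and the function name `residuals` are made up for illustration and are not part of the lesson.

```python
def residuals(points, slope, intercept):
    """Return observed y minus predicted y for each (x, y) point."""
    return [y - (slope * x + intercept) for x, y in points]

# Hypothetical data set and line of fit y = 2x + 1 (illustration only).
points = [(1, 3), (2, 5), (3, 6)]
res = residuals(points, slope=2, intercept=1)
print(res)  # [0, 0, -1]

# Ordered pairs (x, residual) that would be graphed on a residual plot.
residual_plot_points = [(x, r) for (x, _), r in zip(points, res)]
print(residual_plot_points)  # [(1, 0), (2, 0), (3, -1)]
```

Here the first two points lie on the line, so their residuals are zero, while the third point lies one unit below the line.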
The applet generates bivariate data and the equation of a line of fit for the data. Calculate the sum of the squares of the residuals for the given equation.
The table below shows the finishing times, in seconds, for the Olympic gold medalist in the men's 100-meter dash for the last six Olympic games. Olympic Game 1 represents the 2000 Olympic games and Olympic Game 6 represents the 2020 Olympic games.
| Olympic Game | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Finishing Time (sec) | 9.87 | 9.85 | 9.69 | 9.63 | 9.81 | 9.80 |
Two candidate lines of fit are given for the data.
$$\begin{aligned} \text{Equation I:} \quad & y = -0.1x + 10 \\ \text{Equation II:} \quad & y = -0.05x + 10 \end{aligned}$$
In these equations, x represents the Olympic Game number as described above, and y represents the finishing time in seconds. For each equation, calculate the sum of squared residuals. Which of the equations is a better fit?
Sum of Squared Residuals for Equation I: 0.2605
Sum of Squared Residuals for Equation II: 0.077
Better fit: Equation II
$$\begin{aligned} \text{Equation I:} \quad & y = -0.1x + 10 \\ \text{Equation II:} \quad & y = -0.05x + 10 \end{aligned}$$
In order to determine which line of fit is better, the residuals for both lines will be calculated first. For a data point, its residual is the difference between the observed y-value of the data point and the y-value predicted by the line of fit.
| x | y (Actual) | y Predicted by y=- 0.1x+10 | Residual for y=- 0.1x+10 |
|---|---|---|---|
| 1 | 9.87 | y=- 0.1( 1)+10= 9.9 | 9.87- 9.9= - 0.03 |
| 2 | 9.85 | y=- 0.1( 2)+10= 9.8 | 9.85- 9.8= 0.05 |
| 3 | 9.69 | y=- 0.1( 3)+10= 9.7 | 9.69- 9.7= - 0.01 |
| 4 | 9.63 | y=- 0.1( 4)+10= 9.6 | 9.63- 9.6= 0.03 |
| 5 | 9.81 | y=- 0.1( 5)+10= 9.5 | 9.81- 9.5= 0.31 |
| 6 | 9.80 | y=- 0.1( 6)+10= 9.4 | 9.80- 9.4= 0.40 |
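The values in the last column of the table will be squared and added.
$$\begin{aligned} SSR_1 &= (-0.03)^2 + 0.05^2 + (-0.01)^2 + 0.03^2 + 0.31^2 + 0.40^2 \\ &= 0.0009 + 0.0025 + 0.0001 + 0.0009 + 0.0961 + 0.16 && \text{Calculate power} \\ &= 0.2605 && \text{Add terms} \end{aligned}$$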
The sum of squared residuals for Equation I, SSR_1, is 0.2605. Similarly, the residuals for Equation II will be calculated.
| x | y (Actual) | y Predicted by y=- 0.05x+10 | Residual for y=- 0.05x+10 |
|---|---|---|---|
| 1 | 9.87 | y=- 0.05( 1)+10= 9.95 | 9.87- 9.95= - 0.08 |
| 2 | 9.85 | y=- 0.05( 2)+10= 9.9 | 9.85- 9.9= - 0.05 |
| 3 | 9.69 | y=- 0.05( 3)+10= 9.85 | 9.69- 9.85= - 0.16 |
| 4 | 9.63 | y=- 0.05( 4)+10= 9.8 | 9.63- 9.8= - 0.17 |
| 5 | 9.81 | y=- 0.05( 5)+10= 9.75 | 9.81- 9.75= 0.06 |
| 6 | 9.80 | y=- 0.05( 6)+10= 9.70 | 9.80- 9.70= 0.10 |
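Now that the residuals are found, they can be squared and added.
$$\begin{aligned} SSR_2 &= (-0.08)^2 + (-0.05)^2 + (-0.16)^2 + (-0.17)^2 + 0.06^2 + 0.10^2 \\ &= 0.0064 + 0.0025 + 0.0256 + 0.0289 + 0.0036 + 0.01 && \text{Calculate power} \\ &= 0.077 && \text{Add terms} \end{aligned}$$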
The sum of squared residuals for Equation II, SSR_2, is 0.077. As a result, Equation II is the better line of fit because it has a smaller sum of squared residuals.
$$SSR_2 < SSR_1 \quad \text{because} \quad 0.077 < 0.2605$$
The residuals for both equations are collected in the following table.
| x | Residual for y=- 0.1x+10 | Residual for y=- 0.05x+10 |
|---|---|---|
| 1 | - 0.03 | - 0.08 |
| 2 | 0.05 | - 0.05 |
| 3 | - 0.01 | - 0.16 |
| 4 | 0.03 | - 0.17 |
| 5 | 0.31 | 0.06 |
| 6 | 0.40 | 0.10 |
The points (x,residual) for each equation will be graphed on a scatter plot.
As can be seen, the residuals for Equation II are generally closer to the x-axis, and therefore the sum of their squares has a lower value.
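The arithmetic in this example can be double-checked with a short Python sketch. It uses `decimal.Decimal` so that the hundredths in the table are handled exactly rather than as binary floating-point values. The helper name `ssr` is an assumption chosen for illustration.

```python
from decimal import Decimal as D

# Olympic Game numbers and finishing times from the table above.
games = [1, 2, 3, 4, 5, 6]
times = [D("9.87"), D("9.85"), D("9.69"), D("9.63"), D("9.81"), D("9.80")]

def ssr(xs, ys, slope, intercept):
    """Sum of squared residuals (observed minus predicted) for y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

ssr_1 = ssr(games, times, D("-0.1"), D("10"))   # Equation I
ssr_2 = ssr(games, times, D("-0.05"), D("10"))  # Equation II

print(ssr_1 == D("0.2605"))  # True
print(ssr_2 == D("0.077"))   # True
print(ssr_2 < ssr_1)         # True, so Equation II is the better fit
```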
Maya is researching used cars similar to the one her older sister drives. She found some data showing the mileages x, in thousands of miles, and the selling prices y, in thousands of dollars, of several used cars near her.
| x | 23 | 12 | 18 | 30 | 6 | 26 |
|---|---|---|---|---|---|---|
| y | 15 | 15 | 17 | 12 | 19 | 15 |
Maya wants to write the equation of a line of fit for this data set. She knows that the smaller the absolute values of the residuals, the more reliable the line of fit is. She then decides to write an equation in such a way that the sum of squared residuals is less than 10. Help Maya write an equation that satisfies the condition.
A line of fit drawn on the scatter plot of the data appears to pass through the points (4,20) and (16,16). Knowing two points on the line, the slope of the line and its equation can be found. To find the slope, the points need to be substituted into the Slope Formula.
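$$\begin{aligned} m &= \frac{y_2 - y_1}{x_2 - x_1} \\ &= \frac{16 - 20}{16 - 4} && \text{Substitute } (4,20) \text{ and } (16,16) \\ &= \frac{-4}{12} && \text{Subtract terms} \\ &= \frac{-1}{3} && \text{Divide numerator and denominator by } 4 \\ &= -\frac{1}{3} && \text{Put minus sign in front of fraction} \end{aligned}$$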
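Next, using m and either point, the equation of the line can be written in point-slope form. Use $m = -\frac{1}{3}$ and $(x_1, y_1) = (4, 20)$.
$$y - y_1 = m(x - x_1) \quad \Rightarrow \quad y - 20 = -\frac{1}{3}(x - 4)$$
To write the equation in slope-intercept form, y will be isolated.
$$\begin{aligned} y - 20 &= -\frac{1}{3}x + \frac{4}{3} && \text{Distribute } -\tfrac{1}{3} \\ y &= -\frac{1}{3}x + \frac{4}{3} + 20 && \text{Add } 20 \text{ to both sides} \\ y &= -\frac{1}{3}x + \frac{4}{3} + \frac{60}{3} && \text{Rewrite } 20 \text{ as } \tfrac{60}{3} \\ y &= -\frac{1}{3}x + \frac{64}{3} && \text{Add fractions} \end{aligned}$$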
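The slope and y-intercept found above can be verified with a small Python sketch that works in exact fractions. The function name `line_through` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, intercept) of the non-vertical line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = Fraction(y2 - y1, x2 - x1)  # Slope Formula
    intercept = y1 - slope * x1         # solve y1 = slope * x1 + intercept
    return slope, intercept

m, b = line_through((4, 20), (16, 16))
print(m, b)  # -1/3 64/3
```

Using `Fraction` keeps the result as $-\frac{1}{3}$ and $\frac{64}{3}$ instead of rounded decimals, which matches the form used in the worked solution.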
Finally, the residuals for this equation can be calculated. To do so, the difference between the y-value of a data point and the corresponding y-value predicted by the line of fit will be calculated for each data point.
| x | y (Actual) | y Predicted by the equation | Residual |
|---|---|---|---|
| 6 | 19 | y=- 1/3( 6)+64/3= 58/3 | 19- 58/3= - 1/3 |
| 12 | 15 | y=- 1/3( 12)+64/3= 52/3 | 15- 52/3= - 7/3 |
| 18 | 17 | y=- 1/3( 18)+64/3= 46/3 | 17- 46/3= 5/3 |
| 23 | 15 | y=- 1/3( 23)+64/3= 41/3 | 15- 41/3= 4/3 |
| 26 | 15 | y=- 1/3( 26)+64/3= 38/3 | 15- 38/3= 7/3 |
| 30 | 12 | y=- 1/3( 30)+64/3= 34/3 | 12- 34/3= 2/3 |
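Next, the sum of squared residuals can be found by adding the squares of the numbers in the last column of the table.
$$\begin{aligned} SSR &= \left(-\tfrac{1}{3}\right)^2 + \left(-\tfrac{7}{3}\right)^2 + \left(\tfrac{5}{3}\right)^2 + \left(\tfrac{4}{3}\right)^2 + \left(\tfrac{7}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 \\ &= \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{7}{3}\right)^2 + \left(\tfrac{5}{3}\right)^2 + \left(\tfrac{4}{3}\right)^2 + \left(\tfrac{7}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 && (-a)^2 = a^2 \\ &= \frac{1^2}{3^2} + \frac{7^2}{3^2} + \frac{5^2}{3^2} + \frac{4^2}{3^2} + \frac{7^2}{3^2} + \frac{2^2}{3^2} && \left(\tfrac{a}{b}\right)^m = \tfrac{a^m}{b^m} \\ &= \frac{1}{9} + \frac{49}{9} + \frac{25}{9} + \frac{16}{9} + \frac{49}{9} + \frac{4}{9} && \text{Calculate power} \\ &= \frac{144}{9} && \text{Add fractions} \\ &= 16 && \text{Calculate quotient} \end{aligned}$$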
The sum of squared residuals for the equation is 16.

| Equation | Sum of Squared Residuals |
|---|---|
| y = - 1/3x+64/3 | 16 |

This equation does not satisfy the condition that Maya wants, because 16 is not less than 10. Another equation can be found by slightly increasing the slope of the above equation and slightly decreasing its y-intercept. Use the applet below to find an equation. For example, y=- 0.25x+20 can be a good candidate.
The residuals for this equation can be found as follows.
| x | y (Actual) | y Predicted by the equation | Residual |
|---|---|---|---|
| 6 | 19 | y=- 0.25( 6)+20= 18.5 | 19- 18.5= 0.5 |
| 12 | 15 | y=- 0.25( 12)+20= 17 | 15- 17= - 2 |
| 18 | 17 | y=- 0.25( 18)+20= 15.5 | 17- 15.5= 1.5 |
| 23 | 15 | y=- 0.25( 23)+20= 14.25 | 15- 14.25= 0.75 |
| 26 | 15 | y=- 0.25( 26)+20= 13.5 | 15- 13.5= 1.5 |
| 30 | 12 | y=- 0.25( 30)+20= 12.5 | 12- 12.5= - 0.5 |
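Now, the sum of squared residuals for this equation can be calculated.
$$\begin{aligned} SSR &= 0.5^2 + (-2)^2 + 1.5^2 + 0.75^2 + 1.5^2 + (-0.5)^2 \\ &= 0.5^2 + 2^2 + 1.5^2 + 0.75^2 + 1.5^2 + 0.5^2 && (-a)^2 = a^2 \\ &= 0.25 + 4 + 2.25 + 0.5625 + 2.25 + 0.25 && \text{Calculate power} \\ &= 9.5625 && \text{Add terms} \end{aligned}$$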
As a result, the sum of squared residuals for y=- 0.25x+20 satisfies the condition.

| Equation | Sum of Squared Residuals |
|---|---|
| y = - 1/3x+64/3 | 16 > 10 |
| y=- 0.25x+20 | 9.5625 < 10 |

Note that there are numerous equations that satisfy Maya's condition. Here only one of them was shown. The following applet shows the equation of a line of fit and the sum of squared residuals. Use it to see how the sum changes as the equation changes.
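Both candidate equations can also be checked against Maya's condition with a brief Python sketch. It uses exact fractions, writing the slope -0.25 as -1/4. The helper name `ssr` and the variable names are assumptions made for this illustration.

```python
from fractions import Fraction as F

# Mileage (thousands of miles) and price (thousands of dollars) from Maya's table.
mileage = [23, 12, 18, 30, 6, 26]
price = [15, 15, 17, 12, 19, 15]

def ssr(xs, ys, slope, intercept):
    """Sum of squared residuals (observed minus predicted) for y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

first = ssr(mileage, price, F(-1, 3), F(64, 3))  # y = -1/3 x + 64/3
second = ssr(mileage, price, F(-1, 4), F(20))    # y = -0.25x + 20

print(first, first < 10)           # 16 False
print(float(second), second < 10)  # 9.5625 True
```

Any other slope and intercept can be passed to `ssr` in the same way to test whether a different line meets the condition.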
The applet generates bivariate data and two equations that model the data. To see the coordinates of a data point, move the cursor over it. Use the sum of squared residuals to determine the better fit for the data.
In this lesson, lines of fit and their residuals have been analyzed. When a line models a data set well, the sum of the squared residuals for the line is relatively small. Move the points and the line in the graph to see how the residuals and the sum of their squares change.
| Price, x | 19 | 29 | 35 | 43 | 45 |
|---|---|---|---|---|---|
| Number Sold, y | 125 | 105 | 80 | 62 | 45 |
Maya models the data set with two equations.
$$\begin{aligned} \text{Equation I:} \quad & y = -2.7x + 176 \\ \text{Equation II:} \quad & y = -3.5x + 201 \end{aligned}$$
We are asked to find the sum of the squares of the residuals of Equation I.
$$\text{Equation I:} \quad y = -2.7x + 176$$
Let's first find the residuals. For a data point, its residual is the difference between the observed y-value of the data point and the y-value predicted by the line of fit.
| x | y (Actual) | y Predicted by y=- 2.7x+176 | Residual |
|---|---|---|---|
| 19 | 125 | - 2.7( 19)+176= 124.7 | 125- 124.7= 0.3 |
| 29 | 105 | - 2.7( 29)+176= 97.7 | 105- 97.7= 7.3 |
| 35 | 80 | - 2.7( 35)+176= 81.5 | 80- 81.5= - 1.5 |
| 43 | 62 | - 2.7( 43)+176= 59.9 | 62- 59.9= 2.1 |
| 45 | 45 | - 2.7( 45)+176= 54.5 | 45- 54.5= - 9.5 |
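The values in the last column of the table will be squared and added.
$$\begin{aligned} SSR_1 &= 0.3^2 + 7.3^2 + (-1.5)^2 + 2.1^2 + (-9.5)^2 \\ &= 0.09 + 53.29 + 2.25 + 4.41 + 90.25 \\ &= 150.29 \end{aligned}$$
The sum of squared residuals for Equation I, SSR_1, is approximately 150.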
In a similar fashion, we will find the residuals for Equation II.
| x | y (Actual) | y Predicted by y=- 3.5x+201 | Residual |
|---|---|---|---|
| 19 | 125 | - 3.5( 19)+201= 134.5 | 125- 134.5= - 9.5 |
| 29 | 105 | - 3.5( 29)+201= 99.5 | 105- 99.5= 5.5 |
| 35 | 80 | - 3.5( 35)+201= 78.5 | 80- 78.5= 1.5 |
| 43 | 62 | - 3.5( 43)+201= 50.5 | 62- 50.5= 11.5 |
| 45 | 45 | - 3.5( 45)+201= 43.5 | 45- 43.5= 1.5 |
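Now that the residuals are found, they can be squared and added.
$$\begin{aligned} SSR_2 &= (-9.5)^2 + 5.5^2 + 1.5^2 + 11.5^2 + 1.5^2 \\ &= 90.25 + 30.25 + 2.25 + 132.25 + 2.25 \\ &= 257.25 \end{aligned}$$
The sum of squared residuals for Equation II, SSR_2, is approximately 257.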
The smaller the sum of squared residuals, the better the line of fit. Since Equation I has a smaller sum of squared residuals, it is the better line of fit.
$$SSR_1 < SSR_2 \quad \text{because} \quad 150 < 257$$
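The comparison above can be sketched in Python by computing the sum of squared residuals for each candidate line and picking the smaller one. The dictionary `models`, the helper `ssr`, and the other names are assumptions made for this illustration. `decimal.Decimal` is used so the tenths are handled exactly.

```python
from decimal import Decimal as D

# Prices (x) and numbers sold (y) from the table above.
prices = [19, 29, 35, 43, 45]
sold = [125, 105, 80, 62, 45]

# Candidate lines of fit as (slope, intercept) pairs.
models = {
    "Equation I": (D("-2.7"), D("176")),
    "Equation II": (D("-3.5"), D("201")),
}

def ssr(xs, ys, slope, intercept):
    """Sum of squared residuals (observed minus predicted) for y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

totals = {name: ssr(prices, sold, m, b) for name, (m, b) in models.items()}
better = min(totals, key=totals.get)  # name of the line with the smallest sum

print(totals["Equation I"] == D("150.29"))   # True
print(totals["Equation II"] == D("257.25"))  # True
print(better)  # Equation I
```

The unrounded sums are 150.29 and 257.25, which round to the values 150 and 257 compared above.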