One way to assess how well a line of fit describes the data is to analyze residuals.
A residual is the vertical distance between a data point and the line of fit. When a line of fit has been drawn on a scatter plot, not all of the data points lie exactly on the line — some of them are above the line and some below. Therefore, each data point has one residual, which can be positive, negative, or zero.
A residual can also be defined as the observed y-value of a data point minus its predicted y-value, found using the line of fit.
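The second definition translates directly into a small calculation. The sketch below is only an illustration: the function name `residual`, the three points, and the line y=2x+1 are made up for this example and do not come from the lesson's data.

```python
def residual(x, y_observed, slope, intercept):
    """Observed y-value minus the y-value predicted by the line of fit."""
    y_predicted = slope * x + intercept
    return y_observed - y_predicted

# Hypothetical points and line of fit y = 2x + 1, chosen only for illustration.
points = [(1, 4), (2, 5), (3, 6)]
residuals = [residual(x, y, slope=2, intercept=1) for x, y in points]
print(residuals)  # [1, 0, -1]: above the line, on the line, below the line
```

A positive residual means the point lies above the line, a negative one means it lies below, and zero means the point is on the line.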
The applet generates bivariate data and the equation of a line of fit for the data. Calculate the sum of the squares of the residuals for the given equation.
The table below shows the finishing times, in seconds, for the Olympic gold medalist in the men's 100-meter dash for the last six Olympic games. Olympic Game 1 represents the 2000 Olympic games and Olympic Game 6 represents the 2020 Olympic games.
Olympic Game | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Finishing Time (sec) | 9.87 | 9.85 | 9.69 | 9.63 | 9.81 | 9.80 |
Sum of Squared Residuals for Equation II: 0.077
x | y (Actual) | y Predicted by y=-0.1x+10 | Residual for y=-0.1x+10 |
---|---|---|---|
1 | 9.87 | y=-0.1(1)+10=9.9 | 9.87−9.9=-0.03 |
2 | 9.85 | y=-0.1(2)+10=9.8 | 9.85−9.8=0.05 |
3 | 9.69 | y=-0.1(3)+10=9.7 | 9.69−9.7=-0.01 |
4 | 9.63 | y=-0.1(4)+10=9.6 | 9.63−9.6=0.03 |
5 | 9.81 | y=-0.1(5)+10=9.5 | 9.81−9.5=0.31 |
6 | 9.80 | y=-0.1(6)+10=9.4 | 9.80−9.4=0.40 |
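Sum of squared residuals: (-0.03)² + 0.05² + (-0.01)² + 0.03² + 0.31² + 0.40²
Calculate power: 0.0009 + 0.0025 + 0.0001 + 0.0009 + 0.0961 + 0.16
Add terms: 0.2605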
x | y (Actual) | y Predicted by y=-0.05x+10 | Residual for y=-0.05x+10 |
---|---|---|---|
1 | 9.87 | y=-0.05(1)+10=9.95 | 9.87−9.95=-0.08 |
2 | 9.85 | y=-0.05(2)+10=9.9 | 9.85−9.9=-0.05 |
3 | 9.69 | y=-0.05(3)+10=9.85 | 9.69−9.85=-0.16 |
4 | 9.63 | y=-0.05(4)+10=9.8 | 9.63−9.8=-0.17 |
5 | 9.81 | y=-0.05(5)+10=9.75 | 9.81−9.75=0.06 |
6 | 9.80 | y=-0.05(6)+10=9.70 | 9.80−9.70=0.10 |
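Sum of squared residuals: (-0.08)² + (-0.05)² + (-0.16)² + (-0.17)² + 0.06² + 0.10²
Calculate power: 0.0064 + 0.0025 + 0.0256 + 0.0289 + 0.0036 + 0.01
Add terms: 0.077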
x | Residual for y=-0.1x+10 | Residual for y=-0.05x+10 |
---|---|---|
1 | -0.03 | -0.08 |
2 | 0.05 | -0.05 |
3 | -0.01 | -0.16 |
4 | 0.03 | -0.17 |
5 | 0.31 | 0.06 |
6 | 0.40 | 0.10 |
The points (x,residual) for each equation will be graphed on a scatter plot.
As can be seen, the residuals for Equation II are closer to the x-axis, and therefore the sum of their squares has a lower value.
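The comparison can be checked numerically. The following is a minimal sketch that recomputes the sum of squared residuals for both candidate equations from the finishing times in the table; the helper name `sum_squared_residuals` is an assumption introduced here.

```python
games = [1, 2, 3, 4, 5, 6]
times = [9.87, 9.85, 9.69, 9.63, 9.81, 9.80]  # finishing times from the table

def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (observed - predicted)^2 over all data points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

ssr_first = sum_squared_residuals(games, times, -0.1, 10)    # y = -0.1x + 10
ssr_second = sum_squared_residuals(games, times, -0.05, 10)  # y = -0.05x + 10

# Rounding hides tiny floating-point errors in the decimal arithmetic.
print(round(ssr_first, 4), round(ssr_second, 4))
```

The smaller total belongs to y=-0.05x+10, which agrees with the residual plot: its points sit closer to the x-axis.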
Maya is researching used cars similar to the one her older sister drives. She found some data showing the mileages x, in thousands of miles, and the selling prices y, in thousands of dollars, of several used cars near her.
x | 23 | 12 | 18 | 30 | 6 | 26 |
---|---|---|---|---|---|---|
y | 15 | 15 | 17 | 12 | 19 | 15 |
Example Equation: y=-0.25x+20
Start by making a scatter plot for the given data set. Draw a line that passes through two data points and write its equation. Then, calculate the residuals for the equation.
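Substitute (4,20) & (16,16) into the slope formula: m=(16−20)/(16−4)
Subtract term: m=-4/12
a/b=(a/4)/(b/4): m=-1/3
Write the point-slope form using (4,20): y−20=-(1/3)(x−4)
Distribute -1/3: y−20=-(1/3)x+4/3
LHS+20=RHS+20: y=-(1/3)x+4/3+20
a=(3⋅a)/3: y=-(1/3)x+4/3+60/3
Add fractions: y=-(1/3)x+64/3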
x | y (Actual) | y Predicted by the equation | Residual |
---|---|---|---|
6 | 19 | y=-(1/3)(6)+64/3=58/3 | 19−58/3=-1/3 |
12 | 15 | y=-(1/3)(12)+64/3=52/3 | 15−52/3=-7/3 |
18 | 17 | y=-(1/3)(18)+64/3=46/3 | 17−46/3=5/3 |
23 | 15 | y=-(1/3)(23)+64/3=41/3 | 15−41/3=4/3 |
26 | 15 | y=-(1/3)(26)+64/3=38/3 | 15−38/3=7/3 |
30 | 12 | y=-(1/3)(30)+64/3=34/3 | 12−34/3=2/3 |
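Sum of squared residuals: (-1/3)² + (-7/3)² + (5/3)² + (4/3)² + (7/3)² + (2/3)²
(-a)²=a²: (1/3)² + (7/3)² + (5/3)² + (4/3)² + (7/3)² + (2/3)²
(a/b)ᵐ=aᵐ/bᵐ: 1²/3² + 7²/3² + 5²/3² + 4²/3² + 7²/3² + 2²/3²
Calculate power: 1/9 + 49/9 + 25/9 + 16/9 + 49/9 + 4/9
Add fractions: 144/9
Calculate quotient: 16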
x | y (Actual) | y Predicted by the equation | Residual |
---|---|---|---|
6 | 19 | y=-0.25(6)+20=18.5 | 19−18.5=0.5 |
12 | 15 | y=-0.25(12)+20=17 | 15−17=-2 |
18 | 17 | y=-0.25(18)+20=15.5 | 17−15.5=1.5 |
23 | 15 | y=-0.25(23)+20=14.25 | 15−14.25=0.75 |
26 | 15 | y=-0.25(26)+20=13.5 | 15−13.5=1.5 |
30 | 12 | y=-0.25(30)+20=12.5 | 12−12.5=-0.5 |
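Sum of squared residuals: 0.5² + (-2)² + 1.5² + 0.75² + 1.5² + (-0.5)²
(-a)²=a²: 0.5² + 2² + 1.5² + 0.75² + 1.5² + 0.5²
Calculate power: 0.25 + 4 + 2.25 + 0.5625 + 2.25 + 0.25
Add terms: 9.5625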
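Both sums can be verified with exact arithmetic. The sketch below assumes Python's standard `fractions` module and rebuilds the first line from the two substituted points, (4,20) and (16,16), before comparing it with the example equation; the variable and function names are chosen here for illustration.

```python
from fractions import Fraction

mileage = [6, 12, 18, 23, 26, 30]   # thousands of miles
price = [19, 15, 17, 15, 15, 12]    # thousands of dollars

def sum_squared_residuals(slope, intercept):
    """Exact sum of (observed - predicted)^2 over the six cars."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(mileage, price))

# Line through (4, 20) and (16, 16): slope -1/3, intercept 64/3.
slope = Fraction(16 - 20, 16 - 4)
intercept = 20 - slope * 4

ssr_two_points = sum_squared_residuals(slope, intercept)
ssr_example = sum_squared_residuals(Fraction(-1, 4), 20)  # y = -0.25x + 20

print(ssr_two_points)  # 16
print(ssr_example)     # 153/16, which equals 9.5625
```

Using `Fraction` avoids rounding the thirds, so the totals match the hand calculations exactly.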
The applet generates bivariate data and two equations that model the data. To see the coordinates of a data point, move the cursor over it. Use the sum of squared residuals to determine the better fit for the data.
The equation y=3x−4 models the data in the table.
x | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
y | -12 | -7 | -5 | -1 | 0 | 4 | 6 |
Which graph is the scatter plot of the residuals?
Let's make a table of the residual values.
x | y | y Predicted by y=3x-4 | Residual |
---|---|---|---|
-3 | -12 | 3(-3)-4=-13 | -12-(-13)=1 |
-2 | -7 | 3(-2)-4=-10 | -7-(-10)=3 |
-1 | -5 | 3(-1)-4=-7 | -5-(-7)=2 |
0 | -1 | 3(0)-4=-4 | -1-(-4)=3 |
1 | 0 | 3(1)-4=-1 | 0-(-1)=1 |
2 | 4 | 3(2)-4=2 | 4-2=2 |
3 | 6 | 3(3)-4=5 | 6-5=1 |
Now we can create a scatter plot using the given x-values and our residuals.
This graph corresponds to option C.
If the model is a good fit for the data, the points of the residual plot will be evenly distributed above and below the x-axis. In other words, there should not be any apparent patterns. Let's examine the residual plot.
As we can see, all residuals are positive and are not evenly distributed above and below the x-axis. We can conclude that the line of fit y=3x-4 does not model the given data set well.
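The same conclusion can be reached by counting how many residuals fall on each side of the x-axis. This is a minimal sketch using the table's data and the equation y=3x−4; the names `above` and `below` are introduced here for illustration.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [-12, -7, -5, -1, 0, 4, 6]

# Residual = observed y minus the value predicted by y = 3x - 4.
residuals = [y - (3 * x - 4) for x, y in zip(xs, ys)]

above = sum(r > 0 for r in residuals)  # points above the x-axis in the residual plot
below = sum(r < 0 for r in residuals)  # points below the x-axis in the residual plot

print(residuals)     # [1, 3, 2, 3, 1, 2, 1]
print(above, below)  # 7 0
```

With seven residuals above the axis and none below, the line consistently underestimates the data.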
Mountain goats are exciting animals that can be seen in many parts of the world. The horns are an iconic characteristic of the male goat, also called a ram.
The table shows the length y, in centimeters, of a goat's horn each year. The equation y=2x+14.5 models the data in the table.
Year, x | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Growth, y | 15.2 | 18.5 | 21.4 | 22.7 | 23.8 |
Which graph is the scatter plot of the residuals?
Let's make a table of the residual values.
x | y | y Predicted by y=2x+14.5 | Residual |
---|---|---|---|
1 | 15.2 | 2(1)+14.5=16.5 | 15.2-16.5=-1.3 |
2 | 18.5 | 2(2)+14.5=18.5 | 18.5-18.5=0 |
3 | 21.4 | 2(3)+14.5=20.5 | 21.4-20.5=0.9 |
4 | 22.7 | 2(4)+14.5=22.5 | 22.7-22.5=0.2 |
5 | 23.8 | 2(5)+14.5=24.5 | 23.8-24.5=-0.7 |
Now, we can create a scatter plot using the given x-values and our residuals.
This graph corresponds to option B.
We know that an equation is a good fit for the data if the points of the residual plot are evenly distributed above and below the x-axis. Additionally, there should be no apparent pattern. The absence of a pattern would indicate that a linear model is appropriate.
The line of fit does not model the data well because we see that a ⋂-shaped pattern appears.
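The pattern is also visible in the numbers. The sketch below recomputes the residuals for y=2x+14.5 and applies a crude check for an arch shape: both end residuals negative and at least one positive residual in between. This check is only a rough heuristic invented for this illustration, not a standard statistical test.

```python
years = [1, 2, 3, 4, 5]
lengths = [15.2, 18.5, 21.4, 22.7, 23.8]  # horn length in centimeters

# Round to one decimal place to hide floating-point noise.
residuals = [round(y - (2 * x + 14.5), 1) for x, y in zip(years, lengths)]
print(residuals)  # [-1.3, 0.0, 0.9, 0.2, -0.7]

# Heuristic (assumption): an arch shape has negative ends and a positive middle.
ends_negative = residuals[0] < 0 and residuals[-1] < 0
middle_positive = max(residuals[1:-1]) > 0
print(ends_negative and middle_positive)  # True
```

The residuals rise from negative to positive and fall back to negative, which is the arch seen in the residual plot.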
Paulina does some research about a profession she wants to have in the future. She records the experience and hourly wage for a sample of six people who work in that field.
Experience, x | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Hourly Wage, y | 15.00 | 16.00 | 17.00 | 19.00 | 20.25 | 22.50 |
To find which statements are correct, we need to find the residuals for Paulina's model.
Let's make a table of the residual values.
x | y | y Predicted by y=1.5x+13 | Residual |
---|---|---|---|
1 | 15.00 | 1.5(1)+13=14.5 | 15.00-14.5=0.5 |
2 | 16.00 | 1.5(2)+13=16 | 16.00-16=0 |
3 | 17.00 | 1.5(3)+13=17.5 | 17.00-17.5=-0.5 |
4 | 19.00 | 1.5(4)+13=19 | 19.00-19=0 |
5 | 20.25 | 1.5(5)+13=20.5 | 20.25-20.5=-0.25 |
6 | 22.50 | 1.5(6)+13=22 | 22.50-22=0.5 |
Now we can create a scatter plot using the given x-values and our residuals.
We see that the scatter plot is evenly distributed above and below the x-axis. Additionally, there is no apparent pattern. Therefore, we can conclude that a linear model is appropriate. Statement I is false.
Next, we will calculate the sum of the absolute values of the residuals.
It is equal to 1.75. Statement II is true.
Finally, we will calculate the sum of squared residuals.
The sum of squared residuals is 0.8125, not 0.75, which means that Statement III is also false. Therefore, only Statement II is true.
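Both totals can be confirmed with a short calculation. The following sketch uses the table's data and the model y=1.5x+13; the variable names are chosen here for illustration.

```python
experience = [1, 2, 3, 4, 5, 6]
wage = [15.00, 16.00, 17.00, 19.00, 20.25, 22.50]

# Residual = observed wage minus the wage predicted by y = 1.5x + 13.
residuals = [y - (1.5 * x + 13) for x, y in zip(experience, wage)]

sum_abs = sum(abs(r) for r in residuals)  # sum of absolute values
sum_sq = sum(r ** 2 for r in residuals)   # sum of squared residuals

print(residuals)  # [0.5, 0.0, -0.5, 0.0, -0.25, 0.5]
print(sum_abs)    # 1.75
print(sum_sq)     # 0.8125
```

The two measures differ because squaring shrinks residuals smaller than 1, so the squared total is lower than the absolute total here.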
Vincenzo is excited to be studying linear models.
He models two data sets using linear models. Consider the residual plots for each data set.
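We determine the appropriate models by looking at the scatter plot of residuals. There are two things to consider: whether the residuals are evenly distributed above and below the x-axis, and whether they show an apparent pattern.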
Let's now consider the given residual plots.
We see that the residuals of Data Set I are evenly distributed above and below the x-axis and there is no apparent pattern. Therefore, a linear model is appropriate for Data Set I.
However, we can recognize a ⋃-shaped pattern in the residual plot for Data Set II, which means that a non-linear model is appropriate for Data Set II. Therefore, the statement in option B is true.