| 8 Theory slides |
| 8 Exercises - Grade E - A |
| Each lesson is meant to take 1-2 classroom sessions |
One way to assess how well a line of fit describes the data is to analyze residuals.
A residual is the vertical distance between a data point and the line of fit. When a line of fit has been drawn on a scatter plot, not all of the data points lie exactly on the line — some of them are above the line and some below. Therefore, each data point has one residual, which can be positive, negative, or zero.
A residual can also be defined as the observed y-value of a data point minus its predicted y-value, found using the line of fit.
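The definition above can be sketched in a few lines of Python. The function names and the sample points are hypothetical, chosen only to show a positive, a zero, and a negative residual.

```python
def residual(observed_y, predicted_y):
    """Residual = observed y-value minus predicted y-value."""
    return observed_y - predicted_y


def residuals(xs, ys, slope, intercept):
    """Residuals of the points (x, y) relative to the line y = slope*x + intercept."""
    return [residual(y, slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical points and line y = x + 3, used only for illustration.
example = residuals([1, 2, 3], [5, 5, 5], slope=1, intercept=3)
print(example)  # [1, 0, -1]
```

The first point lies above the line, the second lies on it, and the third lies below it.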
The applet generates bivariate data and the equation of a line of fit for the data. Calculate the sum of the squares of the residuals for the given equation.
The table below shows the finishing times, in seconds, for the Olympic gold medalist in the men's 100-meter dash for the last six Olympic games. Olympic Game 1 represents the 2000 Olympic games and Olympic Game 6 represents the 2020 Olympic games.
Olympic Game | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Finishing Time (sec) | 9.87 | 9.85 | 9.69 | 9.63 | 9.81 | 9.80 |
Sum of Squared Residuals for Equation II: 0.077
x | y (Actual) | y Predicted by y=-0.1x+10 | Residual for y=-0.1x+10 |
---|---|---|---|
1 | 9.87 | y=-0.1(1)+10=9.9 | 9.87−9.9=-0.03 |
2 | 9.85 | y=-0.1(2)+10=9.8 | 9.85−9.8=0.05 |
3 | 9.69 | y=-0.1(3)+10=9.7 | 9.69−9.7=-0.01 |
4 | 9.63 | y=-0.1(4)+10=9.6 | 9.63−9.6=0.03 |
5 | 9.81 | y=-0.1(5)+10=9.5 | 9.81−9.5=0.31 |
6 | 9.80 | y=-0.1(6)+10=9.4 | 9.80−9.4=0.40 |
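Square each residual and add the results to find the sum of squared residuals for y=-0.1x+10.
(-0.03)^2+(0.05)^2+(-0.01)^2+(0.03)^2+(0.31)^2+(0.40)^2
=0.0009+0.0025+0.0001+0.0009+0.0961+0.1600
=0.2605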
x | y (Actual) | y Predicted by y=-0.05x+10 | Residual for y=-0.05x+10 |
---|---|---|---|
1 | 9.87 | y=-0.05(1)+10=9.95 | 9.87−9.95=-0.08 |
2 | 9.85 | y=-0.05(2)+10=9.9 | 9.85−9.9=-0.05 |
3 | 9.69 | y=-0.05(3)+10=9.85 | 9.69−9.85=-0.16 |
4 | 9.63 | y=-0.05(4)+10=9.8 | 9.63−9.8=-0.17 |
5 | 9.81 | y=-0.05(5)+10=9.75 | 9.81−9.75=0.06 |
6 | 9.80 | y=-0.05(6)+10=9.70 | 9.80−9.70=0.10 |
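Square each residual and add the results to find the sum of squared residuals for y=-0.05x+10.
(-0.08)^2+(-0.05)^2+(-0.16)^2+(-0.17)^2+(0.06)^2+(0.10)^2
=0.0064+0.0025+0.0256+0.0289+0.0036+0.0100
=0.077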
x | Residual for y=-0.1x+10 | Residual for y=-0.05x+10 |
---|---|---|
1 | -0.03 | -0.08 |
2 | 0.05 | -0.05 |
3 | -0.01 | -0.16 |
4 | 0.03 | -0.17 |
5 | 0.31 | 0.06 |
6 | 0.40 | 0.10 |
The points (x,residual) for each equation will be graphed on a scatter plot.
As can be seen, the residuals for Equation II are closer to the x-axis overall, and therefore the sum of their squares has a lower value.
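The comparison can be checked with a short Python sketch. The data and the two equations come from the tables above; the function and variable names are assumptions made for this illustration.

```python
games = [1, 2, 3, 4, 5, 6]
times = [9.87, 9.85, 9.69, 9.63, 9.81, 9.80]  # finishing times from the table


def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (observed - predicted)^2 for the line y = slope*x + intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - (slope * x + intercept)
        total += r ** 2
    return total


ssr_first = sum_squared_residuals(games, times, -0.1, 10)    # y = -0.1x + 10
ssr_second = sum_squared_residuals(games, times, -0.05, 10)  # y = -0.05x + 10
print(round(ssr_first, 4), round(ssr_second, 4))  # 0.2605 0.077
```

The second value matches the sum of squared residuals stated for Equation II, and it is the smaller of the two.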
Maya is researching used cars similar to the one her older sister drives. She found some data showing the mileages x, in thousands of miles, and the selling prices y, in thousands of dollars, of several used cars near her.
x | 23 | 12 | 18 | 30 | 6 | 26 |
---|---|---|---|---|---|---|
y | 15 | 15 | 17 | 12 | 19 | 15 |
Example Equation: y=-0.25x+20
Start by making a scatter plot for the given data set. Draw a line that passes through two data points and write its equation. Then, calculate the residuals for the equation.
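Substitute (4,20) and (16,16) into the slope formula.
m=(16−20)/(16−4)=-4/12=-1/3
Then substitute the slope and the point (4,20) into the point-slope form and solve for y.
y−20=-(1/3)(x−4)
y−20=-(1/3)x+4/3
y=-(1/3)x+4/3+60/3
y=-(1/3)x+64/3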
x | y (Actual) | y Predicted by the equation | Residual |
---|---|---|---|
6 | 19 | y=-(1/3)(6)+64/3=58/3 | 19−58/3=-1/3 |
12 | 15 | y=-(1/3)(12)+64/3=52/3 | 15−52/3=-7/3 |
18 | 17 | y=-(1/3)(18)+64/3=46/3 | 17−46/3=5/3 |
23 | 15 | y=-(1/3)(23)+64/3=41/3 | 15−41/3=4/3 |
26 | 15 | y=-(1/3)(26)+64/3=38/3 | 15−38/3=7/3 |
30 | 12 | y=-(1/3)(30)+64/3=34/3 | 12−34/3=2/3 |
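Square each residual and add the results to find the sum of squared residuals for this equation.
(-1/3)^2+(-7/3)^2+(5/3)^2+(4/3)^2+(7/3)^2+(2/3)^2
=1/9+49/9+25/9+16/9+49/9+4/9
=144/9
=16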
x | y (Actual) | y Predicted by the equation | Residual |
---|---|---|---|
6 | 19 | y=-0.25(6)+20=18.5 | 19−18.5=0.5 |
12 | 15 | y=-0.25(12)+20=17 | 15−17=-2 |
18 | 17 | y=-0.25(18)+20=15.5 | 17−15.5=1.5 |
23 | 15 | y=-0.25(23)+20=14.25 | 15−14.25=0.75 |
26 | 15 | y=-0.25(26)+20=13.5 | 15−13.5=1.5 |
30 | 12 | y=-0.25(30)+20=12.5 | 12−12.5=-0.5 |
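Square each residual and add the results to find the sum of squared residuals for the example equation.
(0.5)^2+(-2)^2+(1.5)^2+(0.75)^2+(1.5)^2+(-0.5)^2
=0.25+4+2.25+0.5625+2.25+0.25
=9.5625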
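Both sums of squared residuals can be reproduced with exact fractions. This is a minimal sketch, assuming the line drawn through (4,20) and (16,16) is y=-(1/3)x+64/3, which is how the residual table above reads. The function and variable names are hypothetical.

```python
from fractions import Fraction

mileage = [6, 12, 18, 23, 26, 30]  # thousands of miles
price = [19, 15, 17, 15, 15, 12]   # thousands of dollars


def ssr(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Assumed line through (4,20) and (16,16): y = -(1/3)x + 64/3
ssr_drawn = ssr(mileage, price, Fraction(-1, 3), Fraction(64, 3))
# Example equation from the lesson: y = -0.25x + 20
ssr_example = ssr(mileage, price, Fraction(-1, 4), Fraction(20))

print(ssr_drawn, float(ssr_example))  # 16 9.5625
```

Under these assumptions the example equation has the smaller sum, 9.5625 compared with 16, so it describes this data set more closely than the drawn line.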
The applet generates bivariate data and two equations that model the data. To see the coordinates of a data point, move the cursor over it. Use the sum of squared residuals to determine the better fit for the data.
Davontay volunteers to take part in research that checks if a vitamin supplement shortens the length of the flu. The data collected from 10 patients are shown with a line of fit in the following diagram.
The residual is the actual value minus the predicted value. Residual = Observed y-value − Predicted y-value. Let's identify the coordinates of the data point when x=4.
It is the point (4,5). Now we calculate the y-value predicted by the line of fit.
With this information, we can calculate the residual. Residual = 5 − 4.5 = 0.5. Since the y-values represent days, the residual also represents days. The residual is 0.5 days.
We found that the residual for 4 months is 0.5 days.
This means that this person, having taken the supplement for 4 months, had a flu that lasted 0.5 days longer than predicted. Therefore, the answer is option C.
Tiffaniqua records the winning times for various swim meets at Washington High School. Tiffaniqua and her teacher decide to check if the winning times are related to the height of the swimmer. They draw the following residual plot.
Which scatter plot best represents the data?
The residual shows the difference between the actual value and the value predicted by the model. Residual = Observed y-value − Predicted y-value. A positive residual means the actual value is above the line of fit, and a negative residual means the actual value is below the line of fit. Notice that the residuals near the vertical axis are farther from zero than the residuals farther to the right.
Therefore, when a line of fit is drawn on the data sets, the points to the right should be close to the line of fit.
Of the given options, the data set in option A is the one that matches this pattern.
Given the look of the residual plot, the difference between the predicted values and the actual values shrinks as the height of the swimmers increases.
This means our model is better at predicting the winning times of taller swimmers than shorter swimmers. The answer is then A.
LaShay writes a different linear equation for each of four different studies. To determine if the dependent and independent variables of each study have a linear relationship, she plots the residuals for each study.
From the scatter plot of residuals, we can determine if the dependent and independent variables are linearly related or not. To do so, we check two things in a residual plot: whether the residuals form a pattern, and how closely they are scattered about the x-axis.
Let's now consider the residual plots A and D.
We can recognize a U-shaped pattern in Plot A and a line in Plot D. Therefore, the variables in these studies are not linearly related.
For Plot B and Plot C, the residuals are randomly distributed about the x-axis. However, the residuals in Plot C are spread much farther from the x-axis than the residuals in Plot B. Therefore, of the given choices, the variables in study B are best described by a linear equation.
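The difference between a patterned and a scattered residual plot can be seen with two small made-up data sets. Everything in this sketch is hypothetical and is not LaShay's data: the curved data follow y=x^2 and are paired with the line y=6x−5, and the scattered data are paired with the line y=2x+1.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for the line y = slope*x + intercept."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [0, 1, 2, 3, 4, 5, 6]

# Hypothetical curved data (y = x^2) modeled with the line y = 6x - 5.
curved_ys = [x ** 2 for x in xs]
curved_residuals = residuals(xs, curved_ys, 6, -5)

# Hypothetical roughly linear data modeled with the line y = 2x + 1.
scattered_ys = [2, 2, 4, 8, 9, 12, 12]
scattered_residuals = residuals(xs, scattered_ys, 2, 1)

print(curved_residuals)     # [5, 0, -3, -4, -3, 0, 5]
print(scattered_residuals)  # [1, -1, -1, 1, 0, 1, -1]
```

The first list falls and then rises again, which is the U shape that signals a nonlinear relationship. The second list stays small and changes sign irregularly, which is what a suitable linear model tends to produce.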