Analyzing Lines of Fit

Exercises
Exercises 1
A residual is the difference between the y-coordinate of an actual data point and the y-coordinate found using the equation of the line of fit:

residual = y(actual data point) − y(line of fit)

The residual is positive if the data point lies above the line of fit and negative if the data point lies below the line of fit.
Exercises 2
There are two ways to check the goodness of your line of fit: by graphing the residuals, or with a calculator.

Graphing
A residual is the difference between the y-coordinate of a data point and the y-coordinate produced by the line of fit:

y of data point − y of line of fit = residual

If the line is a good fit, the scatter plot of the residuals should be centered around the x-axis. Positive residuals lie above the x-axis and negative residuals lie below it. If the line is a poor fit, you will see many more residuals of one sign than of the other. For example, a residual plot with 8 positive residuals and only 3 negative residuals shows a line of fit that lies below most of the data points instead of passing through the middle of them. For a good fit, the residual plot is evenly divided by the x-axis.

Calculator
If you have many data points, you may want to use a graphing calculator to measure the goodness of fit. Enter all of your data points and use the linear regression function to find r, also known as the correlation coefficient. The value of r always satisfies -1 ≤ r ≤ 1. When r is close to -1, there is a strong negative correlation and the line is a good fit. When r is close to 1, there is a strong positive correlation and the line is a good fit. When r is close to 0, the correlation is weak: either the line is a poor fit or the data have no linear correlation at all.
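To see roughly what the calculator computes when it reports r, here is a minimal Python sketch of the standard (Pearson) correlation coefficient formula. The function name `correlation_coefficient` and the small data sets are hypothetical, chosen only to show values of r at 1, at -1, and in between; a calculator's rounding and display may differ.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only
print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: every point on a rising line
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: every point on a falling line
print(correlation_coefficient([1, 2, 3, 4], [2, 1, 2, 1]))  # about -0.45: weak
```

The closer the points are to a single straight line, the closer |r| gets to 1.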
Exercises 3
Interpolation and extrapolation are very similar: both use existing data to make educated guesses about values that were not observed. The main difference between the two processes is captured by their prefixes. What do "extra" and "inter" mean to you? Typically, they mean:

Extra: above and beyond, in addition to something.
Inter: inside, between, or among the group.

The prefixes mean the same thing here. Extrapolation uses the known data to make predictions outside the range of the known data. Interpolation uses the known data to make predictions inside that range.
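The prefix rule can be written as a one-line range check. This Python sketch is illustrative only: the function name `prediction_type` is hypothetical, and it borrows the week numbers 1 to 5 from the elk antler data in Exercise 9 as the known x-values.

```python
def prediction_type(known_xs, x_new):
    """Label a prediction at x_new as interpolation or extrapolation."""
    lo, hi = min(known_xs), max(known_xs)
    # Inside the observed range of x-values: interpolation; outside it: extrapolation
    return "interpolation" if lo <= x_new <= hi else "extrapolation"

weeks = [1, 2, 3, 4, 5]  # x-values from the elk antler data in Exercise 9
print(prediction_type(weeks, 3.5))  # interpolation
print(prediction_type(weeks, 8))    # extrapolation
```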
Exercises 4
A correlation coefficient tells us two things:

The strength of the correlation: it is strong if |r| is close to 1 and weak if r is close to 0.
The direction of the correlation: positive or negative.

Let's look at what each of the given values of r tells us.

r | Positive or negative? | Strong or weak?
-0.98 | Negative | Strong
0.96 | Positive | Strong
-0.09 | Negative | Weak
0.97 | Positive | Strong

The correlation coefficient that does not belong with the others is r = -0.09, because it is the only weak correlation. It indicates almost no linear relationship, so a line of fit would describe those data poorly. The other three values of r indicate very strong correlations: in each of those data sets, the points lie close to the line of fit.
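A short Python sketch can sort the four given values of r by direction and strength. The cutoff of 0.7 for a "strong" correlation is an assumption made for illustration only; there is no single agreed threshold. The sketch also prints r², the share of the variation in y that a line of best fit accounts for, which shows how little r = -0.09 explains.

```python
# The four correlation coefficients given in the exercise
values = [-0.98, 0.96, -0.09, 0.97]
STRONG_CUTOFF = 0.7  # assumed threshold for "strong", for illustration only

def describe(r):
    """Return (direction, strength) for a correlation coefficient r."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    strength = "strong" if abs(r) >= STRONG_CUTOFF else "weak"
    return direction, strength

for r in values:
    # r squared: share of the variation in y explained by the line of best fit
    print(r, *describe(r), round(r ** 2, 4))

# The odd one out is the only value whose strength differs from the rest
odd_one_out = [r for r in values if describe(r)[1] == "weak"]
print(odd_one_out)  # [-0.09]
```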
Exercises 5
Let's begin by making a table of the residual values.

x | y | y = 4x − 5 | Value from model | Residual
-4 | -18 | 4(-4) − 5 | -21 | -18 − (-21) = 3
-3 | -13 | 4(-3) − 5 | -17 | -13 − (-17) = 4
-2 | -10 | 4(-2) − 5 | -13 | -10 − (-13) = 3
-1 | -7 | 4(-1) − 5 | -9 | -7 − (-9) = 2
0 | -2 | 4(0) − 5 | -5 | -2 − (-5) = 3
1 | 0 | 4(1) − 5 | -1 | 0 − (-1) = 1
2 | 6 | 4(2) − 5 | 3 | 6 − 3 = 3
3 | 10 | 4(3) − 5 | 7 | 10 − 7 = 3
4 | 15 | 4(4) − 5 | 11 | 15 − 11 = 4

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

This line of fit does not model the data well. The residuals are not evenly distributed above and below the x-axis: every residual is positive, so the line lies below all of the data points.
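The residual table can be reproduced with a few lines of Python. This is a sketch, not part of the original solution; the helper name `residuals` is hypothetical. It applies residual = actual y − predicted y to each data point from Exercise 5 and checks the sign of every residual.

```python
def residuals(xs, ys, slope, intercept):
    """Actual y minus the y predicted by the line y = slope*x + intercept."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Data points from Exercise 5
xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
ys = [-18, -13, -10, -7, -2, 0, 6, 10, 15]

res = residuals(xs, ys, slope=4, intercept=-5)
print(res)                      # [3, 4, 3, 2, 3, 1, 3, 3, 4]
print(all(r > 0 for r in res))  # True: every point lies above the line
```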
Exercises 6
Let's begin by making a table of the residual values.

x | y | y = 6x + 4 | Value from model | Residual
1 | 13 | 6(1) + 4 | 10 | 13 − 10 = 3
2 | 14 | 6(2) + 4 | 16 | 14 − 16 = -2
3 | 23 | 6(3) + 4 | 22 | 23 − 22 = 1
4 | 26 | 6(4) + 4 | 28 | 26 − 28 = -2
5 | 31 | 6(5) + 4 | 34 | 31 − 34 = -3
6 | 42 | 6(6) + 4 | 40 | 42 − 40 = 2
7 | 45 | 6(7) + 4 | 46 | 45 − 46 = -1
8 | 52 | 6(8) + 4 | 52 | 52 − 52 = 0
9 | 62 | 6(9) + 4 | 58 | 62 − 58 = 4

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

This line of fit models the data well. The residuals are evenly distributed above and below the x-axis.
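Counting the residuals of each sign gives a quick numeric check of "evenly distributed". A Python sketch using the Exercise 6 data; the variable names are illustrative only.

```python
# Data from Exercise 6 and the line y = 6x + 4
xs = list(range(1, 10))
ys = [13, 14, 23, 26, 31, 42, 45, 52, 62]

res = [y - (6 * x + 4) for x, y in zip(xs, ys)]
positive = sum(1 for r in res if r > 0)  # points above the line
negative = sum(1 for r in res if r < 0)  # points below the line

print(res)                 # [3, -2, 1, -2, -3, 2, -1, 0, 4]
print(positive, negative)  # 4 4
```

A balanced count alone does not prove a good fit: the residuals in Exercise 8 are nearly balanced in sign yet still form a clear pattern.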
Exercises 7
Let's begin by making a table of the residual values.

x | y | y = -1.3x + 1 | Value from model | Residual
-8 | 9 | -1.3(-8) + 1 | 11.4 | 9 − 11.4 = -2.4
-6 | 10 | -1.3(-6) + 1 | 8.8 | 10 − 8.8 = 1.2
-4 | 5 | -1.3(-4) + 1 | 6.2 | 5 − 6.2 = -1.2
-2 | 8 | -1.3(-2) + 1 | 3.6 | 8 − 3.6 = 4.4
0 | -1 | -1.3(0) + 1 | 1 | -1 − 1 = -2
2 | 1 | -1.3(2) + 1 | -1.6 | 1 − (-1.6) = 2.6
4 | -4 | -1.3(4) + 1 | -4.2 | -4 − (-4.2) = 0.2
6 | -12 | -1.3(6) + 1 | -6.8 | -12 − (-6.8) = -5.2
8 | -7 | -1.3(8) + 1 | -9.4 | -7 − (-9.4) = 2.4

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

This line of fit models the data well. The residuals are evenly distributed above and below the x-axis.
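With a decimal slope such as -1.3, computer arithmetic introduces tiny binary rounding errors, so a raw residual can be slightly off in its last few digits. This Python sketch recomputes the Exercise 7 residuals and rounds them to one decimal place, matching the precision of the table; the variable names are illustrative.

```python
# Data from Exercise 7 and the line y = -1.3x + 1
xs = [-8, -6, -4, -2, 0, 2, 4, 6, 8]
ys = [9, 10, 5, 8, -1, 1, -4, -12, -7]

# Round each residual to one decimal place to hide floating-point noise
res = [round(y - (-1.3 * x + 1), 1) for x, y in zip(xs, ys)]
print(res)  # [-2.4, 1.2, -1.2, 4.4, -2.0, 2.6, 0.2, -5.2, 2.4]

above = sum(1 for r in res if r > 0)
below = sum(1 for r in res if r < 0)
print(above, below)  # 5 4
```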
Exercises 8
Let's begin by making a table of the residual values.

x | y | y = -0.5x − 2 | Value from model | Residual
4 | -1 | -0.5(4) − 2 | -4 | -1 − (-4) = 3
6 | -3 | -0.5(6) − 2 | -5 | -3 − (-5) = 2
8 | -6 | -0.5(8) − 2 | -6 | -6 − (-6) = 0
10 | -8 | -0.5(10) − 2 | -7 | -8 − (-7) = -1
12 | -10 | -0.5(12) − 2 | -8 | -10 − (-8) = -2
14 | -10 | -0.5(14) − 2 | -9 | -10 − (-9) = -1
16 | -10 | -0.5(16) − 2 | -10 | -10 − (-10) = 0
18 | -9 | -0.5(18) − 2 | -11 | -9 − (-11) = 2
20 | -9 | -0.5(20) − 2 | -12 | -9 − (-12) = 3

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

This line of fit does not model the data well. The residuals are not evenly distributed above and below the x-axis, and they form a U-shaped pattern, which suggests that the data are not linear.
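Writing the residual signs in order makes the U-shaped pattern easy to spot: positive at both ends, negative in the middle. A Python sketch using the Exercise 8 data; encoding the signs as "+", "-", and "0" is just an illustrative choice.

```python
# Data from Exercise 8 and the line y = -0.5x - 2
xs = list(range(4, 21, 2))  # 4, 6, ..., 20
ys = [-1, -3, -6, -8, -10, -10, -10, -9, -9]

res = [y - (-0.5 * x - 2) for x, y in zip(xs, ys)]
# Encode each residual's sign: + above the line, - below it, 0 on it
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in res)
print(signs)  # ++0---0++
```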
Exercises 9
Let's begin by making a table of the residual values. Note that, in our table, y represents the growth in inches of an elk's antlers in week x.

x | y | y = -0.7x + 6.8 | Value from model | Residual
1 | 6.0 | -0.7(1) + 6.8 | 6.1 | 6.0 − 6.1 = -0.1
2 | 5.5 | -0.7(2) + 6.8 | 5.4 | 5.5 − 5.4 = 0.1
3 | 4.7 | -0.7(3) + 6.8 | 4.7 | 4.7 − 4.7 = 0
4 | 3.9 | -0.7(4) + 6.8 | 4.0 | 3.9 − 4.0 = -0.1
5 | 3.3 | -0.7(5) + 6.8 | 3.3 | 3.3 − 3.3 = 0

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

This line of fit models the data well. The residuals are evenly distributed above and below the x-axis.
Exercises 10
Let's begin by making a table of the residual values. Note that, in our table, y represents the approximate number (in thousands) of movie tickets sold in month x.

x | y | y = 1.3x + 27 | Value from model | Residual
1 | 27 | 1.3(1) + 27 | 28.3 | 27 − 28.3 = -1.3
2 | 28 | 1.3(2) + 27 | 29.6 | 28 − 29.6 = -1.6
3 | 36 | 1.3(3) + 27 | 30.9 | 36 − 30.9 = 5.1
4 | 28 | 1.3(4) + 27 | 32.2 | 28 − 32.2 = -4.2
5 | 32 | 1.3(5) + 27 | 33.5 | 32 − 33.5 = -1.5
6 | 35 | 1.3(6) + 27 | 34.8 | 35 − 34.8 = 0.2

Now we can create a scatter plot of the residuals against the given x-values. Remember, if the model is a good fit for the data, the residuals will be evenly distributed above and below the x-axis, with no apparent pattern.

As we can see, the residuals are not evenly dispersed about the horizontal axis: four of the six are negative. Therefore, the line of fit does not model the data well.
Exercises 11
Let's begin by entering the data into our calculator and using the linear regression analysis tools. We can round the values of a and b and substitute them into the equation y = ax + b. This gives us the equation of the line of best fit.

y = 2.1x − 8

We can see how well the line fits the data by plotting the data points and graphing the line on the same coordinate plane. The calculator output also gives us the value of the correlation coefficient, r.

r = 0.9803 ≈ 0.980

This tells us that the correlation is both positive and very strong. We can tell that it is very strong because r is extremely close to 1; a value of exactly 1 would mean a perfect positive correlation, with every data point lying on the line.
Exercises 12
Let's begin by entering the data into our calculator and using the linear regression analysis tools. We can round the values of a and b and substitute them into the equation y = ax + b. This gives us the equation of the line of best fit.

y = -1.3x + 8

We can see how well the line fits the data by plotting the data points and graphing the line on the same coordinate plane. The calculator output also gives us the value of the correlation coefficient, r.

r = -0.8858 ≈ -0.886

This tells us that the correlation is both negative and strong. We can tell that it is strong because r is close to -1; a value of exactly -1 would mean a perfect negative correlation, with every data point lying on the line.
Exercises 13
Let's begin by entering the data into our calculator and using the linear regression analysis tools. We can round the values of a and b and substitute them into the equation y = ax + b. This gives us the equation of the line of best fit.

y = 1.4x + 16

We can see how well the line fits the data by plotting the data points and graphing the line on the same coordinate plane. The calculator output also gives us the value of the correlation coefficient, r.

r = 0.9986 ≈ 0.999

This tells us that the correlation is both positive and very strong. We can tell that it is very strong because r is extremely close to 1; a value of exactly 1 would mean a perfect positive correlation, with every data point lying on the line.
Exercises 14
Let's begin by entering the data into our calculator and using the linear regression analysis tools. We can round the value of b and substitute it, along with a, into the equation y = ax + b. This gives us the equation of the line of best fit.

y = -x + 11

We can see how well the line fits the data by plotting the data points and graphing the line on the same coordinate plane. The calculator output also gives us the value of the correlation coefficient, r.

r = -0.4435 ≈ -0.444

This tells us that the correlation is both negative and moderate. We can tell that it is moderate because r is near -0.5, roughly halfway between a perfect negative correlation (r = -1) and no correlation (r = 0).
Exercises 15
The written equation has interchanged the values of a and b. According to the display, a = -4.47 and b = 23.16. The coefficient of x should therefore be -4.47 and the constant term 23.16, so the equation should be y = -4.47x + 23.16.
Exercises 16
When looking at linear regression output on a calculator, we can learn about the correlation, and about how good our line of fit is, by interpreting the correlation coefficient. Be sure that you are looking at the value of r, not the value of r². In this case, we have:

r = -0.9994724136

When |r| is close to 1, there is a strong correlation, and when r is close to 0, there is a weak correlation. This value of r indicates a very strong correlation. A positive value of r indicates a positive correlation and a negative value indicates a negative correlation; here, r is negative. Therefore, the data have a strong negative correlation, not a strong positive correlation.
Exercises 17
a. Using a graphing calculator, we can create lists for our data and use the linear regression features to find a line of best fit. According to the calculator output, the line of best fit for the data is:

y = 381x − 566

We can see how well the line fits the data by plotting the points and graphing the line on the same coordinate plane.

b. The correlation coefficient is always represented by the variable r. In the linear regression output, we were given r = 0.989 for our data set. The correlation coefficient always satisfies -1 ≤ r ≤ 1, where positive values correspond to a positive slope and negative values to a negative slope. Additionally, the closer the value is to 0, the weaker the correlation. Our value, r = 0.989, tells us that the correlation is positive and very strong.

c. With the line of best fit y = 381x − 566, the slope is 381 and the y-intercept is -566. To interpret these values, recall that x is the number of minutes that have passed and y is the number of people who reported feeling the earthquake. The slope,

rise / run = 381 people / 1 minute,

tells us that, on average, about 381 more people reported the earthquake each minute after it hit. The y-intercept does not make sense in the context of the problem, because there cannot be a negative number of people reporting the earthquake. It is simply the result of extending the line beyond reasonable boundaries.
Exercises 18
a. Let's begin by entering the data into our calculator and using the linear regression analysis tools. We can round the value of b and substitute it, along with a, into the equation y = ax + b. This gives us the equation of the line of best fit.

y = x + 7

We can see how well the line fits the data by plotting the data points and graphing the line on the same coordinate plane.

b. The calculator output gives us the value of the correlation coefficient, r.

r = 0.6194 ≈ 0.619

This tells us that the correlation is positive and moderately strong. It is positive because r > 0, and moderately strong because r is closer to 1 than to 0 without being very close to 1.

c. In part a, we found the equation of the line of best fit.

y = x + 7

In this equation, the slope is 1 and the y-intercept is 7. The slope tells us that, on average, the number of volunteers at the animal shelter increases by approximately 1 person each day. The y-intercept gives a prediction of 7 volunteers for the day before the first day, where x = 0.
Exercises 19
a. To perform a linear regression, we first have to enter the values into lists. Push STAT, choose Edit, and then enter the values in the first two columns. To do the linear regression, push STAT, scroll right to CALC, and then choose the fourth option in the list, LinReg. The output gives the equation of the line of best fit.

y = -0.2x + 20

b. We can find the correlation coefficient r on the same screen as the linear regression results. A value of r = -0.968 indicates a strong negative correlation.

c. Since the data give the price in thousands of dollars and the mileage in thousands of miles, the slope of -0.2 means that for every 1000 miles driven, the car's value decreases by about $200. The y-intercept has no meaningful interpretation, because a used car cannot have a mileage of 0.

d. To estimate the mileage of a car that costs $15,500, we substitute 15.5 for y in the equation of the line of best fit and solve for x.

15.5 = -0.2x + 20
-4.5 = -0.2x        (subtract 20 from both sides)
x = 22.5            (multiply both sides by -5)

Since x is in thousands of miles, a car with a price of $15,500 has an estimated mileage of 22,500 miles.

e. To estimate the price of a car with 6000 miles, we substitute 6 for x in the equation of the line of best fit.

y = -0.2(6) + 20
y = -1.2 + 20       (multiply)
y = 18.8            (add)

Since y is in thousands of dollars, a car with 6000 miles has an estimated price of $18,800.
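Parts d and e amount to evaluating the line and solving it for x. This Python sketch uses exact fractions, so that a value like -0.2 is not distorted by floating-point rounding; the function names `predict_price` and `mileage_for_price` are hypothetical.

```python
from fractions import Fraction

# Line of best fit from part a: price y (thousands of $) vs. mileage x (thousands of mi)
SLOPE = Fraction("-0.2")
INTERCEPT = Fraction(20)

def predict_price(mileage):
    """Evaluate y = -0.2x + 20 at a given mileage (thousands of miles)."""
    return SLOPE * mileage + INTERCEPT

def mileage_for_price(price):
    """Solve price = -0.2x + 20 for x (price in thousands of dollars)."""
    return (price - INTERCEPT) / SLOPE

print(float(mileage_for_price(Fraction("15.5"))))  # 22.5 -> about 22,500 miles
print(float(predict_price(6)))                      # 18.8 -> about $18,800
```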
Exercises 20
a. To perform a linear regression, we first have to enter the values into lists. Push STAT, choose Edit, and then enter the values in the first two columns. To do the linear regression, push STAT, scroll right to CALC, and then choose the fourth option in the list, LinReg. The output gives the equation of the line of best fit.

y = 4.9x − 37.7

b. We can find the correlation coefficient r on the same screen as the linear regression results. A value of r = 0.936 indicates a strong positive correlation.

c. Since the data give the cost in thousands of dollars and the length in feet, the slope of 4.9 means that for every additional foot of length, the sailboat's cost increases by about $4900. The y-intercept has no meaningful interpretation, because a sailboat cannot have a length of 0 feet.

d. To estimate the cost of a sailboat that is 20 feet long, we substitute 20 for x in the equation of the line of best fit.

y = 4.9(20) − 37.7
y = 98 − 37.7       (multiply)
y = 60.3            (subtract)

Since y is in thousands of dollars, a 20-foot sailboat has an estimated cost of $60,300.

e. To estimate the length of a sailboat that costs $147,000, we substitute 147 for y in the equation of the line of best fit and solve for x.

147 = 4.9x − 37.7
184.7 = 4.9x        (add 37.7 to both sides)
x ≈ 37.7            (divide both sides by 4.9)

This means that a sailboat that costs $147,000 has an estimated length of about 37.7 feet.
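Solving 147 = 4.9x − 37.7 gives 184.7 = 4.9x, so x = 184.7 / 4.9 ≈ 37.7. This Python sketch double-checks parts d and e using the line y = 4.9x − 37.7; the function names are hypothetical, and results are rounded to one decimal place as in the solution.

```python
SLOPE, INTERCEPT = 4.9, -37.7  # y = 4.9x - 37.7: y in thousands of $, x in feet

def predict_cost(length_ft):
    """Estimated cost (thousands of dollars) for a given length in feet."""
    return SLOPE * length_ft + INTERCEPT

def length_for_cost(cost_thousands):
    """Solve cost = 4.9x - 37.7 for the length x in feet."""
    return (cost_thousands - INTERCEPT) / SLOPE

print(round(predict_cost(20), 1))      # 60.3 -> about $60,300
print(round(length_for_cost(147), 1))  # 37.7 -> about 37.7 feet
```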
Exercises 21
When you use your phone a lot, the battery dies faster than if you leave it untouched for the same amount of time. Therefore, the longer you spend talking on the phone, the more of its charge the battery will lose. A line of fit for remaining battery charge versus talking time would slope downward.

As you talk, the battery is being drained. This is a negative correlation and a causal relationship: the phone usage causes the battery charge to decrease. Notice that the graph is limited to the first quadrant. You cannot talk for a negative number of minutes, and you cannot continue talking after the phone battery has been fully drained.
Exercises 22
Does the height of a toddler correlate with the size of their vocabulary? More often than not, taller toddlers have larger vocabularies, so a line of fit for vocabulary size versus height would probably have a positive slope, showing a positive correlation.

Now the question is: does a change in height cause a change in vocabulary size? The answer is: definitely not! When looking at a situation like this, we must keep all possible outside factors in mind. Height depends mostly on genetics and health, while vocabulary size depends on many things, including how often a child is read to by a parent or guardian. Correlation doesn't imply causation! The most plausible explanation here is that taller toddlers are more often older, and children learn more words as they age.
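Exercises 23
Buying a hat does not make your head bigger or smaller, and the size of your head does not make you buy more or fewer hats. Since there is no clear reason for the two quantities to be related, a correlation between the number of hats you own and the size of your head is very unlikely, and there is no causal relationship.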
Exercises 24
On average, heavier dogs are bigger, so they tend to have longer tails as well. However, a dog gaining extra weight will not make its tail grow longer. This means that there is a positive correlation between the weight of a dog and the length of its tail, but no causal relationship.
Exercises 25
Examples of data with a strong correlation but without a causal relationship are often discussed in the statistics world because correlation doesn't imply causation!

For example, did you know that ice cream sales can predict murder rates? It's true. The more ice cream that is sold in an area, the higher the rate of serious crimes. There is a strong correlation between the two variables. But does one cause the other? Does eating ice cream make people want to commit crimes? We thought ice cream makes people happy!

The truth is that underlying factors come into play. More ice cream is purchased in larger cities because there are more people, and larger cities also have more crime because there are more people. More ice cream is purchased in summer because it is hot outside, and crime also tends to rise in summer for reasons that have nothing to do with ice cream.
Exercises 26
We can match each scatter plot to its correlation coefficient by noting a few key features. Is it a strong or weak correlation? Is it a positive or negative correlation? Let's look at the four given graphs.

Graph | Strong or weak? | Positive or negative?
a | Strong | Positive
b | Strong | Negative
c | No correlation | No correlation
d | Weak | Positive

Now, let's look at the information we can gather from the correlation coefficients.

Coefficient | Strong or weak? | Positive or negative?
A, r = 0.02 | No correlation | No correlation
B, r = 0.98 | Strong | Positive
C, r = -0.97 | Strong | Negative
D, r = 0.69 | Weak | Positive

Now that we have noted the key features from each piece of information, we can match the graphs to the correlation coefficients:

a → B, b → C, c → A, d → D
Exercises 27
a. When we use the linear regression features on our calculators, the output gives y = -0.08x + 3.8 as our line of best fit and a correlation coefficient of r = -0.965. Since r is very close to -1, we know that there is a strong negative correlation.

b. With the equation y = -0.08x + 3.8, the slope is -0.08 and the y-intercept is 3.8. To interpret these values, we need to understand that x is the number of hours of TV the student watches each week and y is their GPA. We can interpret the slope,

rise / run = -0.08 grade points / 1 hour of TV,

as: students' GPAs drop by about 0.08 points for each additional hour of TV watched per week. The y-intercept tells us that students who watch 0 hours of TV each week should have a GPA of around 3.8.

c. To predict the GPA of a student who watches 14 hours of TV per week, we substitute x = 14 into our line of best fit.

y = -0.08(14) + 3.8
y = 2.68            (use a calculator)
y ≈ 2.7

According to the line of best fit, their GPA would be approximately 2.7.

d. Does the number of hours of TV watched directly influence a student's GPA? No. While there is a strong correlation, there is not a causal relationship. Can you think of reasons why there would be a correlation? What are some underlying factors? Perhaps the students who aren't watching TV are spending more time studying or doing homework. Or, in the opposite direction, what about the students who are watching educational TV? Their GPAs may not fit the data well.
Exercises 28 In order to see how the new point would affect the correlation, let's plot the original data and line of best fit. Then we can add the new data point and see how it relates to the others.This point is relatively far away from the other data points as well as the line of best fit. Including this point would definitely weaken the correlation.
Exercises 29
a. The line of best fit found in Exercise 17 was y = 381x − 566. We need to use this equation to predict how many people reported the earthquake after 9 minutes and after 15 minutes, so we substitute each value for x and calculate y.

y = 381(9) − 566 = 2863

After 9 minutes, the line of best fit predicts that approximately 2863 people would have reported the earthquake.

y = 381(15) − 566 = 5149

After 15 minutes, the line of best fit predicts that approximately 5149 people would have reported the earthquake.

b. We can compare the predicted results to the observed results:

Minutes | Predicted | Observed
9 | 2863 | 2750
15 | 5149 | 3200

For 9 minutes, the prediction is higher than the observed number, but not by much. For 15 minutes, the predicted value is much higher than the observed value. Why might this be the case? The line of best fit is a straight line that continues with the same slope forever. More than likely, after a certain number of minutes, most of the people who will report the earthquake have already done so, so the trend slows down and eventually stops growing.
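The comparison in part b can be checked with a short Python sketch. The function name `predict_reports` is hypothetical; the line y = 381x − 566 is the one from Exercise 17, and the observed counts are the ones in the table. The last printed column is the residual (observed minus predicted).

```python
def predict_reports(minutes):
    """Predicted number of reports from the Exercise 17 line y = 381x - 566."""
    return 381 * minutes - 566

observed = {9: 2750, 15: 3200}  # minutes -> observed number of reports

for minutes, actual in observed.items():
    predicted = predict_reports(minutes)
    print(minutes, predicted, actual, actual - predicted)
# 9 2863 2750 -113
# 15 5149 3200 -1949
```

A large negative residual means the line over-predicts, which is what happens when a straight line is used to extrapolate a trend that is leveling off.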
Exercises 30
Consider a scatter plot of the attendance numbers at two towns' local beaches over one week. Both beaches had relatively low attendance Monday through Friday, most likely because those are working days. On Saturday, the weather was beautiful and both beaches were very busy. On Sunday, the weather was great but only Beach 1 was open.

Correlation? The correlation is positive and, apart from the Sunday point (19, 0), very strong. A plausible correlation coefficient would lie somewhere in the range 0.7 < r < 0.9. It might be wise to remove the outlier when calculating a model for this relationship.

Causal? Whether or not this is a causal relationship depends on quite a few factors, the main ones being:

How close are the towns? If they are very close, residents of one town may occasionally travel to the other town's beach. If they are far apart, this may not be possible.
Are the beaches private for citizens of that particular town? If so, residents of the other town would not be allowed to use them.

Most likely, the biggest factors in the attendance at both beaches are the weather and the day of the week. The attendance at Beach 1 does not directly influence the attendance at Beach 2; outside factors do. Therefore, it is not a causal relationship.
Exercises 31
a. To write an equation of the regression line, we first have to enter our ordered pairs into lists in our calculator. We can do this by pressing the STAT button and then choosing Edit. Once the values have been entered, we can perform a regression by pressing STAT again, followed by the right arrow key to select CALC. This menu lists the various regressions that are available. If we choose LinReg, the calculator performs a linear regression.

The regression line can be written as y = 513.5x − 298. We can also see that the correlation coefficient is 0.993, which is very strong. This means there is a strong correlation between the number of years passed and the number of text messages.

b. A causal relationship would imply that the passage of time itself causes the number of text messages to increase. This can't be the case. The increase is caused by other factors, such as more people owning cell phones year after year. If people stopped using phones, texting would stop, no matter how much time passed.

c. The residuals show how much the observations deviate from the values estimated by our line of best fit. By subtracting the model value from the observed value, we get the residual.

Year | Observation | Observation − (513.5x − 298) | Residual
1 | 241 | 241 − (513.5 · 1 − 298) | 25.5
2 | 601 | 601 − (513.5 · 2 − 298) | -128
3 | 1360 | 1360 − (513.5 · 3 − 298) | 117.5
4 | 1806 | 1806 − (513.5 · 4 − 298) | 50
5 | 2206 | 2206 − (513.5 · 5 − 298) | -63.5

Let's also plot the residuals using our graphing calculator. Go to the STAT PLOT menu, choose the scatter plot type, and use L1 for Xlist and RESID for Ylist. To enter RESID, press 2nd followed by STAT and then choose RESID from the list of names.

The residuals appear to be evenly distributed around the x-axis, which suggests that the regression line fits the data well. Note that we changed the window settings to -1 ≤ x ≤ 6 and -500 ≤ y ≤ 500 to fit the data.

d. Deciding whether a model fits its data by looking at residuals is very much open to interpretation. What do evenly distributed residuals look like? Would you know them when you saw them? The good thing about using a correlation coefficient is that our judgment is reduced to a single number: the closer |r| is to 1, the better the linear model; the closer it is to 0, the worse the model.
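The calculator's LinReg output can be reproduced with the least-squares formulas. This Python sketch is not the calculator's actual algorithm, just the textbook formulas; the function name `linear_regression` is hypothetical. It uses the observations from part c, taking the year 4 value as 1806, which is the value consistent with both the line y = 513.5x − 298 and the residual of 50.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Least-squares line y = a*x + b and Pearson r (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # the line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

years = [1, 2, 3, 4, 5]
texts = [241, 601, 1360, 1806, 2206]
a, b, r = linear_regression(years, texts)
print(round(a, 1), round(b, 1), round(r, 3))  # 513.5 -297.7 0.993

# Residuals using the rounded equation y = 513.5x - 298, as in part c
res = [t - (513.5 * x - 298) for x, t in zip(years, texts)]
print(res)  # [25.5, -128.0, 117.5, 50.0, -63.5]
```

Rounding the intercept -297.7 to the nearest whole number gives the -298 used in part a.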
Exercises 32 The graph of a linear function can be portrayed as a single, straight line in a coordinate plane. To begin determining if the data given in the table represents a linear function, let's first plot the data as (x,y) coordinate pairs.If the function is linear, connecting these points will form a straight line. Otherwise, we will have shown that the function is nonlinear.Now that we have connected all of our points, we can see that they do not all lie on the same line in the coordinate plane. Therefore, the function is nonlinear.
Exercises 33 The graph of a linear function can be portrayed as a single, straight line in a coordinate plane. To begin determining if the data given in the table represents a linear function, let's first plot the data as (x,y) coordinate pairs.If the function is linear, connecting these points will form a straight line. Otherwise, we will have shown that the function is nonlinear.Since we can use a straight edge to connect all of the given points, they lie on the same line in the coordinate plane. Therefore, the function is linear.
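Instead of connecting the points with a straight edge, you can check that the slope between consecutive points never changes: a table represents a linear function exactly when that rate of change is constant. This Python sketch assumes distinct x-values and uses exact fractions for the slopes; the function name `is_linear` and both sample tables are hypothetical, not the data from Exercises 32 and 33.

```python
from fractions import Fraction

def is_linear(points):
    """Return True if all consecutive points (sorted by x) share one slope.

    Assumes the x-values are distinct, as in a table of a function.
    """
    pts = sorted(points)
    slopes = {
        Fraction(y2 - y1) / Fraction(x2 - x1)
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    }
    return len(slopes) <= 1

# Hypothetical tables, for illustration only
print(is_linear([(0, 1), (1, 3), (2, 5), (3, 7)]))   # True: slope is always 2
print(is_linear([(0, 1), (1, 2), (2, 5), (3, 10)]))  # False: slopes 1, 3, 5
```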