Mastering How to Draw a Line of Best Fit & Analyzing Strength of Correlation

Scatter Plot	Correlation Coefficient	Strength
$A$	$0.08$	Very weak positive correlation
$B$	$- 0.65$	Strong negative correlation
$C$	$0.84$	Very strong positive correlation
$D$	$- 0.92$	Very strong negative correlation

Height (in)	$72.83$	$74.02$	$75.20$	$75.20$	$75.20$	$75.98$	$75.98$	$75.98$	$75.98$	$77.17$	$77.17$	$77.17$	$79.13$	$79.92$	$81.89$	$81.98$	$83.07$
Weight (lbs)	$178.57$	$189.60$	$182.98$	$198.42$	$198.42$	$194.01$	$205.03$	$233.69$	$218.26$	$178.57$	$213.85$	$196.21$	$235.90$	$213.85$	$251.33$	$264.56$	$264.56$

	January	February	March	April	May	June	July	August	September	October	November	December
Outdoor Temperature ($^{\circ}$F)	$70$	$74$	$80$	$82$	$86$	$88$	$92$	$90$	$84$	$78$	$76$	$72$
Bill (\$)	$185$	$220$	$260$	$263$	$275$	$280$	$310$	$290$	$272$	$240$	$230$	$194$

Magdalena will create a model that estimates how much newborn girls weigh from birth until the first half-year.

To make the model, she collected some statistics from a stressed nurse at a hospital who has just weighed seven babies of different ages.

	Newborn Girls
Months	$0$	$1$	$2$	$3$	$4$	$5$	$6$
Weight (lbs)	$7.01$	$7.89$	$17.85$	$10.58$	$11.73$	$13.51$	$15.06$

Write the equation of the line of best fit for the data Magdalena collected and round each coefficient to two decimal places.

How strong is the correlation between the data?

Magdalena is not really happy with the line obtained because she thinks one of the weights is wrong. She therefore decides to remove it from the data. Write the equation of the line of best fit for the second data set that Magdalena considered. Round each coefficient to two decimal places.

The line of best fit can be found using a graphing calculator. First, let's enter the data values into our calculator. This is done by pressing the STAT button and selecting the option Edit.

Next, we press the STAT button, and then select the menu item CALC. There, we can find the LinReg(ax+b) option. This option gives us a line of best fit, expressed as a linear function in slope-intercept form, and the correlation coefficient.

By rounding each coefficient to two decimal places, the equation for the line of best fit is expressed as follows.

Line of best fit: $y = 1.05x + 8.81$
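For readers without a graphing calculator, the same line can be reproduced with the standard least-squares formulas. The following is a minimal Python sketch and not the calculator's actual routine. The function name `least_squares_line` is an illustrative assumption, and the data are the seven points from Magdalena's table.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [0, 1, 2, 3, 4, 5, 6]
weights = [7.01, 7.89, 17.85, 10.58, 11.73, 13.51, 15.06]

slope, intercept = least_squares_line(months, weights)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Rounded to two decimal places, the slope and intercept computed this way agree with the calculator result above.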

The strength of the correlation is measured by the correlation coefficient. Let's recall the classifications.

Correlation Strength
Value	Strength
$|r| = 0$	No correlation
$0 < |r| < 0.2$	Very Weak
$0.2 \leq |r| < 0.4$	Weak
$0.4 \leq |r| < 0.6$	Moderate
$0.6 \leq |r| < 0.8$	Strong
$0.8 \leq |r| < 1$	Very Strong
$|r| = 1$	Perfect
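The classification table can be written as a small lookup function. This is only a sketch of the table above: the function name `classify_strength` is an illustrative assumption, and the cut-offs are the ones used in this lesson (other textbooks may draw the boundaries differently).

```python
def classify_strength(r):
    """Map a correlation coefficient r to the strength label in the table."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # strength depends only on the absolute value of r
    if size == 0:
        return "No correlation"
    if size < 0.2:
        return "Very Weak"
    if size < 0.4:
        return "Weak"
    if size < 0.6:
        return "Moderate"
    if size < 0.8:
        return "Strong"
    if size < 1:
        return "Very Strong"
    return "Perfect"


for r in (0.08, -0.65, 0.84, -0.92):
    print(r, classify_strength(r))
```

The sign of $r$ is handled separately: it gives the direction of the correlation (positive or negative), while the label describes only its strength.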

From Part A, we know that the correlation coefficient of the data collected by Magdalena is $r \approx 0.58$. Since $0.4 \leq |r| < 0.6$, the strength of the correlation is moderate.

It is likely that Magdalena is not so happy with the line of best fit because the strength of the correlation is moderate. Maybe she expected a stronger correlation. Let's investigate a bit more and plot the line y=1.05x+8.81 along with the points in the coordinate plane.

As we can see, the point (2,17.85) is quite far from the line. In addition, this point represents a two-month-old baby weighing 17.85 pounds, which is pretty unlikely. This could also be the reason why Magdalena thinks there is something wrong with the data. Given that, let's remove this point and calculate the line of best fit again.
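One way to make "quite far from the line" concrete is to compute each point's residual, the observed weight minus the weight predicted by the line. The sketch below uses the rounded line $y = 1.05x + 8.81$ from the text. The helper name `residuals` is an illustrative assumption, and a large residual only flags a point for inspection; it does not prove the measurement is wrong.

```python
def residuals(points, slope, intercept):
    """Return (x, y, residual) triples, where residual = observed - predicted."""
    return [(x, y, y - (slope * x + intercept)) for x, y in points]


points = [(0, 7.01), (1, 7.89), (2, 17.85), (3, 10.58),
          (4, 11.73), (5, 13.51), (6, 15.06)]

# Rounded coefficients of the fitted line y = 1.05x + 8.81 from the text.
rows = residuals(points, slope=1.05, intercept=8.81)
for x, y, res in rows:
    print(f"({x}, {y}): residual {res:+.2f}")

# The point with the largest absolute residual lies farthest (vertically) from the line.
farthest = max(rows, key=lambda row: abs(row[2]))
print("farthest from the line:", farthest[:2])
```

Here the point $(2, 17.85)$ has by far the largest residual, which matches the visual impression described above.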

By removing the point $(2, 17.85)$, the strength of the correlation increased, giving a line that better fits the rest of the data.

$y = 1.35x + 6.70$
$r \approx 0.9963$ → Very Strong

The following table shows the mileage $x$, in thousands of kilometers, and the selling price $y$, in thousands of dollars, for several used motorbikes of the same year and model.

	Motorbikes
Mileage (in thousands)	$29$	$19$	$15$	$36$	$10$	$25$
Price (in thousands)	$8$	$11$	$13$	$5$	$18$	$9$

Write the equation of the line of best fit and round each coefficient to two decimal places.

Find the correlation coefficient and round it to two decimal places.

Classify the strength of the correlation.

The line of best fit can be found using a graphing calculator. To get started, let's enter the data values into our calculator. This is done by pressing the STAT button and selecting the option Edit.

By rounding each coefficient to two decimal places, the equation for the line of best fit is expressed as follows.

Line of best fit: $y = -0.46x + 20.89$

When we found the line of best fit in Part A, we also found the correlation coefficient. It is the $r$-value shown on the screen of the calculator.

By rounding the correlation coefficient to two decimal places, it is expressed as follows.

$r \approx -0.97$
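The $r$-value reported by the calculator can be cross-checked with the usual Pearson formula, $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The following is a hedged Python sketch using the motorbike data from the table. The function name `pearson_r` is an illustrative assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of the paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


mileage = [29, 19, 15, 36, 10, 25]  # thousands of kilometers
price = [8, 11, 13, 5, 18, 9]       # thousands of dollars

r = pearson_r(mileage, price)
print(f"r = {r:.2f}")
```

The formula is undefined when all $x$-values or all $y$-values are equal, because one of the sums of squares is then zero.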

Since $r$ is negative, the correlation is negative. Additionally, $|r| \approx 0.97$ lies between $0.8$ and $1$, so the correlation is very strong. In short, the correlation is very strong and negative. This can also be seen graphically.

Diego, an appliance store owner, recently expanded his business to online shopping with free home delivery. After ten days, he checked the statistics of the articles shipped. However, he noticed that some of the data were missing.

	Deliveries
Day, $x$	$1$	$2$	$5$	$6$	$8$	$10$
Articles Shipped, $y$	$6$	$9$	$11$	$18$	$15$	$16$

Write the equation of the line of best fit. Round each coefficient to two decimal places.

Find the correlation coefficient and round it to two decimal places.

Estimate the number of articles that Diego shipped on the third day. Round the number to the nearest integer.

The line of best fit can be found using a graphing calculator. First, let's enter the data values into our calculator. This is done by pressing the STAT button and selecting the option Edit.

By rounding each coefficient to two decimal places, the equation for the line of best fit is expressed as follows.

Line of best fit: $y = 1.13x + 6.48$

Let's plot the line along with the given data on a coordinate plane.

When we found the line of best fit in Part A, we also found the correlation coefficient. It is the $r$-value shown on the screen of the calculator.

By rounding the correlation coefficient to two decimal places, it is expressed as follows.

$r \approx 0.85$

To estimate the number of articles shipped on the third day, we substitute $3$ for $x$ in the equation of the line of best fit.

$y = 1.13(3) + 6.48 = 9.87 \approx 10$

This means that Diego shipped approximately 10 articles on the third day.
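The substitution can also be checked numerically. This is a small Python sketch using the rounded line $y = 1.13x + 6.48$ from Part A. The function name `predict` and its default arguments are illustrative assumptions.

```python
def predict(x, slope=1.13, intercept=6.48):
    """Evaluate the rounded line of best fit y = 1.13x + 6.48 at x."""
    return slope * x + intercept


estimate = predict(3)
print(f"day 3: {estimate:.2f} -> about {round(estimate)} articles")
```

Day 3 lies between the first and last recorded days, so this estimate is an interpolation within the range of the data.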

Paulina began managing The Beefy Lifters gym last week. To seek improvements for her clients, Paulina created a data sheet reflecting the influx of people who visit the gym each day.

	The Beefy Guys
Day, $x$	$1$	$2$	$3$	$4$	$5$	$6$	$7$	$8$
Number of People, $y$	$149$	$148$	$149$	$150$	$149$	$149$	$150$	$149$

Write the equation of the line of best fit that models Paulina's data. Round each coefficient to two decimal places.

Find the correlation coefficient and round it to two decimal places.

Predict the day on which 151 people will visit the gym. Round the answer to the nearest integer.

The line of best fit can be found using a graphing calculator. First, let's enter the data values into our calculator. This is done by pressing the STAT button and selecting the option Edit.

By rounding each coefficient to two decimal places, the equation for the line of best fit is expressed as follows.

Line of best fit: $y = 0.11x + 148.64$

Let's plot the line along with the given data on the coordinate plane.

In Part A, we already found the correlation coefficient. It is the $r$-value shown on the screen of the calculator.

By rounding the correlation coefficient to two decimal places, it is expressed as follows.

$r \approx 0.41$

Since $0.4 \leq |r| < 0.6$, the strength of the correlation is moderate, and it sits near the lower end of that range.

To predict the day on which 151 people will visit the gym, we substitute $151$ for $y$ in the equation of the line of best fit and solve for $x$.

$151 = 0.11x + 148.64 \quad\Rightarrow\quad x = \dfrac{151 - 148.64}{0.11} \approx 21.45 \approx 21$

Therefore, 151 people are expected to go to the gym on the 21st day after Paulina started managing it. However, as we found in Part B, the correlation between the data is not as strong as we would like. This means that the prediction might not be reliable.
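Solving the line for $x$ can be sketched in Python as well. The example below uses the rounded line $y = 0.11x + 148.64$. The function name `day_for_visitors` and its default arguments are illustrative assumptions.

```python
def day_for_visitors(y, slope=0.11, intercept=148.64):
    """Solve y = slope * x + intercept for x (requires a nonzero slope)."""
    if slope == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (y - intercept) / slope


day = day_for_visitors(151)
print(f"x = {day:.2f} -> day {round(day)}")
```

Because the data only cover days 1 through 8, a result near day 21 is an extrapolation well beyond the observed range, which is one more reason to treat it with caution.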

Consider the correlation coefficients $0.063$, $0.959$, $-0.966$, and $0.793$. Pair each scatter plot with the most appropriate correlation coefficient.

Four Scatter Plots: One with very strong negative correlation, one with no correlation, one with strong positive correlation, and one with very strong positive correlation

To pair each scatter plot with its corresponding correlation coefficient, let's analyze each scatter plot one at a time.

Scatter Plot A

In this scatter plot, the points seem to follow a certain direction. As x increases, y tends to decrease which implies a negative correlation.

In addition to the direction, we can see that the line of best fit has a negative slope, which confirms that the correlation coefficient is negative. Considering the given options, only r=-0.966 meets this condition.

A → r=-0.966

Scatter Plot B

We can see that the points in this scatter plot do not seem to follow a particular direction — they are randomly distributed. Apart from the line of best fit, different lines can also be drawn, but none of them seem to describe the relationship between the points.

Therefore, there is no correlation between the points of Scatter Plot B. This implies that the correlation coefficient is close to 0. Considering the given options, only r=0.063 meets this condition.

B → r=0.063

Scatter Plot C

The points in this plot follow an ascending trajectory: as $x$ increases, $y$ tends to increase as well. This relationship implies a positive correlation.

Additionally, the line of best fit has a positive slope, confirming that the correlation coefficient is positive. Likewise, since all the points are close to the line, the correlation coefficient must be close to 1. Considering the given options, there are two possible values for r: 0.793 or 0.959. At this time, however, we cannot conclude which corresponds to Scatter Plot C.

C → $r = 0.793$ or $r = 0.959$

Scatter Plot D

In this plot, as before, the points follow an ascending direction: as $x$ increases, $y$ tends to increase. This relationship implies a positive correlation.

Consequently, the correlation coefficient is positive. Once more, we have two possible options.

D → $r = 0.793$ or $r = 0.959$

When we compare Scatter Plots C and D, the points on D are more clustered and closer to the line of best fit than those on C. This implies that the correlation in D is stronger than the correlation in C. Therefore, r=0.959 is the most appropriate correlation coefficient for Scatter Plot D.

D → $r = 0.959$
C → $r = 0.793$

In the following table, each scatter plot is paired with its correlation coefficient, and the strength of the correlation is described.

Scatter Plot	Correlation Coefficient	Strength
A	-0.966	Very strong negative correlation
B	0.063	Very weak positive correlation
C	0.793	Strong positive correlation
D	0.959	Very strong positive correlation
