Drawing Scatter Plots and Lines of Best Fit

Sometimes two different data sets can be collected from the same source. Graphing these data sets in a scatter plot and fitting a mathematical model to the data can be a helpful analysis tool.
Concept

Scatter Plot

A scatter plot is a graph that relates numerical data with two parameters in a coordinate plane. The data points are plotted as ordered pairs. For instance, the number of ice cream cones sold daily at a kiosk can be plotted against the temperature that day.

Each point shows the temperature and ice cream cone sales for a particular day. Since ice cream sales more plausibly depend on the temperature than the other way around, temperature is treated as the independent variable and placed on the $x$-axis, while sales are the dependent variable on the $y$-axis. Choosing which variable goes on which axis is an important step when constructing a scatter plot.
Concept

Correlation and Causation

Concept

Correlation

When there is a statistical connection between two parameters of data, such that a change in one is associated with a change in the other, they are said to be correlated. For instance, up until approximately age $18$, there is a correlation between age and height: older people are generally taller, and taller people are generally older.

Concept

Causation

Causation is a relationship between two correlated quantities where one directly affects the other.

  • Causation exists: An example of a correlation where there is also a causation is height and age. Aging directly causes growth, up until some point.
Causation between age and height
  • No causation: In winter, both the number of house fires and the number of car accidents increase, so the two are correlated. However, house fires do not cause car accidents. A common factor can explain both increases: winter brings slippery roads, which lead to more accidents, and more lit candles, which lead to more fires. Here, there is a correlation but no causation.
The difference between correlation and causation
Concept

Types of Correlation

If two quantities correlate in such a way that an increase in one quantity is associated with an increase in the other, they are said to be positively correlated. Likewise, if an increase in one quantity is associated with a decrease in the other, they are said to be negatively correlated.

Three diagrams showing positive, negative, and no correlation

The more the data points appear to follow a specific trend, the more correlated they are. If they are situated almost exactly on a line, the quantities are said to be strongly correlated, while if they are more spread out, the quantities are weakly correlated.

Three diagrams showing strong, weak, and no correlation

How strongly two quantities correlate can be described using the correlation coefficient, $r$. It can take on values between $-1$ and $1$. Values near $-1$ mean that the correlation is strong and negative, while values near $1$ mean that it is strong and positive. If there is no correlation, $r$ is close to $0$.
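As a rough illustration of how $r$ can be computed, the sketch below implements the standard sample correlation formula in plain Python. The function name `pearson_r` and the temperature and sales figures are made up for this example; they are not data from the lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: daily temperature and ice cream cones sold.
temps = [15, 18, 21, 24, 27, 30]
cones = [20, 26, 35, 41, 50, 58]
r = pearson_r(temps, cones)
print(round(r, 3))  # close to 1, so a strong positive correlation
```

Swapping in data where one quantity falls as the other rises would give a value near $-1$ instead.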

Concept

Line of Fit

Data that has been drawn in a scatter plot and shows a moderate or strong correlation can be modeled using a line of fit. This is a line drawn on the scatter plot that is as close to as many data points as possible. For instance, a line of fit can be drawn for the weight of kittens plotted against their age.

A line of fit can be used to make predictions and generalize the trends of data sets. When a line of fit is determined using strict mathematical methods, it is commonly referred to as a line of best fit.
Exercise


The music-and-animal amusement park "Hiphop-opotamus Park" has $20$ hippos that they regularly measure. The length and height of each hippo at the time of the last measurement have been plotted below.

Draw a line of fit to the data. Then, comment on the apparent association in the data. Lastly, estimate the equation of the line and interpret its slope and $y$-intercept in context.

Solution
Example

Drawing a Line of Fit

A line of fit should be drawn so that as many points as possible are close to the line, with roughly half above and half below.

Looking at the graph, we can see that an increase in height is associated with an increase in length, so the line of fit has a positive slope. In other words, there is a positive correlation between a hippo's height and length. Note that height does not directly cause length, nor does length directly cause height. Thus, we can argue that there is no causation between these quantities, only a correlation.

Example

Finding the Equation of the Line

Now that a line of fit has been drawn, we can approximate its equation in point-slope form. To find the slope of the line, we'll use two points that are on the line — not necessarily points from the data set.

It can be seen that $(1,5)$ and $(4.5,10)$ lie on the line. We'll substitute these points into the slope formula.
$$m=\dfrac{y_2-y_1}{x_2-x_1}$$
$$m=\dfrac{{\color{#009600}{10}}-{\color{#0000FF}{5}}}{{\color{#009600}{4.5}}-{\color{#0000FF}{1}}}$$
$$m=\dfrac{5}{3.5}$$
$$m\approx 1.43$$
Next, we can use $m$ and either point to write the equation of the line. We'll use $(1,5)$.
$$y-y_1=m(x-x_1) \quad \Leftrightarrow \quad y-5=1.43(x-1)$$
To write the equation in slope-intercept form, we can isolate $y$.
$$y-5=1.43(x-1)$$
$$y-5=1.43x-1.43$$
$$y=1.43x+3.57$$
Thus, the equation for the line of fit is $y=1.43x+3.57$.
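The same calculation can be checked with a few lines of code. This is only a sketch: the helper name `line_through` is invented here, and it assumes the two points are not vertically aligned.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    m = (y2 - y1) / (x2 - x1)  # slope formula
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


# The two points read off the line of fit in the example.
m, b = line_through((1, 5), (4.5, 10))
print(f"y = {m:.2f}x + {b:.2f}")  # prints "y = 1.43x + 3.57"
```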
Example

Interpreting the Slope and the $y$-intercept

The calculations above give the following approximate values for the line of fit.
$$m=\frac{1.43}{1} \quad \text{and} \quad b=3.57$$
Since $x$ represents the height of a hippo and $y$ represents the length, the slope tells us that for every additional $1$ foot in height, a hippo's length increases by about $1.43$ feet.
Similarly, the $y$-intercept means that a hippo whose height measures $0$ feet would be $3.57$ feet long. This statement doesn't make sense, because the point $(0,3.57)$ lies outside the range of the given data. The relationship established for the data set only applies to values that fall within that range. In other words, interpolation is reasonably reliable while extrapolation is not.
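One way to make the interpolation caveat concrete is to have a prediction helper report whether the input lies inside the measured range. In the sketch below, the function name `predict_length` and the bounds `h_min` and `h_max` are assumptions made for illustration; the exercise does not state the actual range of hippo heights.

```python
def predict_length(height, m=1.43, b=3.57, h_min=1.0, h_max=4.5):
    """Estimate a hippo's length (feet) from its height (feet).

    h_min and h_max are ASSUMED bounds of the measured heights,
    not values given in the exercise.
    Returns (estimate, is_interpolation).
    """
    estimate = m * height + b
    is_interpolation = h_min <= height <= h_max
    return estimate, is_interpolation


for h in (3.0, 0.0):
    est, inside = predict_length(h)
    kind = "interpolation" if inside else "extrapolation, treat with caution"
    print(f"height {h} ft -> about {est:.2f} ft ({kind})")
```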

Method

Analyzing Residuals

When a line of fit has been drawn on a scatter plot, it is possible to determine how well the line models the data. This can be done by analyzing the residuals. A residual is the signed vertical difference between a data point and the line of fit: the observed $y$-value minus the $y$-value the line predicts for the same $x$.

Generally, the smaller the absolute values of the residuals, the more reliable the line of fit is. If the residuals are graphed in a scatter plot, a random distribution, with points scattered above and below zero and no visible pattern, indicates a reliable line of fit. Likewise, if some kind of pattern appears in the scatter plot, the line is probably not a good fit for the data.
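Computing residuals is a single subtraction per data point. The sketch below uses the line $y=1.43x+3.57$ from the hippo example, but the measurements themselves are invented for illustration because the original data values are not listed.

```python
def residuals(points, m, b):
    """Return observed y minus predicted y for each (x, y) point."""
    return [y - (m * x + b) for x, y in points]


# Hypothetical (height, length) measurements in feet; not from the exercise.
hippos = [(1.0, 5.2), (2.0, 6.1), (3.0, 8.0), (4.0, 9.1), (4.5, 10.2)]
res = residuals(hippos, m=1.43, b=3.57)

for (x, _), e in zip(hippos, res):
    print(f"height {x}: residual {e:+.2f}")
```

Here the residuals are small and alternate in sign, which is the kind of patternless scatter that suggests a reasonable fit.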

Most graphing calculators have a function called linear regression, which finds a precise line of fit by minimizing the sum of the squared residuals. This line of fit is then called the line of best fit or the regression line.

Digital tools

Enter values

The first step is to enter the data points in the calculator. On a TI calculator, this is done by first pressing the STAT button, and then selecting the option Edit....

Calculator screen showing the STAT menu with Edit... selected

This gives a number of columns, marked $L_1$, $L_2$, $L_3$, etc.

Calculator showing two empty lists

Use the arrow keys to choose where in the lists to fill in the data values. Enter the data points' $x$-values in list $L_1$, and the corresponding $y$-values in $L_2$. Values are entered into the fields using the number buttons followed by ENTER.

Calculator that shows two lists where you entered values
Digital tools

Fitting the line

After entering the values, press the STAT button, then select the menu item CALC to the right.

Calculator screen showing the CALC menu with LinReg selected

The option LinReg(ax+b) gives a line of best fit, expressed as a linear function in slope-intercept form.

Calculator showing a linear function regression
In this case, the line of best fit is described by the function $y=-5.92x+28.8$. The correlation coefficient, $r$, is roughly $-0.99$, indicating a strong negative correlation. If $r$ doesn't show up, press CATALOG (2ND + 0), scroll down to the option DiagnosticOn and enable it by pressing ENTER twice. Then, find the line of best fit again.
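For readers without a graphing calculator, the same kind of result can be approximated in code. The sketch below fits $y=ax+b$ by ordinary least squares. The function name `linear_regression` and the data are hypothetical; the values entered in the calculator screenshots are not reproduced here, so the output will not match $y=-5.92x+28.8$.

```python
def linear_regression(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # the line passes through (mean_x, mean_y)
    return a, b


# Hypothetical lists, playing the role of L1 and L2 on the calculator.
l1 = [1, 2, 3, 4]
l2 = [23.1, 16.8, 11.2, 4.9]
a, b = linear_regression(l1, l2)
print(f"y = {a:.2f}x + {b:.2f}")
```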