A scatter plot is a graph that displays paired numerical data as points in a coordinate plane. Each data point is plotted as an ordered pair (x, y), where x and y are the values of the two quantities being compared. For instance, the number of ice cream cones sold daily at a kiosk can be plotted against the temperature that day.
When there is a statistical connection between two quantities, such that a change in one is associated with a change in the other, they are said to be correlated. For instance, up until approximately age 18, there is a correlation between age and height: older people are generally taller, and taller people are generally older.
Causation is a relationship between two correlated quantities where one directly affects the other.
If two quantities correlate in such a way that an increase in one quantity is associated with an increase in the other, they are said to be positively correlated. Likewise, if an increase in one quantity is associated with a decrease in the other, they are said to be negatively correlated.
The more the data points appear to follow a specific trend, the more correlated they are. If they are situated almost exactly on a line, the quantities are said to be strongly correlated, while if they are more spread out, the quantities are weakly correlated.
How strongly two quantities correlate can be described using the correlation coefficient, r. It can take on values from -1 to 1. Values near -1 mean that the correlation is strong and negative, while a strong, positive correlation gives a value close to 1. If there is no linear correlation, r is close to 0.
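To make the meaning of r concrete, here is a minimal Python sketch that computes the correlation coefficient from paired data using the standard Pearson formula. The function name `correlation_coefficient` and the temperature and sales numbers are made up for illustration. They are not taken from the text.

```python
import math

def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one quantity is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature (x) and ice cream cones sold (y).
temperatures = [18, 21, 24, 27, 30]
cones_sold = [40, 52, 61, 75, 88]

r = correlation_coefficient(temperatures, cones_sold)
print(round(r, 3))  # close to 1, since the made-up points lie almost on a line
```

A value of r close to 1 here reflects a strong positive correlation in the invented data. Reversing the order of one list would give a value close to -1 instead.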
When data sets have a positive or negative correlation, the trend of the data can be modeled using a line of fit. A line of fit, or trend line, is a line drawn on a scatter plot so that it lies close to the points. The points should be roughly evenly distributed above and below the line. For example, a line of fit can be drawn through a scatter plot that shows the mean weight of kittens according to their age.
Some things need to be considered when drawing a line of fit.
The music-and-animal amusement park "Hiphop-opotamus Park" has 20 hippos that it regularly measures. The length and height of each hippo at the time of the last measurement have been plotted below.
Draw a line of fit to the data. Then, comment on the apparent association in the data. Lastly, estimate the equation of the line and interpret its slope and y-intercept in context.
Looking at the graph, we can see that an increase in height is associated with an increase in length. Thus, the line of fit has a positive slope. Therefore, there is a positive correlation between a hippo's height and length. Note that height does not directly cause length, nor does length directly cause height. Thus, we can argue that there is no causation between these quantities, only a correlation.
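Once a line of fit has been drawn, its equation can be estimated by reading off two points that the line passes through. The sketch below shows that calculation in Python. The two points are hypothetical readings, not values from the hippo data, and they assume that height is on the x-axis and length on the y-axis, both in meters.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Hypothetical points read off a drawn line of fit: (height, length) in meters.
point_a = (1.0, 2.6)
point_b = (1.5, 3.8)

slope, intercept = line_through(point_a, point_b)
print(f"length = {slope:.1f} * height + {intercept:.1f}")
```

With these assumed points, the slope estimates how much longer a hippo is for each additional meter of height. The y-intercept would be the predicted length of a hippo with a height of 0, which has no practical meaning in this context.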
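When a line of fit has been drawn on a scatter plot, it is possible to determine how well the line models the data. This can be done by analyzing the residuals. A residual is the vertical difference between a data point and the line of fit, that is, the observed y-value minus the y-value that the line predicts for the same x-value. Points above the line have positive residuals, and points below the line have negative residuals.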
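As a small illustration, residuals can be computed as the observed y-value minus the y-value predicted by the line. The following Python sketch uses invented data and an assumed line of fit y = 2x + 1. Neither comes from the text.

```python
def residuals(xs, ys, slope, intercept):
    """Return observed y minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data points and an assumed line of fit y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

res = residuals(xs, ys, slope=2, intercept=1)
print(res)  # [0, 0, -1, 0]
```

Residuals that are small and scattered on both sides of zero suggest that the line models the data well. Here only the third point lies off the line, one unit below it.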
Most graphing calculators have a function called linear regression, which finds a precise line of fit by minimizing the sum of the squared residuals. This line of fit is then called the line of best fit or the regression line.
The first step is to enter the data points in the calculator. On a TI calculator, this is done by pressing the STAT button and then selecting the option Edit.
This gives a number of columns, marked L1, L2, L3, and so on.
Use the arrow keys to choose where in the lists to fill in the data values. Enter the data points' x-values in list L1 and the corresponding y-values in list L2. Each value is typed with the number buttons and confirmed by pressing ENTER.
After entering the values, press the STAT button again, then select the menu item CALC to the right.
LinReg(ax+b) gives a line of best fit, expressed as a linear function in slope-intercept form.
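To also display the correlation coefficient r, open the calculator's catalog (2ND, then 0), find the command DiagnosticOn, and enable it by pressing ENTER twice. Then, find the line of best fit again.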
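For readers without a graphing calculator, the same kind of result can be reproduced in a few lines of Python. The sketch below applies the standard least-squares formulas and returns the slope a and y-intercept b of y = ax + b. The function name `linear_regression` and the two lists, which stand in for L1 and L2, are illustrative assumptions. This is not a description of how the calculator works internally.

```python
def linear_regression(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # y-intercept
    return a, b

# Hypothetical lists playing the role of L1 (x-values) and L2 (y-values).
list_1 = [1, 2, 3, 4]
list_2 = [3, 5, 7, 9]

a, b = linear_regression(list_1, list_2)
print(f"y = {a}x + {b}")  # y = 2.0x + 1.0
```

Because the made-up points lie exactly on a line, the regression line passes through all of them. With real measurements, the points would scatter around the line instead.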