Sometimes two different data sets can be collected from the same source. Graphing these data sets in a scatter plot and fitting a mathematical model to the data can be a helpful analysis tool.

A scatter plot is a graph that shows each observation of a bivariate data set as an ordered pair in a coordinate plane. Consider the following example in which a scatter plot shows the results obtained at a local ice cream parlor of a study that recorded the number of ice creams sold and the corresponding air temperature.

Among other insights, the graph shows that when the temperature was about $100^{\circ}\text{F},$ approximately 4000 ice creams were sold. Additionally, as the temperature increased, the number of sales increased. In this case, it can be said that there is a positive correlation between the two data sets: the number of ice creams sold and the air temperature.

When there is a statistical connection between two quantities, such that a change in one is associated with a change in the other, they are said to be correlated. For instance, up until approximately age 18, there is a correlation between age and height: older people are generally taller, and taller people are generally older.

Causation is a relationship between two correlated quantities where one **directly** affects the other.

**Causation exists**: An example of a correlation where there is also a causation is height and age. Aging directly causes growth, up until some point.

**No causation**: In winter, both the number of house fires and the number of car accidents increase, so they are correlated. However, house fires do not cause car accidents. A common factor can explain the increase in both: winter brings slippery road conditions, which lead to more accidents, and more lit candles, which lead to more fires. Here, there is a correlation, but no causation.

If two quantities correlate in such a way that an increase in one quantity is associated with an increase in the other, they are said to be *positively correlated*. Likewise, if an increase in one quantity is associated with a decrease in the other, they are said to be *negatively correlated*.

The more the data points appear to follow a specific trend, the more correlated they are. If they are situated almost exactly on a line, the quantities are said to be *strongly correlated*, while if they are more spread out, the quantities are *weakly correlated*.

How strongly two quantities correlate can be described using the correlation coefficient, r. It takes values between -1 and 1. A value near -1 means that the correlation is strong and negative, while a value near 1 means that it is strong and positive. If there is no linear correlation, r is close to 0.
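As a rough illustration of how r can be computed by hand, the sketch below applies the standard Pearson formula to paired data. The function name `correlation_coefficient` and the temperature and sales values are hypothetical, invented for this example rather than taken from the ice cream study.

```python
import math

def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient r of paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, invented for illustration only
temperatures = [60, 70, 80, 90, 100]      # degrees Fahrenheit
sales = [1500, 2100, 2600, 3400, 4000]    # ice creams sold

r = correlation_coefficient(temperatures, sales)
print(round(r, 3))  # close to 1, so a strong positive correlation
```

A value this close to 1 matches what a scatter plot of these made-up points would show: they lie almost exactly on a rising line.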

When data sets have a positive or negative correlation, the trend of the data can be modeled using a line of fit, also called a trend line. This line is drawn on a scatter plot near most of the data points, which appear evenly distributed above and below the line.

The scatter plot above shows the mean weights of kittens from the same litter in relation to their age. In this case, a line of fit could be drawn quite seamlessly. When drawing such a line of fit, the following characteristics should be considered.

- The data needs to have either a positive or negative correlation.
- A line of fit is not unique and does not split the data exactly. Ideally, though, about half of the points should be above the line and about half below it.
- An equation of the line can be found using two of its points. These points do not necessarily belong to the bivariate data set.

The music-and-animal amusement park "Hiphop-opotamus Park" has 20 hippos that are measured regularly. The length and height of each hippo at the time of the last measurement have been plotted below.

Draw a line of fit to the data. Then, comment on the apparent association in the data. Lastly, estimate the equation of the line and interpret its slope and y-intercept in context.


Looking at the graph, we can see that an increase in height is associated with an increase in length. Thus, the line of fit has a positive slope. Therefore, there is a positive correlation between a hippo's height and length. Note that height does not directly cause length, nor does length directly cause height. Thus, we can argue that there is no causation between these quantities, only a correlation.

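The slope of the line of fit can be found using two points on the line, such as $(1,5)$ and $(4.5,10).$

$m=\dfrac{y_{2}-y_{1}}{x_{2}-x_{1}}$

Substitute $(1,5)$ and $(4.5,10)$

$m=\dfrac{10-5}{4.5-1}$

Subtract terms

$m=\dfrac{5}{3.5}$

Calculate quotient

$m\approx 1.43$

Substituting $m\approx 1.43$ and the point $(1,5)$ into $y=mx+b$ gives $b\approx 3.57.$ Therefore, an equation of the line of fit is $y=1.43x+3.57,$ where $m=1.43$ and $b=3.57.$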

Since x represents the height of a hippo and y represents its length, the slope tells us that for every additional foot of height, a hippo is about 1.43 feet longer. Similarly, the y-intercept means that a hippo whose height measures 0 feet will be 3.57 feet long. Notice that this last statement doesn't make sense. This is because the point (0,3.57) lies outside the range of the measured data.
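The arithmetic in this example can be checked with a short sketch that finds the slope and y-intercept of the line through two points. The helper name `line_through` is hypothetical, and the two points are the ones used in the solution.

```python
def line_through(p1, p2):
    """Return slope m and y-intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)   # slope formula
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((1, 5), (4.5, 10))
print(f"y = {m:.2f}x + {b:.2f}")  # prints: y = 1.43x + 3.57
```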

When a line of fit has been drawn on a scatter plot, it is possible to determine how well the line models the data. This can be done by analyzing the residuals. A residual is the vertical difference between a data point and the line of fit, that is, the observed y-value minus the y-value predicted by the line.
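A minimal sketch of that calculation is shown here, using the line y=1.43x+3.57 from the hippo example. The measurements in `points` are hypothetical values invented for illustration, since the actual hippo data is only given as a graph, and the helper name `residuals` is likewise an assumption.

```python
def residuals(points, m, b):
    """Return observed y minus predicted y for each (x, y) point."""
    return [y - (m * x + b) for x, y in points]

# Hypothetical (height, length) pairs in feet, invented for illustration
points = [(1.0, 5.2), (2.0, 6.1), (3.0, 8.0), (4.0, 9.1)]

res = residuals(points, m=1.43, b=3.57)
for (x, y), e in zip(points, res):
    # A positive residual means the point lies above the line
    print(f"x = {x}: observed {y}, residual {e:+.2f}")
```

Points above the line produce positive residuals and points below it produce negative ones, so a line with roughly half the points on each side gives residuals of mixed sign.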

Generally, the smaller the absolute values of the residuals, the more reliable the line of fit is. If the residuals are graphed in a scatter plot, a random scatter with no visible pattern indicates a reliable line of fit. Likewise, if some kind of pattern appears in the residual plot, the line is probably not a good fit for the data.

Most graphing calculators have a function called linear regression, which finds a precise line of fit by following a fixed procedure rather than estimating by eye. This line of fit is then called the line of best fit. For example, consider the following data set.

x | y |
---|---|
1 | 33.12 |
2 | 24.4 |
3 | 16.6 |
4 | 9.3 |
5 | 3.9 |

The line of best fit can be calculated following these 3 steps.

1

Enter Values

On a graphing calculator, begin by entering the data points. To do so, press the $STAT $ button and select the option Edit.

This gives a number of columns, labeled L1, L2, L3, and so on.

Use the arrow keys to choose where in the lists to fill in the data values. Enter the x-values of the data points in L1 and press $ENTER $ after each value. The same can be done for the corresponding y-values in column L2.

2

Fitting the Line

After entering the values, press the $STAT $ button and select the menu item Calc.

The option LinReg(ax+b) gives the line of best fit, expressed as a linear function in slope-intercept form. Press the $ENTER $ button until the parameters are given.

In this case, the line of best fit is described by the function y=-7.354x+39.526. The correlation coefficient r is less than -0.99, indicating a strong negative correlation. If r does not appear, press $2ND $ and then 0 to open the CATALOG, select DiagnosticOn, and enable it by pressing $ENTER $ twice. Then, find the line of best fit once more.
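For readers without a graphing calculator, the same fit can be reproduced with a short script. This is a sketch under the assumption that LinReg(ax+b) uses the standard least-squares formulas. The function name `linear_regression` is hypothetical, and the data is the five-point table above.

```python
import math

def linear_regression(xs, ys):
    """Least-squares fit y = a*x + b; also returns the correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    a = s_xy / s_xx                    # slope
    b = mean_y - a * mean_x            # y-intercept
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return a, b, r

# Data from the table (L1 holds the x-values, L2 the y-values)
xs = [1, 2, 3, 4, 5]
ys = [33.12, 24.4, 16.6, 9.3, 3.9]

a, b, r = linear_regression(xs, ys)
print(f"y = {a:.3f}x + {b:.3f}, r = {r:.4f}")
```

Run on the table as printed, this gives a slope of about -7.354, a y-intercept of about 39.526, and r of about -0.9965.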

3

Graphing the Line of Best Fit

A graphing calculator can also be used to graph the line of best fit. After selecting the option LinReg(ax+b), choose the option Store RegEQ.

Press $VARS $ and move to the **Y-VARS** menu. Then, select the option FUNCTION and press $ENTER .$

Press the $ENTER $ button until the parameters are given. To graph the scatter plot, first push the buttons $2ND $ and $Y= .$ Then, choose one of the plots in the list. Select the option ON, choose the type to be a scatter plot, and assign L1 and L2 as XList and YList, respectively.

Then, the plot can be made by pressing the button $GRAPH .$ After drawing the plot, the viewing window may not be large enough to show all of the information.

To fix this, press $ZOOM $ and select the option ZoomStat.

After doing that, the window will resize to show the important information.
