Pearson Algebra 1 Common Core, 2011
3. Measures of Central Tendency and Dispersion

Exercise 4 Page 742

What measures of central tendency do we use to describe data sets that do or do not have outliers?

See solution.


Measures of central tendency are used to summarize a data set by finding a central data value. The given measures — mean, median, and mode — are all measures of central tendency. We will analyze how they describe the data one at a time.

Mean

Let's start by introducing two example data sets.

Data Set 1: 21, 15, 30, 24, 5, 7, 24
Data Set 2: 5, 7, 31, 21, 24, 7, 15, 73, 24

The second data set contains the value 73, which is noticeably greater than the other values. Therefore, we can call it an outlier. We will now calculate the mean of each data set. Recall that the mean is the sum of the data values divided by the number of data values.
Mean = (Sum of Data Values)/(Number of Data Values)
Let's start by calculating the mean μ_1 of the first data set. Note that it consists of 7 data values.
μ_1 = (21 + 15 + 30 + 24 + 5 + 7 + 24)/7
μ_1 = 126/7
μ_1 = 18
Similarly, we will calculate the mean μ_2 of the second data set. This time we have 9 data values.
Mean = (Sum of Data Values)/(Number of Data Values)
μ_2 = (5 + 7 + 31 + 21 + 24 + 7 + 15 + 73 + 24)/9
μ_2 = 207/9
μ_2 = 23
Note that μ_2 is significantly greater than μ_1. The mean — also called the average — describes the middle of a set of data by using all of the data values.
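The two calculations above can be checked with a short Python sketch. The variable names are our own, and the lists simply repeat the values that appear in the sums.

```python
# Data values as they appear in the sums above (list names are illustrative)
data_set_1 = [21, 15, 30, 24, 5, 7, 24]
data_set_2 = [5, 7, 31, 21, 24, 7, 15, 73, 24]

# Mean = (sum of data values) / (number of data values)
mean_1 = sum(data_set_1) / len(data_set_1)
mean_2 = sum(data_set_2) / len(data_set_2)

print(mean_1)  # 18.0
print(mean_2)  # 23.0
```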

Median

Now let's find the medians of both data sets. To do so, we need to arrange the values in order.

Data Set 1: 5, 7, 15, 21, 24, 24, 30
Data Set 2: 5, 7, 7, 15, 21, 24, 24, 31, 73

Since both data sets are ordered and contain an odd number of data values, it is straightforward to choose the median. Recall that the median is the middlemost value when the values are put in increasing order.

It appears that 21 is the median of both data sets. The median also describes the middle of a set of data. However, unlike the mean, it does not use all of the data values, but only the position of central values in an ordered data set.
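As a quick check, here is a minimal Python sketch of the same procedure: sort the values, then pick the middle one. The helper name `median` and the list names are our own choices. The branch for an even number of values is not needed for these two data sets and is included only so that the function is complete.

```python
def median(values):
    """Return the middle value of the data once it is put in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists
        return ordered[mid]
    # Even number of values: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

data_set_1 = [21, 15, 30, 24, 5, 7, 24]
data_set_2 = [5, 7, 31, 21, 24, 7, 15, 73, 24]

print(median(data_set_1))  # 21
print(median(data_set_2))  # 21
```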

Mode

Next, we will find the mode of each data set. Recall that the mode is the data value that is the most common in the data set. Let's take a look at the data sets and determine the data values that occur more than once.

The mode of the first data set is 24 because this is the only data value that occurs twice. On the other hand, there are two data values in the second data set that occur twice, 24 and 7. Therefore, the second data set has two modes.
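The counting can also be sketched in Python. The helper `modes` is a name we introduce for illustration. It returns every value that shares the highest count, so a data set with two modes yields both.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in increasing order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

data_set_1 = [21, 15, 30, 24, 5, 7, 24]
data_set_2 = [5, 7, 31, 21, 24, 7, 15, 73, 24]

print(modes(data_set_1))  # [24]
print(modes(data_set_2))  # [7, 24]
```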

Why Are All Measures Needed?

All three measures that we have calculated are measures of central tendency. We want to determine why we need all of them. Let's consider the values that we found.

Measure   Data Set 1   Data Set 2
Mean      18           23
Median    21           21
Mode      24           24 and 7

Since the mean uses all of the values in a data set, it is vulnerable to outliers. In our case, 73 influenced the value of the mean for the second data set. Please see the end of the solution for further explanation.

Mean With Outlier: 23
Mean Without Outlier: 16.75

Since outliers influence the mean significantly, we should use the mean for data sets that do not have outliers. On the other hand, the median is the central value in an ordered data set and is not affected by the smallest or greatest values.

Therefore, we should use the median for data sets that have outliers because they will not influence the result. Next, the mode is used to find the most popular item in a data set. This item may not be even close to the middle value.

Mean of Data Set 2: 23
Mode of Data Set 2: 7 and 24

Note that one of the modes, 7, is significantly smaller than the mean of the second data set. Finally, we can conclude why three different measures are needed.

  • The mean is used for data sets without outliers.
  • The median is used for data sets with outliers.
  • The mode is used when we want to choose the most popular data value instead of the middlemost value.

Extra

Mean Without Outlier
Let's calculate the mean of the second data set without the outlier, 73. In this case, there are 8 data values in total.
Mean = (Sum of Data Values)/(Number of Data Values)
Mean = (5 + 7 + 31 + 21 + 24 + 7 + 15 + 24)/8
Mean = 134/8
Mean = 16.75
We can see that the mean of the data set without the outlier is smaller than the mean when it was included.

Mean With Outlier: 23
Mean Without Outlier: 16.75

Therefore, the outlier significantly influences the value of the mean.
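The comparison can be reproduced with a few lines of Python. This is only a sketch: the list and variable names are our own, and the outlier is removed by filtering out the value 73 directly instead of applying a general outlier test.

```python
# Second data set, as it appears in the sums above
data_set_2 = [5, 7, 31, 21, 24, 7, 15, 73, 24]

# Drop the outlier identified in the solution (73)
without_outlier = [v for v in data_set_2 if v != 73]

mean_with = sum(data_set_2) / len(data_set_2)
mean_without = sum(without_outlier) / len(without_outlier)

print(mean_with)     # 23.0
print(mean_without)  # 16.75
```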