What measures of central tendency do we use to describe data sets that do or do not have outliers?
See solution.
Measures of central tendency summarize a data set with a single central value. The mean, median, and mode are all measures of central tendency. We will analyze how each of them describes the data, one at a time.
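Let's start by introducing two example data sets and finding the mean of each. The mean is the sum of the data values divided by the number of values. Substituting the values, adding the terms, and calculating the quotient gives a mean of 18 for the first data set and 23 for the second.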
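The mean calculation described here (add the values, then divide by how many there are) can be sketched in Python. The numbers below are hypothetical values chosen only for illustration; they are not the data sets from this solution, and the function name `mean_of` is likewise an assumption.

```python
# Hypothetical values for illustration only (not the data sets from this solution).
values = [4, 8, 6, 10, 7]

def mean_of(data):
    """Return the sum of the values divided by the number of values."""
    total = sum(data)   # "Add terms"
    count = len(data)   # number of data values
    return total / count  # "Calculate quotient"

print(mean_of(values))  # 35 / 5 = 7.0
```

Because every value enters the sum, changing any single value changes the result.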
Now let's find the medians of both data sets. To do so, we need to arrange the values in order.
Since both data sets are ordered and contain an odd number of data values, it is straightforward to choose the median. Recall that the median is the middlemost value when the values are put in increasing order.
It appears that 21 is the median of both data sets. The median also describes the middle of a set of data. However, unlike the mean, it does not use all of the data values, but only the position of central values in an ordered data set.
Next, we will find the mode of each data set. Recall that the mode is the data value that is the most common in the data set. Let's take a look at the data sets and determine the data values that occur more than once.
The mode of the first data set is 24 because this is the only data value that occurs twice. On the other hand, there are two data values in the second data set that occur twice, 24 and 7. Therefore, the second data set has two modes.
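Finding the mode amounts to counting how often each value occurs and keeping the value or values with the highest count. The sketch below uses Python's `collections.Counter`; the lists and the name `modes_of` are hypothetical and serve only to show one data set with a single mode and one with two modes.

```python
from collections import Counter

def modes_of(data):
    """Return every value that occurs most often, in increasing order."""
    counts = Counter(data)             # value -> number of occurrences
    highest = max(counts.values())     # the largest occurrence count
    # Note: if no value repeats, this returns all values; such a data set
    # is often described as having no mode.
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical values for illustration only.
print(modes_of([3, 5, 5, 8, 9]))       # [5]
print(modes_of([3, 3, 5, 5, 8, 40]))   # [3, 5]
```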
All three measures that we have calculated are measures of central tendency. We want to determine why we need all of them. Let's consider the values that we found.
Measure | Data Set 1 | Data Set 2 |
---|---|---|
Mean | 18 | 23 |
Median | 21 | 21 |
Mode | 24 | 24 and 7 |
Since the mean uses all of the values in a data set, it is vulnerable to outliers. In our case, the outlier 73 influenced the value of the mean for the second data set. Please see the end of the solution for further explanation.

Mean with outlier: 23

Mean without outlier: 16.75

Since outliers influence the mean significantly, we should use the mean for data sets that do not have outliers. On the other hand, the median is the central value in an ordered data set, so it is not affected by how large or small the extreme values are. Therefore, we should use the median for data sets that have outliers because they will not influence the result.

Next, the mode is used to find the most popular item in a data set. This item may not even be close to the middle value.

Mean of Data Set 2: 23

Modes of Data Set 2: 7 and 24

Note that one of the modes, 7, is significantly smaller than the middle value of the second data set. Finally, we can conclude why three different measures are needed: the mean describes data sets without outliers well, the median is the better choice when outliers are present, and the mode identifies the most common value, wherever it lies.
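Further explanation: if the outlier 73 is removed from the second data set, the mean of the remaining values is their sum divided by their count. Substituting the values, adding the terms, and dividing with a calculator gives 16.75 instead of 23.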
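The effect of an outlier on the mean and the median can be checked with a short experiment using Python's standard `statistics` module. The values below are hypothetical and are not the data sets from this solution; the point is only to compare both measures with and without one extreme value.

```python
from statistics import mean, median

# Hypothetical values for illustration only.
without_outlier = [3, 4, 5, 6, 7]
with_outlier = without_outlier + [95]  # add one extreme value

mean_before = mean(without_outlier)      # 25 / 5 = 5
mean_after = mean(with_outlier)          # 120 / 6 = 20
median_before = median(without_outlier)  # middle of [3, 4, 5, 6, 7] is 5
median_after = median(with_outlier)      # average of 5 and 6 is 5.5

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this sketch a single extreme value quadruples the mean, while the median moves by only half a unit, which matches the reasoning for preferring the median when outliers are present.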