A measure of spread is a way of quantifying how spread out, or different, the points in a data set are. A small spread means the data points are similar, while a large spread means they differ widely. This is illustrated by the two data sets below. Both have the same mean, median, and mode, but the second data set can be expected to have a larger spread because its data points differ more from one another.
One way to measure the spread of a data set is to find its range. This is done by calculating the difference between the maximum and minimum value of the data set.
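As a rough illustration, the range can be computed with a short Python sketch. The two data sets here are hypothetical values made up for this example, not the data sets from the text. Both share a mean, median, and mode of 5, yet their ranges differ.

```python
from statistics import mean, median, mode


def data_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical data sets, made up for illustration only.
set_a = [4, 5, 5, 5, 6]
set_b = [1, 5, 5, 5, 9]

for name, values in (("A", set_a), ("B", set_b)):
    # Prints the name, mean, median, mode, and range of each set.
    print(name, mean(values), median(values), mode(values), data_range(values))
```

The range of set A is 2 and the range of set B is 8, even though the three measures of center agree.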
Standard deviation is a commonly used measure of spread. It is a measure of how much a randomly selected value from a data set is expected to differ from the mean. To denote the standard deviation, the Greek letter $\sigma$ is used, which is read as "sigma."
The standard deviation, $\sigma$, of a data set is calculated using the rule $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$, where $x_1, \ldots, x_n$ are the data values, $n$ is the number of values in the data set, and $\bar{x}$ is the mean of the set. Performing this calculation in one step makes for a convoluted expression, so it is best divided into a few smaller steps. Consider the following data set as an example.
First, the mean, $\bar{x}$, should be calculated. The denominator is $n$, the number of values in the example data set.
For each data value $x_i$, the deviation $x_i - \bar{x}$ can now be calculated and added to a table. This shows how much each data point deviates from the mean.
Square the deviations, $(x_i - \bar{x})^2$, and add them to a new column in the table.
The squared deviations should be added and divided by the number of data values. In other words, the mean of the squared deviations is found.
This value is called the variance of the data set.
Finally, take the square root of the variance to get the standard deviation, $\sigma$. Here, the variance is kept as an exact fraction rather than a rounded decimal, to avoid rounding errors. Thus, a randomly chosen value from this data set is expected to deviate from the mean by roughly the standard deviation.
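The steps above can be sketched in Python. The data set below is hypothetical, chosen only so that the arithmetic comes out evenly, and the variable names are illustrative. The sketch divides by the number of values, as in the text, which corresponds to what Python's statistics module calls the population standard deviation (pstdev).

```python
import math
from statistics import pstdev

# Hypothetical data set, made up for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                     # step 1: the mean
deviations = [x - mean for x in data]    # step 2: deviation of each value from the mean
squared = [d ** 2 for d in deviations]   # step 3: squared deviations
variance = sum(squared) / n              # step 4: mean of the squared deviations
sigma = math.sqrt(variance)              # step 5: square root of the variance

print(mean, variance, sigma)  # 5.0 4.0 2.0

# Cross-check against the standard library's population standard deviation.
print(math.isclose(sigma, pstdev(data)))  # True
```

For this data the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.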
Another way to measure the spread of a data set is with the interquartile range, which is the difference between the third and the first quartile. The quartiles are found by dividing the data set into four equal-sized groups. A set of 12 data values, for example, would be divided into four groups with three data values in each.
The quartiles are the three values that divide the data set into four groups. They are denoted with $Q_1$, $Q_2$, and $Q_3$. Notice that $Q_2$ is the median.
The interquartile range, or IQR, is calculated by subtracting the first quartile, $Q_1$, from the third, $Q_3$. That is, $\text{IQR} = Q_3 - Q_1$.
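A minimal sketch of this calculation in Python follows. The 12 values are hypothetical, and the helper name quartiles is made up for this example. It assumes the convention of taking the first and third quartiles as the medians of the lower and upper halves of the sorted data; other quartile conventions exist and can give slightly different results.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # lower half (middle value excluded when n is odd)
    upper = ordered[(n + 1) // 2:]   # upper half (middle value excluded when n is odd)
    return median(lower), median(ordered), median(upper)


# Hypothetical data set with 12 values, made up for illustration only.
data = [2, 3, 5, 6, 7, 8, 10, 11, 13, 14, 16, 19]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 5.5 9.0 13.5 8.0
```

With 12 values, each quartile falls between two data values, so each is the average of its two neighbors. For example, the first quartile lies between the third and fourth values, 5 and 6.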
Calculating the range is a relatively simple process. However, since it only takes into account two data points, the maximum and the minimum, it says little about the variability of the data set as a whole.
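To make this limitation concrete, the following Python sketch compares two hypothetical data sets, made up for this example, that share the same minimum and maximum. Their ranges are identical, but the standard deviation, computed here with the standard library's pstdev, tells them apart.

```python
from statistics import pstdev

# Hypothetical data sets with the same minimum (1) and maximum (9).
set_a = [1, 5, 5, 5, 9]   # most values sit at the center
set_b = [1, 1, 5, 9, 9]   # most values sit at the extremes

for values in (set_a, set_b):
    value_range = max(values) - min(values)
    # Prints the range followed by the standard deviation rounded to two decimals.
    print(value_range, round(pstdev(values), 2))
```

Both sets have a range of 8, while the standard deviation is about 2.53 for the first set and about 3.58 for the second.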
Because the standard deviation is found using all data points, it is representative of the entire data set. A drawback, though, is that for a large data set it is a very time-consuming calculation to do by hand.