Data values can be distributed in many different ways: they can, for instance, be clustered close to the mean or be spread far from it. A measure of spread quantifies this with a single value, where a larger value implies a bigger spread. However, no single measure of spread is best in all scenarios.

A measure of spread is a way of quantifying how spread out, or different, the points in a data set are. A small spread means the data points are similar, while a large spread means they are different. This is illustrated by the two data sets below. Both have a mean, median, and mode of $3,$ but the second data set has a larger spread because its data points differ more from one another.

[Figure: first data set (concept_measure_of_spread_1)]

[Figure: second data set (concept_measure_of_spread_2)]
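The same idea can be checked numerically. The following is a minimal sketch using two hypothetical data sets, invented here for illustration and not taken from the figures. The helper name `summarize` is also an assumption. Both sets share a mean, median, and mode of $3,$ yet the distance between their largest and smallest values differs considerably.

```python
import statistics

def summarize(data):
    """Return (mean, median, mode, largest value minus smallest value)."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
        max(data) - min(data),
    )

# Hypothetical data sets, invented for illustration only.
narrow = [2, 3, 3, 3, 4]
wide = [-4, 3, 3, 3, 10]

print(summarize(narrow))  # same center, small spread
print(summarize(wide))    # same center, large spread
```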

Some commonly used measures of spread are range, mean absolute deviation, standard deviation, and interquartile range. These are often used together with a measure of center, to give an idea both of what a typical value is and how much the data can be expected to deviate from it.

One way to measure the spread of a data set is to find its range. This is done by calculating the difference between the maximum and minimum value of the data set.

$\text{range}=\text{maximum value}-\text{minimum value}$
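As a small sketch of this rule, the function below subtracts the minimum from the maximum. The function name `data_range` and the sample values are assumptions made for illustration.

```python
def data_range(values):
    """Return the range: maximum value minus minimum value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

# Hypothetical sample data, chosen only for illustration.
sample = [4, 9, 2, 7, 5]
print(data_range(sample))  # prints 7, since 9 - 2 = 7
```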

Standard deviation is a commonly used measure of spread. It is a measure of how much a randomly selected value from a data set is expected to differ from the mean. To denote the standard deviation, the Greek letter $σ$ is used, which is read as "sigma."

The standard deviation, $σ,$ of a data set is calculated using the rule $σ=\sqrt{\dfrac{(x_{1}-\bar{x})^{2}+(x_{2}-\bar{x})^{2}+\ldots+(x_{n}-\bar{x})^{2}}{n}},$ where $n$ is the number of values in the data set and $\bar{x}$ is the mean of the set. Performing this calculation in one step makes for a convoluted expression. Therefore, it is best divided into a few smaller steps. Consider the following data set as an example. $1,5,3,8,3,12$

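Find the deviation of each data value, $x-\bar{x}$

The mean of the data set is $\bar{x}=\dfrac{1+5+3+8+3+12}{6}=\dfrac{32}{6}=\dfrac{16}{3}\approx 5.33.$ For each data value, $x-\bar{x}$ can now be calculated and added to a table. This shows how much each data point deviates from the mean.

$x$ | $x-\bar{x}$ |
---|---|
$1$ | $1-\frac{16}{3}=-\frac{13}{3}$ |
$5$ | $5-\frac{16}{3}=-\frac{1}{3}$ |
$3$ | $3-\frac{16}{3}=-\frac{7}{3}$ |
$8$ | $8-\frac{16}{3}=\frac{8}{3}$ |
$3$ | $3-\frac{16}{3}=-\frac{7}{3}$ |
$12$ | $12-\frac{16}{3}=\frac{20}{3}$ |

Square the deviations

Square the deviations, and add them to a new column in the table.

$x$ | $x-\bar{x}$ | $(x-\bar{x})^{2}$ |
---|---|---|
$1$ | $-\frac{13}{3}$ | $\left(-\frac{13}{3}\right)^{2}=\frac{169}{9}$ |
$5$ | $-\frac{1}{3}$ | $\left(-\frac{1}{3}\right)^{2}=\frac{1}{9}$ |
$3$ | $-\frac{7}{3}$ | $\left(-\frac{7}{3}\right)^{2}=\frac{49}{9}$ |
$8$ | $\frac{8}{3}$ | $\left(\frac{8}{3}\right)^{2}=\frac{64}{9}$ |
$3$ | $-\frac{7}{3}$ | $\left(-\frac{7}{3}\right)^{2}=\frac{49}{9}$ |
$12$ | $\frac{20}{3}$ | $\left(\frac{20}{3}\right)^{2}=\frac{400}{9}$ |

Find the mean of the squared deviations

The squared deviations should be added and divided by the number of data values. In other words, the mean of the squared deviations is found.

$\dfrac{\frac{169}{9}+\frac{1}{9}+\frac{49}{9}+\frac{64}{9}+\frac{49}{9}+\frac{400}{9}}{6}=\dfrac{\frac{732}{9}}{6}=\dfrac{122}{9}\approx 13.56$

This value is called the *variance* of the data set.

Square-root the mean of the squared deviations

$σ=\sqrt{\dfrac{122}{9}}\approx 3.68$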
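The steps above can be sketched in code. This is a minimal illustration, assuming the divide-by-$n$ rule given earlier (the population form). The function names `population_variance` and `population_std` are assumptions, not part of the source.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance is undefined for an empty data set")
    mean = sum(values) / n                    # mean of the data set
    deviations = [x - mean for x in values]   # deviation of each data value
    squared = [d ** 2 for d in deviations]    # squared deviations
    return sum(squared) / n                   # mean of the squared deviations

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

data = [1, 5, 3, 8, 3, 12]
print(round(population_std(data), 2))  # prints 3.68
```

Python's built-in `statistics.pstdev` applies the same divide-by-$n$ rule, whereas `statistics.stdev` divides by $n-1,$ so the two give different results on the same data.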

One way to measure the spread of a data set is with the interquartile range, which is the difference between the third and the first quartile. The quartiles are found by dividing the data set into four equal-sized groups. A set of $12$ data values would be divided into groups with three data values in each.

The quartiles are the three values that divide the data set into four groups. They are denoted with $Q_{1},Q_{2}$ and $Q_{3}.$ Notice that $Q_{2}$ is the median.

The interquartile range, or IQR, is calculated by subtracting the first quartile, $Q_{1},$ from the third, $Q_{3}.$

$\text{Interquartile range}=Q_{3}-Q_{1}$

The interquartile range, IQR, is found by first identifying the three quartiles and then calculating the difference between the third and the first quartile.
Consider the following data set.
$1,3,4,4,5,6,6,8,8,10,10,11$
### 1. Identify the median

First, identify the median of the data set. Since the number of values is even, the median is the mean of the two middle values. Here both middle values are $6,$ so the median is $6.$

$1,\; 3,\; 4,\; 4,\; 5,\; \overbrace{6,\; 6},\; 8,\; 8,\; 10,\; 10,\; 11$

### 2. Identify the lower and the upper half of the data set

The median divides the data into two sets, a lower set and an upper set. For this data, the lower set is the first six values and the upper set is the following six.
When there is an odd number of values in the data set, the middle value is excluded from both the lower and upper sets.

$\begin{aligned}
&\quad \quad \text{lower set} \quad \quad \quad \text{upper set}\\
&\overbrace{1, \; 3,\; 4,\; 4,\; 5,\; 6} \Big{|} \overbrace{6, \; 8,\; 8,\; 10,\; 10,\; 11} \end{aligned}$

### 3. Find the first and the third quartile

Find the first and the third quartile. The first quartile, $Q_{1},$ is the median of the lower set, while the third, $Q_{3},$ is the median of the upper set. Here, both quartiles are found the same way the median was found.

$\begin{aligned}
& \quad \; \; Q_1=4 \qquad \qquad Q_3=9 \\
& 1,\; 3,\; \overbrace{4,\; 4},\; 5,\; 6 \, \Big{|} \, 6,\; 8,\; \overbrace{8,\; 10},\; 10,\; 11 \end{aligned}$

### 4. Calculate the interquartile range

The interquartile range is calculated by subtracting the first quartile, $Q_{1},$ from the third, $Q_{3}.$ For the example, this gives $\text{IQR}=Q_{3}-Q_{1}=9-4=5.$
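The four steps can be sketched as follows. This is a minimal illustration of the median-of-halves method described here. The function names `median_of_sorted` and `interquartile_range` are assumptions, not part of the source.

```python
def median_of_sorted(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even count: mean of the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("at least two data values are needed")
    half = n // 2
    lower = data[:half]
    # With an odd number of values, the middle value is excluded from both sets.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    q1 = median_of_sorted(lower)
    q3 = median_of_sorted(upper)
    return q1, q3, q3 - q1

data = [1, 3, 4, 4, 5, 6, 6, 8, 8, 10, 10, 11]
q1, q3, iqr = interquartile_range(data)
print(q1, q3, iqr)  # prints 4.0 9.0 5.0
```

Library routines for quartiles often use other conventions, such as interpolation between data points, so they may report slightly different values for the same data.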

Each of these three measures of spread (range, standard deviation, and interquartile range) has advantages and drawbacks.

Calculating the range is a relatively simple process. However, since it takes into account only two data points, the maximum and the minimum, it says little about how the rest of the data set varies.

Because the standard deviation is found using all data points, it is representative of the entire data set. A drawback, though, is that for a large data set it is a very time-consuming calculation to do by hand.
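To make the contrast concrete, here is a minimal sketch with two hypothetical data sets, invented for illustration. They share the same range but have different standard deviations. It uses `statistics.pstdev` from Python's standard library, which divides by $n,$ and the helper name `data_range` is an assumption.

```python
import statistics

def data_range(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)

# Hypothetical data sets with the same minimum and maximum.
clustered = [0, 5, 5, 5, 10]   # most values sit at the center
polarized = [0, 0, 5, 10, 10]  # most values sit at the extremes

for data in (clustered, polarized):
    print(data_range(data), round(statistics.pstdev(data), 2))
# prints:
# 10 3.16
# 10 4.47
```

The range cannot tell the two sets apart, while the standard deviation, which uses every data point, does. Letting a library do the arithmetic also sidesteps the time cost of the hand calculation for large data sets.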
