5. Statistical Measures
Chapter 12
Statistical Measures

Statistical measures are vital for understanding and analyzing data in various contexts. They provide tools to summarize and interpret information, helping identify patterns, trends, and anomalies. Measures such as the mean, median, and mode offer insights into the central tendencies of a dataset, while the range, quartiles, and standard deviation reveal the variability and spread of values. These tools are essential for making informed decisions, evaluating performance, or assessing risks in fields like education, science, and business. By applying statistical measures, it becomes possible to derive meaningful conclusions and present data more effectively, enhancing comprehension and communication.
24 Theory slides
13 Exercises - Grade E - A
Each lesson is meant to take 1-2 classroom sessions
Analyzing and comparing data is as important as collecting it. This lesson covers basic data analysis concepts. It starts with finding a central value and then moves on to measuring how spread out the data points are. A good understanding of statistical measures will be achieved through this lesson.

Catch-Up and Review

Here are some recommended readings before getting started with this lesson.

Challenge

Study a Data Set About The Lifespan of Cats

Emily and Ignacio love learning about animals. They believe they can make meaningful discoveries by studying data about any animal, beginning with cats. They choose to create a data set, consisting of seven data points, showing the lifespan of cats in their neighborhood. They surveyed their neighbors to get this information.

Lifespan of Cats (in years)
15 11 14 15
14 17 13

Answer the following questions using this data set.

a What is the average lifespan of a cat?
b Which number, if any, occurs most frequently?
c Rearrange the data from least to greatest. What number is in the middle of this sorted data set?
Discussion

What is a Data Set?

A data set is a collection of values that provides information. These values can be presented in various ways such as in numbers or categories. The values are typically gathered through measurements, surveys, or experiments. Consider a data set that consists of the heights of a group of actors.

Actor Height
Madzia 5ft 4in.
Magda 5ft 2in.
Ignacio 6ft 1.6in.
Henrik 5ft 10in.
Ali 6ft 1in.
Diego 5ft 2in.
Miłosz 5ft 2in.
Paulina 5ft 3in.
Aybuke 5ft 7in.
Mateusz 6ft 1.2in.
Gamze 5ft 3in.
Marcin 5ft 7in.
Marcial 5ft 8in.
Heichi 5ft 5in.
Arkadiusz 5ft 6in.
Enrique 5ft 10.5in.
Aleksandra 5ft 4in.
Ashli 5ft 4in.
Jordan 5ft 5in.
Paula 5ft 2in.
MacKenzie 5ft 6in.
Joe 6ft 1in.
Flavio 5ft 10in.
Jeremy 5ft 4in.
Umut 6ft 1in.

A single value in a data set, such as an individual actor's height, is called an observation or data point. In this table, each observation corresponds to the height of one actor, so there are 25 observations. Each observation contains two variables, the actor's name and height.

Number of Observations: 25
Number of Variables: 2

The actual number or category associated with each data point is called a data value. Data values are the specific pieces of information contained within a data point. Data sets can be represented using charts, tables, or different types of graphs. For example, the average temperature of a city for each month of 2018 can be plotted on a line graph.

This lesson will focus on analyzing data sets resulting from the observation of a single variable.
Discussion

What is the Average of a Data Set?

The mean, or the average, of a numerical data set is one of the measures of center. It is defined as the sum of all of the data values in a set divided by the number of values in the set.


Mean=Sum of Values/Number of Values

The following applet calculates the mean of the data set on the number line. Points can be moved to change the data values.

Discussion

What is the Median of a Data Set?

The median is a measure of center that lies in the middle of a numerical data set when the data set is written in numerical order. When the data set has an odd number of data points, the median is the value in the middle.
Random data set with 9 elements, the median is marked
However, when the data set has an even number of data points, the median is the average of the two middle numbers.
Random data set with 10 elements, the median is marked
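The odd and even cases described above can be sketched in a few lines of Python. This is only an illustration: the function name `median` and both sample data sets are made up for this sketch and do not come from the lesson.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the data must be in numerical order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value is the median.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data sets, not taken from the lesson.
odd_set = [7, 3, 9, 5, 1]   # sorted: 1, 3, 5, 7, 9
even_set = [8, 2, 6, 4]     # sorted: 2, 4, 6, 8

print(median(odd_set))   # 5
print(median(even_set))  # 5.0
```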
Discussion

What is the Mode of a Data Set?

The mode is a measure of center that shows the most common value in a data set. Modes can be used for both numerical and categorical data.
Random data set with 11 elements, the mode is marked
A data set can have more than one mode if two or more data values are equally common. However, if all values in the set only occur once, then the data set does not have a mode.
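One possible way to express these rules in Python is sketched below. The helper name `modes` and the sample data are assumptions made for illustration. The sketch follows the convention stated above that a data set whose values all occur once has no mode, and it expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every mode of the data, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so the data set has no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical data sets, not taken from the lesson.
print(modes([2, 3, 3, 5, 7]))         # [3]
print(modes([1, 1, 4, 4, 6]))         # [1, 4]
print(modes([1, 2, 3]))               # []
print(modes(["red", "blue", "red"]))  # ['red']
```

The last call shows that the same idea works for categorical data.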
Discussion

Summarizing a Data Set With a Single Number

A measure of center, or a measure of central tendency, is a statistic that summarizes a data set by finding a central value. The most common measures of center are the mean, median, and mode.
Interactive applet where points of the dot plot can be moved around.
Move the points around in the dot plot to generate new data. The applet identifies the mean, median, and mode of the data set.
Example

Studying Data about the Lifespan of Dogs

Ignacio volunteers at a dog shelter. He asks Emily to help him study a data set he made concerning the lifespan of some of the dogs. The information they gather will help the shelter! This time, the data set consists of eight data points rather than seven.

Lifespan of Dogs (in years)
10 21 16 15
13 15 17 11
a What is the mean of the data set?
b What is the median of the data set?
c What is the mode of the data set?

Hint

a The mean of the data set is the sum of the data values divided by the number of data values.
b Order the data from least to greatest. What number is in the middle?
c The mode of a data set is the value that occurs most frequently.

Solution

a The mean of a data set is calculated by finding the sum of all values in the set and then dividing by the number of values in the set. In this case, there are 8 values in the set.
10, 21, 16, 15, 13, 15, 17, 11
Add all values and divide the sum by 8.
Mean = Sum of Values / Number of Values
Mean = (10 + 21 + 16 + 15 + 13 + 15 + 17 + 11) / 8
Mean = 118 / 8
Mean = 14.75
The average lifespan of the dogs studied in the survey is 14.75 years.
b Start by ordering the values from least to greatest.

Unordered Data Set
10, 21, 16, 15, 13, 15, 17, 11
⇓
Ordered Data Set
10, 11, 13, 15, 15, 16, 17, 21

The number of values matters when determining the median.

  • For a set with an odd number of values, the median is the middle value.
  • For a set with an even number of values, the median is the mean of the two middle values.
In this case, the data consists of an even number of values. The values in the middle are 15 and 15.

Ordered Data Set
10, 11, 13, 15, 15, 16, 17, 21

Therefore, the median of the data set is the mean of 15 and 15.
(15 + 15) / 2
30 / 2
15
The median of this data set is 15 years.
c Remember that the mode of a data set is the value or values that occur most often. Take another look at the given data set.

Ordered Data Set
10, 11, 13, 15, 15, 16, 17, 21

As seen, 15 occurs two times and the rest of the numbers occur only once. This means that the mode of the data set is 15 years. Note that while the mean, median, and mode are close in this instance, they may vary in other cases.
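As a quick check of the three answers, the calculation can be repeated with Python's standard `statistics` module. The variable names below are chosen for this sketch only.

```python
import statistics

dog_lifespans = [10, 21, 16, 15, 13, 15, 17, 11]  # years, from the example

mean_value = statistics.mean(dog_lifespans)      # sum of values / number of values
median_value = statistics.median(dog_lifespans)  # mean of the two middle values
mode_value = statistics.mode(dog_lifespans)      # most frequent value

print(mean_value)    # 14.75
print(median_value)  # 15.0
print(mode_value)    # 15
```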

Pop Quiz

Practice Finding Measures of the Center

Measures such as the mean, median, and mode are essential for understanding the central tendency of a data set. Find the indicated measure of the given data set. If the answer is not an integer number, round it to one decimal place.

Discussion

Measures of Spread of Data Sets

Just as a measure of center summarizes a data set with one number, there are measures that use a single number to describe how much the values in a data set differ from each other. These measures summarize the spread of the data.

Concept

Range

Range is a measure of spread that measures the difference between the maximum and minimum values of the data set.

Random data sets with maximum, minimum highlighted and the range calculated.
Discussion

Quartiles

Quartiles are three values that divide a data set into four equal parts. The quartiles are denoted as Q_1, Q_2, and Q_3. The second quartile Q_2, also known as the median, divides the ordered data set into two halves.

Lower half: a b c
Q_2: d
Upper half: e f g

The median of the lower half is the first quartile Q_1, while the median of the upper half is the third quartile Q_3.

Lower half: a b c, with Q_1 = b
Q_2: d
Upper half: e f g, with Q_3 = f

The first quartile is also called the lower quartile, and the third quartile is also called the upper quartile. To find the quartiles of a data set, the values must first be written in numerical order.

Example of how three quartiles can be identified in a set
Discussion

Interquartile Range

The interquartile range, or IQR, of a data set is a measure of spread that measures the difference between Q_3 and Q_1, the upper and lower quartiles.


IQR=Q_3-Q_1

The following applet shows how to find the IQR of different data sets.

Applet that calculates the interquartile range of a data set
Discussion

Finding the Interquartile Range

The interquartile range (IQR) of a data set is found by first identifying the three quartiles and then calculating the difference between the third and the first quartile. Consider the following data set.
1, 3, 4, 4, 5, 6, 6, 8, 8, 10, 10, 11
The interquartile range of the data set can be found by following these four steps.
1. Identify the Median

First, identify the median of the given data set. Since the number of values is even, the median is the mean of the two middle values.

The median of the data is 6.

2. Identify the Lower and the Upper Half of the Data Set

The median divides the data into two halves, a lower half and an upper half. For this data, the lower half includes the first six values and the upper half includes the following six.

When there is an odd number of values in the data set, the middle value is excluded from both the lower and upper sets.

3. Find the First and the Third Quartile

Find the first and the third quartile. The first quartile, Q_1, is the median of the lower set, while the third, Q_3, is the median of the upper set. Here, both quartiles are found the same way the median was found.

4. Calculate the Interquartile Range

The interquartile range is calculated by subtracting the first quartile, Q_1, from the third, Q_3. For the given data set, the first quartile is 4 and the third quartile is 9.
IQR = Q_3 - Q_1
IQR = 9 - 4
IQR = 5
The interquartile range of the given data set is 5.
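The four steps can be sketched in Python as follows. The helper name `quartiles` is an assumption made for this illustration. It uses the median-of-halves method from the steps, leaving the middle value out of both halves when the number of values is odd, and it expects at least two values. Software libraries often use other quartile conventions, so their results may differ slightly from this method.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values before the median position
    upper = ordered[half + n % 2:]  # skips the middle value when n is odd
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


data = [1, 3, 4, 4, 5, 6, 6, 8, 8, 10, 10, 11]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3)  # 4.0 6.0 9.0
print(iqr)         # 5.0
```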

Example

Comparing the Weights of Cats and Dogs

Ignacio and Emily enjoyed learning about cats and dogs so much that they now want to compare the spread of one data set with the spread of another. They collected a few more data points and compiled a data set consisting of nine data points for the weights of cats.

Weights of Cats (lb)
9, 8, 11, 7, 10, 7, 11, 8, 12

Then, they collected a data set of ten data points for the weights of dogs.

Weights of Dogs (lb)
11, 18, 29, 32, 32, 35, 37, 44, 55, 79

a Which type of pet has a larger weight range: dogs or cats?
b Find the interquartile range of each data set.

Hint

a The range of the data set is the difference between the greatest and least data values.
b The interquartile range is the distance between the first and the third quartiles of the data set.

Solution

a The range is one of the measures of spread. It is the difference between the maximum and minimum values of the data set. The range of each data set will be calculated individually.

Range for the Weights of Cats

The least and greatest values can be identified without sorting the data values, although they can be listed in order if desired.

Weights of Cats (lb)
9, 8, 11, 7, 10, 7, 11, 8, 12

The least value is 7 and the greatest value is 12. The difference between these values is the range.

Range = 12 - 7 = 5 lb

The range for the weights of cats is 5 pounds.

Range for the Weights of Dogs

Apply the same procedure of identifying the greatest and least values for the data set of dogs.

Weights of Dogs (lb)
11, 18, 29, 32, 32, 35, 37, 44, 55, 79

The least value is 11 and the greatest value is 79. The difference between these values is the range.

Range = 79 - 11 = 68 lb

Dogs have a weight range of 68 pounds. This far exceeds the 5-pound range for cats.
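The two range calculations can be reproduced with a short Python sketch. The function name `value_range` is made up for this illustration.

```python
def value_range(values):
    """Return the difference between the greatest and least values."""
    return max(values) - min(values)


cat_weights = [9, 8, 11, 7, 10, 7, 11, 8, 12]           # lb
dog_weights = [11, 18, 29, 32, 32, 35, 37, 44, 55, 79]  # lb

print(value_range(cat_weights))  # 5
print(value_range(dog_weights))  # 68
```

As noted in the solution, no sorting is needed because only the least and greatest values matter.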

b The interquartile range of each data set will be calculated individually.

Interquartile Range of Cat Weights

Here, it is necessary to order the values from least to greatest. Then identify the median of the given data set. Since the number of values is an odd number, the median is the middle value.

The median of the data is 9. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.

The first quartile is 7.5, and the third quartile is 11. The difference between the third quartile and the first quartile is the interquartile range.
IQR= Q_3-Q_1
IQR= 11- 7.5
IQR= 3.5
The interquartile range of cat weights is 3.5 pounds.

Interquartile Range of Dog Weights

In this case, the data values are ordered from least to greatest and the number of values is an even number. This means that the median is the mean of the two middle values.

The median of the data is 33.5. Both the lower and upper halves contain five data values. Therefore, there is only one middle value in each half.

The first quartile is 29, and the third quartile is 44. The difference between the third quartile and the first quartile is the interquartile range.
IQR= Q_3-Q_1
IQR= 44- 29
IQR= 15
The interquartile range of dog weights is 15 pounds.
Discussion

Five-Number Summary

A five-number summary of a data set consists of the following five values.

  1. Minimum value
  2. First quartile Q_1
  3. Median, or second quartile Q_2
  4. Third quartile Q_3
  5. Maximum value

These values provide a summary of the central tendency and spread of the data set. The five-number summary is useful for understanding the variability in a data set. When the data set is written in numerical order, the median divides the data set into two halves. The median of the lower half is the first quartile Q_1 and the median of the upper half is the third quartile Q_3.

Five-number summary is applied to different data sets
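A five-number summary could be computed along the following lines. This is a sketch under the same median-of-halves convention used in this lesson, and the function name `five_number_summary` is an assumption. The dog weights from the earlier example serve as input.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[:n // 2]        # lower half, middle value excluded if n is odd
    upper = ordered[(n + 1) // 2:]  # upper half, middle value excluded if n is odd
    return (ordered[0],
            statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper),
            ordered[-1])


dog_weights = [11, 18, 29, 32, 32, 35, 37, 44, 55, 79]  # lb
print(five_number_summary(dog_weights))  # (11, 29, 33.5, 44, 79)
```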
Discussion

Outliers

An outlier is a data point that is significantly different from the other values in the data set. It can be significantly larger or significantly smaller than the others.

Several human figures in line where one is much higher than the others.

Categorical data sometimes also have unusual elements; these can be called outliers as well.

Several human figures in line where one is of different color.
However, it is best to use the term outlier when applying a mathematical process to identify it. This approach helps differentiate between an intuitive and a more formal approach.

What Does Significantly Different Mean?

For numerical data, the following definition is one of the several approaches that can be used.

  • A data value is an outlier — significantly different from the other values — if it is farther away from the closest quartile than 1.5 times the interquartile range.

This threshold of 1.5 times the interquartile range was suggested by the American mathematician John Tukey. Move the slider in the following applet to see which data point is an outlier.

Identifying outliers
Example

An Unusual Value in the Data

Ignacio is relaxing, enjoying reviewing some data. Wait a minute! There is something unusual about a data value in the data set for dogs.

Weights of Dogs (lb)
11, 18, 29, 32, 32, 35, 37, 44, 55, 79

a Identify the outlier in the data set.
b Find the range and interquartile range of the data set without the outlier.
c Which measure does the outlier affect more?

Hint

a Is there a value that is larger or smaller than most values? Check if there is any value less than Q_1-1.5IQR or greater than Q_3+1.5IQR.
b The range of a data set is the difference between the greatest and smallest data values. The interquartile range of a data set is the distance between the first and the third quartiles of the data set.
c Compare the range and interquartile range of the data set with and without the outlier.

Solution

a In the given data set, all values seem to be around the same number, except 79. This value seems to be significantly different from other values. Therefore, it is likely to be an outlier of the data set.

Weights of Dogs (lb)
11, 18, 29, 32, 32, 35, 37, 44, 55, 79

To confirm that, check whether it is farther from the closest quartile than 1.5 times the interquartile range. The first quartile of this data is 29, and the third quartile is 44.

The interquartile range of this data set is 15. Now calculate Q_3+1.5IQR.
Q_3+1.5IQR
44 + 1.5 * 15
44+22.5
66.5
This means that any value greater than 66.5 is an outlier. Therefore, the value 79 is an outlier.
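The same check can be sketched in Python. It takes the quartiles of the dog weights found earlier in the lesson (Q_1 = 29 and Q_3 = 44) as given. The names `lower_fence` and `upper_fence` are labels chosen for this sketch and are not used in the lesson.

```python
dog_weights = [11, 18, 29, 32, 32, 35, 37, 44, 55, 79]  # lb
q1, q3 = 29, 44  # quartiles found earlier in the lesson

iqr = q3 - q1                 # 44 - 29 = 15
lower_fence = q1 - 1.5 * iqr  # 29 - 22.5 = 6.5
upper_fence = q3 + 1.5 * iqr  # 44 + 22.5 = 66.5

# A value is flagged when it lies below the lower cutoff or above the upper one.
outliers = [w for w in dog_weights if w < lower_fence or w > upper_fence]

print(lower_fence, upper_fence)  # 6.5 66.5
print(outliers)                  # [79]
```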
b Exclude the outlier found in Part A from the data set.

Weights of Dogs Without Outlier 11, 18, 29, 32, 32, 35, 37, 44, 55

Finding Range

To find its range, subtract the smallest value from the greatest. Range 55-11 = 44

Finding Interquartile Range

After excluding the outlier, the number of values decreased by one. There are nine values now, so the median is the middle value.

The median of the data is 32. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.

The first quartile is 23.5, and the third quartile is 40.5. The difference between the third quartile and the first quartile is the interquartile range.
IQR= Q_3-Q_1
IQR= 40.5- 23.5
IQR= 17
The interquartile range of the data when the outlier is taken out of the data set is 17.
c Consider the data once again, with and without outliers.

Weights of Dogs
11, 18, 29, 32, 32, 35, 37, 44, 55, 79

Weights of Dogs Without Outlier
11, 18, 29, 32, 32, 35, 37, 44, 55

Summarize the results found in the previous parts.

Range IQR
With Outliers 68 15
Without Outliers 44 17

After removing the outlier from the data, the range decreased from 68 to 44, while the IQR increased from 15 to 17. This example shows that outliers have a bigger impact on the range of values than on the IQR.

Pop Quiz

Practice Finding Measures of Spread

Measures of spread, such as the range and interquartile range, indicate how much data values vary, while outliers are values that significantly deviate from the rest. Practice calculating these measures for the given data.

Discussion

Mean Absolute Deviation

The mean absolute deviation (MAD) is a measure of the spread of a data set that measures how much the data elements differ from the mean. The mean absolute deviation is the average distance between each data value and the mean.

Calculating the MAD involves determining the absolute difference between every data point and the mean, followed by averaging these absolute differences. The applet below calculates the mean absolute deviation for the data set on the number line. Move the points around to change the data.
Applet to calculate the mean absolute deviation
A large MAD value indicates that data points deviate considerably from the mean — that is, there is significant variation within the data set.
Discussion

Finding the Mean Absolute Deviation

The mean absolute deviation is a measure that describes the average absolute difference between the data points in a data set and the mean of the data set. It is calculated by finding the absolute difference between each data point and the mean, then taking the average of those absolute differences. Consider, for example, the following data set.
82, 85, 90, 75, 95, 85, 90, 70
The values are the scores of 8 students on a math test. The mean absolute deviation of the data set can be found by following these three steps.
1. Calculate the Mean
The mean of a data set is the sum of all values in the set divided by the number of values.
Mean = Sum of Values / Number of Values
Mean = (82 + 85 + 90 + 75 + 95 + 85 + 90 + 70) / 8
Mean = 672 / 8
Mean = 84
The mean of the data is 84.
2. Calculate the Distance Between Each Data Point and the Mean

Next, calculate the absolute value of the differences between each data value and the mean.

Data Value Absolute Value of Difference
82 |82- 84| = 2
85 |85- 84| = 1
90 |90- 84| = 6
75 |75- 84| = 9
95 |95- 84| = 11
85 |85- 84| = 1
90 |90- 84| = 6
70 |70- 84| = 14
3. Calculate the Average of the Distances Found in Step 2
Find the average of the absolute values of the differences between each data value and the mean.
(2 + 1 + 6 + 9 + 11 + 1 + 6 + 14) / 8
50 / 8
6.25
The mean absolute deviation for the given data set is 6.25. This means that the average distance each data value is from the mean is 6.25 points. In other words, on average, the students' test scores deviate from the mean of 84 by 6.25 points.
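The three steps translate into a short Python sketch. The function name `mean_absolute_deviation` is chosen for this illustration, and the test scores are the ones used in the steps.

```python
def mean_absolute_deviation(values):
    """Return the average distance between each value and the mean."""
    mean = sum(values) / len(values)             # Step 1: the mean
    distances = [abs(v - mean) for v in values]  # Step 2: distance to the mean
    return sum(distances) / len(distances)       # Step 3: average of the distances


scores = [82, 85, 90, 75, 95, 85, 90, 70]  # test scores from the steps above
print(mean_absolute_deviation(scores))     # 6.25
```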
Example

Studying the Mean Absolute Deviation of Cat Heights

Ignacio and Emily are researching how the heights of cats vary from the mean. They are most interested in calculating the mean absolute deviation of the cats' heights to better understand how the sizes of the cats vary.

Height of Cats (in.)
9, 9, 12, 15, 14, 11, 14, 15, 15, 10, 16, 16

Find the mean absolute deviation of the cats' heights. Round the answer to one decimal place.

Hint

Start by finding the mean. Then, calculate the distances between the mean and each data value. Finally, find the mean of these distances.

Solution

Begin by recalling what the mean absolute deviation is.

Mean Absolute Deviation

An average of how much data values differ from the mean.

To find the mean absolute deviation, these steps can be followed.

  1. Find the mean of the data
  2. Find the distance between each data value and the mean
  3. Find the average of the distances found in Step 2.
Start by calculating the mean. The given data set consists of the heights of 12 cats, measured in inches.
9, 9, 12, 15, 14, 11, 14, 15, 15, 10, 16, 16 (12 values)
The mean is the sum of all values divided by the number of values.
(9 + 9 + 12 + 15 + 14 + 11 + 14 + 15 + 15 + 10 + 16 + 16) / 12
156 / 12
13
The mean of the data set is 13 inches. Now, move on to finding the distances between the data values and the mean.
Data Value Absolute Value of Difference
9 |9- 13| = 4
9 |9- 13| = 4
12 |12- 13| = 1
15 |15- 13| = 2
14 |14- 13| = 1
11 |11- 13| = 2
14 |14- 13| = 1
15 |15- 13| = 2
15 |15- 13| = 2
10 |10- 13| = 3
16 |16- 13| = 3
16 |16- 13| = 3
Finally, add the values found in the table and then divide the sum by the number of values, 12.
(4 + 4 + 1 + 2 + 1 + 2 + 1 + 2 + 2 + 3 + 3 + 3) / 12
28 / 12
2.333333 ...
≈ 2.3
The mean absolute deviation is about 2.3 inches. This is the average distance of each data value from the mean. On average, the heights of cats deviate from the mean of 13 inches by about 2.3 inches.
Discussion

Standard Deviation

The standard deviation is a measure of spread of a data set that measures how much the data values differ from the mean. The Greek letter σ, read as sigma, is commonly used to denote the standard deviation. In many data sets, most of the values fall within one standard deviation of the mean.

Mean ± Standard Deviation

For example, if the mean of a data set is 15 and the standard deviation is 4, then most of the values are expected to fall between 11 and 19, as 15 - 4 = 11 and 15 + 4 = 19.

Standard deviation shows the variation of data from the mean.

  • If the standard deviation is small, it means the values in the data set are close to the mean.
  • If the standard deviation is large, it means the values are spread out over a wider range.
Calculating standard deviation is not the focus of this lesson. However, the following applet illustrates how to calculate the standard deviation for five data points.
Applet that calculates the standard deviation of a set of five numbers
As illustrated, the standard deviation is the square root of the average of the squared differences between each value in the data set and the mean of the data set.
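The description above can be written out as a small Python sketch. The function name `standard_deviation` and the five data values are made up for this illustration. The sketch divides by the number of values, as the sentence above describes. Some textbooks and software divide by one less than the number of values when the data is a sample, which gives a slightly larger result.

```python
import math


def standard_deviation(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_differences = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_differences) / len(values))


# Hypothetical data set of five values, not taken from the lesson.
data = [13, 7, 11, 9, 10]        # mean is 10
print(standard_deviation(data))  # 2.0
```

Here the squared differences are 9, 9, 1, 1, and 0. Their average is 4, and the square root of 4 is 2.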
Example

Data Values Beyond One Standard Deviation

Finally, Ignacio and Emily are examining the variation in the heights of dogs. Here are the values they are analyzing.

Height of Dogs (in.)
24, 18, 22, 20, 26, 30, 28, 16, 25, 21

They found that the standard deviation of the heights is 4.2 inches. Which heights are not within one standard deviation from the mean?

Hint

First find the mean of the given data set. Then, find the range of the values that are within one standard deviation from the mean.

Solution

Start by calculating the mean. The given data set consists of the heights of 10 dogs, measured in inches.
24, 18, 22, 20, 26, 30, 28, 16, 25, 21 (10 values)
The mean is the sum of all values divided by the total number of values.
(24 + 18 + 22 + 20 + 26 + 30 + 28 + 16 + 25 + 21) / 10
230 / 10
23
The mean of the data set is 23 inches. Now, find the range of values that are within one standard deviation of the mean. To do that, subtract one standard deviation from the mean and add one standard deviation to the mean, and record both numbers.

Mean - Standard Deviation = 23 - 4.2 = 18.8
Mean + Standard Deviation = 23 + 4.2 = 27.2

The data values that are between 18.8 and 27.2 inches are within one standard deviation of the mean.

The values that are less than 18.8 are 16 and 18, and the values that are greater than 27.2 are 28 and 30. That means the heights outside the range of one standard deviation from the mean are 16, 18, 28, and 30 inches.
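The same filtering can be sketched in Python. The standard deviation of 4.2 inches is taken as given in the example. The last line is an optional cross-check with `statistics.pstdev`, which divides by the number of values. All variable names are chosen for this sketch.

```python
import statistics

dog_heights = [24, 18, 22, 20, 26, 30, 28, 16, 25, 21]  # in.
std_dev = 4.2                                           # given in the example

mean = statistics.mean(dog_heights)  # 23
low = mean - std_dev                 # about 18.8
high = mean + std_dev                # about 27.2

# Keep only the heights that fall outside one standard deviation of the mean.
outside = sorted(h for h in dog_heights if h < low or h > high)
print(outside)  # [16, 18, 28, 30]

# Cross-check of the given standard deviation, rounded to one decimal place.
print(round(statistics.pstdev(dog_heights), 1))  # 4.2
```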

Closure

Finding The Measures of Center of a Data Set

In this lesson, the measures of center and measures of spread were discussed.

Measures of Center: Mean, Mode, Median
Measures of Spread: Range, Interquartile Range, Mean Absolute Deviation, Standard Deviation
With the definitions covered in this lesson, the challenge presented at the beginning is now more approachable. The challenge is to find the measures of center for the lifespan of cats.

Lifespan of Cats (years)
15, 11, 14, 15, 14, 17, 13

Remember that the mean of a data set is calculated by finding the sum of all values in the set and then dividing by the number of values in the set.
Mean = (15 + 11 + 14 + 15 + 14 + 17 + 13) / 7
Mean = 99 / 7
Mean ≈ 14.1
The average lifespan of cats is about 14.1 years. To find the mode and median of the data set, the values are ordered from least to greatest.
11, 13, 14, 14, 15, 15, 17
Since the data set has an odd number of values, the median is the middle value, 14 years. Furthermore, this data set has two modes, 14 and 15 years, as both values appear twice in the data set.