| 24 Theory slides |
| 13 Exercises - Grade E - A |
| Each lesson is meant to take 1-2 classroom sessions |
Emily and Ignacio love learning about animals. They believe they can make meaningful discoveries by studying data about any animal, beginning with cats. They create a data set of seven data points showing the lifespans of cats in their neighborhood, which they collected by surveying their neighbors.
Lifespan of Cats (in years) | |||
---|---|---|---|
15 | 11 | 14 | 15 |
14 | 17 | 13 |
Answer the following questions using this data set.
A data set is a collection of values that provides information. These values can be presented in various ways such as in numbers or categories. The values are typically gathered through measurements, surveys, or experiments. Consider a data set that consists of the heights of a group of actors.
Actor | Height |
---|---|
Madzia | 5 ft 4 in. |
Magda | 5 ft 2 in. |
Ignacio | 6 ft 1.6 in. |
Henrik | 5 ft 10 in. |
Ali | 6 ft 1 in. |
Diego | 5 ft 2 in. |
Miłosz | 5 ft 2 in. |
Paulina | 5 ft 3 in. |
Aybuke | 5 ft 7 in. |
Mateusz | 6 ft 1.2 in. |
Gamze | 5 ft 3 in. |
Marcin | 5 ft 7 in. |
Marcial | 5 ft 8 in. |
Heichi | 5 ft 5 in. |
Arkadiusz | 5 ft 6 in. |
Enrique | 5 ft 10.5 in. |
Aleksandra | 5 ft 4 in. |
Mateusz | 5 ft 9 in. |
Jordan | 5 ft 5 in. |
Paula | 5 ft 2 in. |
MacKenzie | 5 ft 6 in. |
Joe | 6 ft 1 in. |
Flavio | 5 ft 10 in. |
Jeremy | 5 ft 4 in. |
Umut | 6 ft 1 in. |
The mean, or the average, of a numerical data set is one of the measures of center. It is defined as the sum of all of the data values in a set divided by the number of values in the set.
$$\text{Mean} = \frac{\text{Sum of Values}}{\text{Number of Values}}$$
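The formula translates directly into code. The following is a minimal Python sketch (the function name `mean` is our own choice, not part of the lesson), applied to the seven cat lifespans from Emily and Ignacio's table.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Lifespans of cats (in years) from the neighborhood survey.
cat_lifespans = [15, 11, 14, 15, 14, 17, 13]
print(round(mean(cat_lifespans), 2))  # 14.14
```

The seven lifespans add up to 99, so the mean is 99 ÷ 7 ≈ 14.14 years.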
Ignacio volunteers at a dog shelter. He asks Emily to help him study a data set he made concerning the lifespan of some of the dogs. The information they gather will help the shelter!
This time, the data set consists of eight data points rather than seven.
Lifespan of Dogs (in years) | |||
---|---|---|---|
10 | 21 | 16 | 15 |
13 | 15 | 17 | 11 |
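The mean lifespan is the sum of the eight values divided by 8.

$$\text{Mean} = \frac{10 + 21 + 16 + 15 + 13 + 15 + 17 + 11}{8} = \frac{118}{8} = 14.75$$

The mean lifespan of the dogs in this data set is 14.75 years.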
Just as measures of center summarize a data set with a single value, measures of spread use a single value to describe how much the values in a data set differ from one another.
Range is a measure of spread that measures the difference between the maximum and minimum values of the data set.
The interquartile range, or IQR, of a data set is a measure of spread that measures the difference between Q3 and Q1, the upper and lower quartiles.
IQR=Q3−Q1
First, identify the median of the given data set. Since the number of values is even, the median is the mean of the two middle values.
The median of the data is 6.
The median divides the data into two halves, a lower half and an upper half. For this data, the lower half includes the first six values and the upper half includes the following six.
When there is an odd number of values in the data set, the middle value is excluded from both the lower and upper sets.
Find the first and the third quartile. The first quartile, Q1, is the median of the lower set, while the third, Q3, is the median of the upper set. Here, both quartiles are found the same way the median was found.
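The median-of-halves procedure above can be sketched in Python as follows. This assumes at least two values and uses `statistics.median` from the standard library, which already returns the mean of the two middle values for an even count; the helper names `quartiles` and `iqr` are illustrative. Other tools, such as spreadsheets and some statistics libraries, use different quartile conventions and may give slightly different results.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two values. When the count is odd, the middle value
    is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # first `half` values
    upper = data[-half:]   # last `half` values (skips the middle one if the count is odd)
    return median(lower), median(data), median(upper)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


scores = [85, 72, 90, 68, 78, 85, 91, 79]
print(quartiles(scores))  # (75.0, 82.0, 87.5)
print(iqr(scores))        # 12.5
```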
Here, it is necessary to order the values from least to greatest. Then identify the median of the given data set. Since the number of values is an odd number, the median is the middle value.
The median of the data is 9. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
In this case, the data values are ordered from least to greatest and the number of values is an even number. This means that the median is the mean of the two middle values.
The median of the data is 33.5. Both the lower and upper halves contain five data values. Therefore, there is only one middle value in each half.
A five-number summary of a data set consists of the following five values: the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum.
These values provide a summary of the central tendency and spread of the data set. The five-number summary is useful for understanding the variability in a data set. When the data set is written in numerical order, the median divides the data set into two halves. The median of the lower half is the first quartile Q1 and the median of the upper half is the third quartile Q3.
An outlier is a data point that is significantly different from the other values in the data set. It can be significantly larger or significantly smaller than the others.
Categorical data sometimes also have unusual elements; these can be called outliers as well.
What does "significantly different" mean? For numerical data, the following definition is one of several approaches that can be used: a data value is an outlier if it is less than Q_1 − 1.5(IQR) or greater than Q_3 + 1.5(IQR).

This 1.5 × IQR rule was suggested by the esteemed American mathematician John Tukey.
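A short Python sketch of this rule (often called Tukey's fences) follows. It recomputes the quartiles with the median-of-halves method used in this lesson and assumes at least two values; the function names and the default factor `k=1.5` are our own choices for illustration.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


def tukey_outliers(values, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


exam = [93, 75, 85, 83, 89, 93, 88, 33, 90]
print(tukey_outliers(exam))  # [33]
```

For these scores, Q1 = 79 and Q3 = 91.5, so the boundaries are 60.25 and 110.25, and only 33 falls outside them.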
After excluding the outlier, the number of values decreased by one. There are nine values now, so the median is the middle value.
The median of the data is 32. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
Data Set | Range | IQR |
---|---|---|
With Outliers | 68 | 15 |
Without Outliers | 44 | 17 |
After removing the outlier from the data, the range decreased from 68 to 44, while the IQR increased from 15 to 17. This example shows that outliers have a bigger impact on the range of values than on the IQR.
Measures of spread, such as the range and interquartile range, indicate how much data values vary, while outliers are values that deviate significantly from the rest. Practice calculating these measures for the given data.
The mean absolute deviation (MAD) is a measure of the spread of a data set that measures how much the data elements differ from the mean. The mean absolute deviation is the average distance between each data value and the mean.
Calculating the MAD involves finding the absolute difference between each data point and the mean, and then averaging these absolute differences. For example, consider the data set 82, 85, 90, 75, 95, 85, 90, 70. First, find its mean.

$$\text{Mean} = \frac{82 + 85 + 90 + 75 + 95 + 85 + 90 + 70}{8} = \frac{672}{8} = 84$$
Next, calculate the absolute value of the differences between each data value and the mean.
Data Value | Absolute Value of Difference |
---|---|
82 | ∣82−84∣=2 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
75 | ∣75−84∣=9 |
95 | ∣95−84∣=11 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
70 | ∣70−84∣=14 |
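The three steps (find the mean, take the absolute differences, average them) can be sketched in Python like this; `mean_absolute_deviation` is a hypothetical helper name, not a standard-library function. It finishes the calculation for the data in the table above.

```python
def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    m = sum(values) / len(values)                # step 1: the mean
    distances = [abs(v - m) for v in values]     # step 2: absolute differences
    return sum(distances) / len(distances)       # step 3: mean of the differences


data = [82, 85, 90, 75, 95, 85, 90, 70]
print(mean_absolute_deviation(data))  # 6.25
```

The absolute differences in the table add up to 50, and 50 ÷ 8 = 6.25.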
Start by finding the mean. Then, calculate the distances between the mean and each data value. Finally, find the mean of these distances.
Begin by recalling what the mean absolute deviation is: an average of how much data values differ from the mean.

To find it, first calculate the mean of the data. The 12 values add up to 156, so the mean is 156 ÷ 12 = 13. Next, find the absolute value of the difference between each data value and the mean.
Data Value | Absolute Value of Difference |
---|---|
9 | ∣9−13∣=4 |
9 | ∣9−13∣=4 |
12 | ∣12−13∣=1 |
15 | ∣15−13∣=2 |
14 | ∣14−13∣=1 |
11 | ∣11−13∣=2 |
14 | ∣14−13∣=1 |
15 | ∣15−13∣=2 |
15 | ∣15−13∣=2 |
10 | ∣10−13∣=3 |
16 | ∣16−13∣=3 |
16 | ∣16−13∣=3 |
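Finally, find the mean of these absolute differences, rounding to one decimal place.

$$\text{MAD} = \frac{4+4+1+2+1+2+1+2+2+3+3+3}{12} = \frac{28}{12} \approx 2.3$$

The mean absolute deviation is about 2.3.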
Standard deviation shows how much the data vary from the mean. It is commonly denoted by the Greek letter σ (sigma). In many data sets, most of the values fall within one standard deviation of the mean.
First, find the mean of the given data set. Then, find the interval of values that lie within one standard deviation of the mean.
The values that are less than 18.8 are 16 and 18, and the values that are greater than 27.2 are 28 and 30. That means the heights outside the range of one standard deviation from the mean are 16, 18, 28, and 30 inches.
In this lesson, the measures of center and measures of spread were discussed.
Measures of Center | Measures of Spread |
---|---|
Mean | Range |
Mode | Interquartile Range |
Median | Mean Absolute Deviation |
| Standard Deviation |
Tearrik posts a photo every day for a week. The table shows the number of likes each photo received.
Number of Likes | |||
---|---|---|---|
25 | 30 | 28 | 36 |
40 | 36 | 42 | − |
We are given a table showing the number of likes Tearrik received for his photos. We want to find the average number of daily likes.
Remember, the mean of a data set is the sum of all the values in the set divided by the number of values. Here, we have data for 7 days, so we divide the sum by 7.

$$\text{Mean} = \frac{25 + 30 + 28 + 36 + 40 + 36 + 42}{7} = \frac{237}{7} \approx 33.9$$
Tearrik receives an average of about 34 likes for each photo.
We want to find the median of the given data set. The median is the value that divides the ordered set into two halves. The first thing we have to do is to rearrange the data values from least to greatest.
Given data: 25, 30, 28, 36, 40, 36, 42

Ordered data: 25, 28, 30, 36, 36, 40, 42
Since the number of values in our set is 7, the median is the fourth value in order. Therefore, the median is 36 likes.
The mode is the most common value in the set. Let's take a look at the ordered set one more time!
Ordered data: 25, 28, 30, 36, 36, 40, 42
We can see that the most common value in the given data set is 36. This means that the mode of our data set is 36 likes.
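Python's standard `statistics` module provides `mean`, `median`, and `mode`, so the three results above can be checked in a few lines. This is a sketch; the variable name `likes` is our own.

```python
from statistics import mean, median, mode

# Likes on Tearrik's seven photos, in the order given in the table.
likes = [25, 30, 28, 36, 40, 36, 42]

print(round(mean(likes), 1))  # 33.9, which rounds to about 34
print(median(likes))          # 36 (median sorts the data itself)
print(mode(likes))            # 36
```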
The table shows the scores of eight students on a mathematics exam.
Scores | |||
---|---|---|---|
85 | 72 | 90 | 68 |
78 | 85 | 91 | 79 |
We want to find the range of the given data set.
The range is the difference between the greatest and least values in a set of data. We can order the data from least to greatest to find the range.

Given data: 85, 72, 90, 68, 78, 85, 91, 79

Ordered data: 68, 72, 78, 79, 85, 85, 90, 91

For this exercise, the greatest value is 91 and the least value is 68.

$$\text{Range} = 91 - 68 = 23$$
We want to find the interquartile range of the data set.
68, 72, 78, 79, 85, 85, 90, 91
Since the interquartile range is the difference between the upper quartile and the lower quartile (Q_3-Q_1), we only need to find these two quartiles.
We have already sorted the data set from least to greatest in Part A. The number of values is 8, so the lower and upper halves each contain four data values.

Lower half: 68, 72, 78, 79 | Upper half: 85, 85, 90, 91

Each half has two middle values, so each quartile is the mean of those two values.

$$Q_1 = \frac{72 + 78}{2} = 75 \qquad Q_3 = \frac{85 + 90}{2} = 87.5$$

The last step is to calculate the difference between the upper and the lower quartile.

$$\text{IQR} = 87.5 - 75 = 12.5$$
We want to find the mean, median, and mode of the given data set. 93, 75, 85, 83, 89, 93, 88, 33, 90 Let's begin by calculating the mean.
The mean of a data set is the sum of the values divided by the total number of values in the set. There are 9 values in our set, so we divide the sum by 9.

$$\text{Mean} = \frac{93 + 75 + 85 + 83 + 89 + 93 + 88 + 33 + 90}{9} = \frac{729}{9} = 81$$
The mean is 81. We can continue by finding the median.
When the data are arranged in numerical order, the median is the middle value (or the mean of the two middle values) in a set of data. Let's arrange the given values and find the median.

Given data: 93, 75, 85, 83, 89, 93, 88, 33, 90

Ordered data: 33, 75, 83, 85, 88, 89, 90, 93, 93

Since the number of values in our set is 9, the median is the middle value, 88. The last measure we need is the mode. Let's find it!
The mode is the value or values that appear most often in a set of data. Arranging the data set from least to greatest makes it easier to see how often each value appears. Let's take a look at the ordered set one more time!

Ordered data: 33, 75, 83, 85, 88, 89, 90, 93, 93

The value 93 appears twice and every other value appears once, so the mode is 93.
We have to calculate the interquartile range (IQR) to identify any outliers. Let's start by revisiting some information about quartiles.
Now we can identify the quartiles and the interquartile range using the ordered data set. Since there are 9 values, the median, 88, is excluded from both halves.

Lower half: 33, 75, 83, 85 | Upper half: 89, 90, 93, 93

$$Q_1 = \frac{75 + 83}{2} = 79 \qquad Q_3 = \frac{90 + 93}{2} = 91.5 \qquad \text{IQR} = 91.5 - 79 = 12.5$$

The interquartile range of this data set is 12.5. Next, we need to determine the boundaries beyond which a data value is considered an outlier. Outliers are values that lie more than 1.5 times the IQR below the lower quartile or above the upper quartile.

Let's start with the lower boundary. We substitute 79 for Q_1 and 12.5 for IQR.

$$Q_1 - 1.5(\text{IQR}) = 79 - 1.5(12.5) = 79 - 18.75 = 60.25$$

The only value less than 60.25 is 33, so 33 is an outlier. Now let's check for outliers at the upper end by calculating the upper boundary. We substitute 91.5 for Q_3 and 12.5 for IQR.

$$Q_3 + 1.5(\text{IQR}) = 91.5 + 1.5(12.5) = 91.5 + 18.75 = 110.25$$

Our data set does not contain any values greater than 110.25, so 33 is the only outlier.
Let's repeat the process, this time excluding 33 from the data set.
75, 83, 85, 88, 89, 90, 93, 93
Let's start by adding the values without 33 and dividing the sum by 8.

$$\text{Mean} = \frac{75 + 83 + 85 + 88 + 89 + 90 + 93 + 93}{8} = \frac{696}{8} = 87$$

The mean of the data without the outlier is 87.
Recall the ordered set with 33 excluded: 75, 83, 85, 88, | 89, 90, 93, 93. This time the number of values in our set is 8, so the median is the mean of the two middle values.

$$\frac{88 + 89}{2} = 88.5$$

The median is 88.5.
We can see that the most common value in the given data set is still 93. 75, 83, 85, 88, 89, 90, 93, 93 This is the mode of our data set.
Let's summarize our findings in the table below so it is easier to compare the results.
Data set | Mean | Median | Mode |
---|---|---|---|
With Outlier | 81 | 88 | 93 |
Without Outlier (33) | 87 | 88.5 | 93 |
We can see that removing the outlier changed the mean and the median. Both increased, but the mean rose by 6 (from 81 to 87) while the median rose by only 0.5 (from 88 to 88.5). Removing the outlier did not change the mode. The measure most affected by the outlier is therefore the mean.
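The comparison can be reproduced with Python's standard `statistics` module. This sketch simply filters out the value 33, identified above as an outlier, and recomputes the three measures of center; the variable names are our own.

```python
from statistics import mean, median, mode

scores = [93, 75, 85, 83, 89, 93, 88, 33, 90]
without_outlier = [s for s in scores if s != 33]

for label, data in [("with outlier", scores), ("without outlier", without_outlier)]:
    print(label, mean(data), median(data), mode(data))
# Means: 81 and 87; medians: 88 and 88.5; the mode is 93 in both cases.
```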
We want to find the range and the interquartile range of the given data. 28, 27, 33, 29, 25, 43, 29, 28, 23, 33
Let's start by finding the range. The range of a data set is one of the measures of variation: it is the difference between the greatest value and the least value. Let's order the data values from least to greatest.

Given data: 28, 27, 33, 29, 25, 43, 29, 28, 23, 33

Ordered data: 23, 25, 27, 28, 28, 29, 29, 33, 33, 43

The least value in the data set is 23 and the greatest value is 43. Let's find the range by subtracting the least value from the greatest value.

$$\text{Range} = 43 - 23 = 20$$

The range of the data values is 20.
Next, we will find the interquartile range, or IQR, which is the difference between the third quartile and the first quartile. Let's identify the quartiles. There are 10 values, so the median is the mean of the two middle values.

$$Q_2 = \frac{28 + 29}{2} = 28.5$$

Lower half: 23, 25, 27, 28, 28 | Upper half: 29, 29, 33, 33, 43

The first quartile, Q_1, is the median of the lower half of the data, and the third quartile, Q_3, is the median of the upper half. This means that Q_1 = 27 and Q_3 = 33.

$$\text{IQR} = 33 - 27 = 6$$

The interquartile range is 6.
In Part A, we found that the interquartile range of the given data is 6. We also found that the first quartile is Q_1 = 27 and the third quartile is Q_3 = 33. Now, we can identify the outliers. We know that any value less than Q_1 - 1.5(IQR) or greater than Q_3 + 1.5(IQR) is an outlier.
Let's find the exact values of the expressions.
Step | Q_1 − 1.5(IQR) | Q_3 + 1.5(IQR) |
---|---|---|
Substitute Values | 27 − 1.5(6) | 33 + 1.5(6) |
Calculate | 18 | 42 |
Therefore, values less than 18 or greater than 42 are outliers. Our data set has no values below 18 and only one value above 42, namely 43.

23, 25, 27, 28, 28, 29, 29, 33, 33, 43

Therefore, 43 is an outlier.
Let's repeat the process, this time excluding 43 from the data set.
23, 25, 27, 28, 28, 29, 29, 33, 33
We subtract the least value from the greatest.

$$\text{Range without outlier} = 33 - 23 = 10$$

The range decreased to 10.
Next, we will find the IQR of the new data set. Let's divide the data set into the lower half and the upper half.
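There are now 9 values, so the median is the middle value, Q_2 = 28, and it is excluded from both halves.

Lower half: 23, 25, 27, 28 | Upper half: 29, 29, 33, 33

The first quartile, Q_1, is the median of the lower half of the data, and the third quartile, Q_3, is the median of the upper half. Each half has two middle values, so Q_1 = (25 + 27)/2 = 26 and Q_3 = (29 + 33)/2 = 31.

$$\text{IQR} = 31 - 26 = 5$$

The interquartile range without the outlier is 5.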
Let's summarize our findings in a table below to more easily compare the results.
Data set | Range | IQR |
---|---|---|
With Outlier | 20 | 6 |
Without Outlier (43) | 10 | 5 |
In this example, removing the outlier reduced both the range and the interquartile range. That said, the range decreased more than the interquartile range. This suggests that outliers have a greater effect on the range than on the interquartile range.
The mean absolute deviation (MAD) is the average of the absolute values of the differences between the mean and each value in the data set. The given data set is 54, 46, 55, 59, 48, 50. We will start by calculating the mean. There are 6 values in the data set, so we divide their sum by 6.

$$\text{Mean} = \frac{54 + 46 + 55 + 59 + 48 + 50}{6} = \frac{312}{6} = 52$$
We found that the mean of the given data set is 52. We are ready to calculate the MAD. As previously stated, the MAD of a set of data is the average of the absolute values of the differences between the mean and each value in the data set. Let's use a table to find the sum of the absolute values of the differences.
Value | Difference Between Each Value and Mean | Absolute Value of Difference |
---|---|---|
54 | 54 − 52 = 2 | ∣2∣ = 2 |
46 | 46 − 52 = −6 | ∣−6∣ = 6 |
55 | 55 − 52 = 3 | ∣3∣ = 3 |
59 | 59 − 52 = 7 | ∣7∣ = 7 |
48 | 48 − 52 = −4 | ∣−4∣ = 4 |
50 | 50 − 52 = −2 | ∣−2∣ = 2 |
Finally, we add the absolute values found in the table and divide the sum by the number of values, 6.

$$\text{MAD} = \frac{2 + 6 + 3 + 7 + 4 + 2}{6} = \frac{24}{6} = 4$$
The mean absolute deviation of the data is 4. This number indicates that the data, on average, are 4 units away from the mean of 52.
We are given a data set consisting of the number of pages read by 10 different students.

40, 42, 45, 48, 49, 50, 52, 66, 67, 74

We are asked to describe the numbers of pages that are within one standard deviation (11 pages) of the mean. First, we need to find the mean. We add all the values and divide the sum by the number of values.

$$\text{Mean} = \frac{40 + 42 + 45 + 48 + 49 + 50 + 52 + 66 + 67 + 74}{10} = \frac{533}{10} = 53.3$$

The mean of the data set is 53.3. Now we can find the interval of values that are within one standard deviation of the mean. To do that, we add one standard deviation to the mean and subtract one standard deviation from the mean.
Mean - Standard Deviation | Mean + Standard Deviation |
---|---|
53.3 - 11 = 42.3 | 53.3 + 11 = 64.3 |
The data values between 42.3 and 64.3 are within one standard deviation of the mean. In the list below, the bars mark 42.3 and 64.3.

40, 42, | 45, 48, 49, 50, 52 | 66, 67, 74

The values between 42.3 and 64.3 are 45, 48, 49, 50, and 52. Therefore, the numbers of pages within one standard deviation of the mean are 45, 48, 49, 50, and 52.
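A small Python sketch of this check follows. The standard deviation is passed in as a given number (11 pages here) rather than computed, matching the exercise; `within_one_sd` is a hypothetical helper name, and the interval is treated as including its endpoints.

```python
def within_one_sd(values, sd):
    """Return the values v with mean - sd <= v <= mean + sd."""
    m = sum(values) / len(values)
    low, high = m - sd, m + sd
    return [v for v in values if low <= v <= high]


pages = [40, 42, 45, 48, 49, 50, 52, 66, 67, 74]
print(within_one_sd(pages, 11))  # [45, 48, 49, 50, 52]
```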