This lesson contains 24 theory slides and 13 exercises (Grade E to A). Each lesson is meant to take 1-2 classroom sessions.
Here is a recommended reading before getting started with this lesson.
Emily and Ignacio love learning about animals. They believe they can make meaningful discoveries by studying data about any animal, beginning with cats. They decide to create a data set of seven data points showing the lifespans of cats in their neighborhood, which they gather by surveying their neighbors.
Lifespan of Cats (in years) | |||
---|---|---|---|
15 | 11 | 14 | 15 |
14 | 17 | 13 |
Answer the following questions using this data set.
A data set is a collection of values that provides information. These values can be presented in various ways, such as numbers or categories, and are typically gathered through measurements, surveys, or experiments. Consider a data set that consists of the heights of a group of actors.
Actor | Height |
---|---|
Madzia | 5 ft 4 in. |
Magda | 5 ft 2 in. |
Ignacio | 6 ft 1.6 in. |
Henrik | 5 ft 10 in. |
Ali | 6 ft 1 in. |
Diego | 5 ft 2 in. |
Miłosz | 5 ft 2 in. |
Paulina | 5 ft 3 in. |
Aybuke | 5 ft 7 in. |
Mateusz | 6 ft 1.2 in. |
Gamze | 5 ft 3 in. |
Marcin | 5 ft 7 in. |
Marcial | 5 ft 8 in. |
Heichi | 5 ft 5 in. |
Arkadiusz | 5 ft 6 in. |
Enrique | 5 ft 10.5 in. |
Aleksandra | 5 ft 4 in. |
Mateusz | 5 ft 9 in. |
Jordan | 5 ft 5 in. |
Paula | 5 ft 2 in. |
MacKenzie | 5 ft 6 in. |
Joe | 6 ft 1 in. |
Flavio | 5 ft 10 in. |
Jeremy | 5 ft 4 in. |
Umut | 6 ft 1 in. |
The mean, or the average, of a numerical data set is one of the measures of center. It is defined as the sum of all of the data values in a set divided by the number of values in the set.
$$\text{Mean} = \frac{\text{Sum of Values}}{\text{Number of Values}}$$
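Here is a minimal sketch of this formula in Python. The function name `mean` is our own choice. It divides the sum of the values by their count and is applied to Emily and Ignacio's seven cat lifespans, whose sum is 99.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Lifespans of the seven cats (in years) from Emily and Ignacio's survey
cat_lifespans = [15, 11, 14, 15, 14, 17, 13]
print(round(mean(cat_lifespans), 2))  # 14.14
```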
Ignacio volunteers at a dog shelter. He asks Emily to help him study a data set he made concerning the lifespan of some of the dogs. The information they gather will help the shelter!
This time, the data set consists of eight data points rather than seven.
Lifespan of Dogs (in years) | |||
---|---|---|---|
10 | 21 | 16 | 15 |
13 | 15 | 17 | 11 |
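The mean lifespan of the dogs can be found by substituting the values into the formula, adding the terms, and calculating the quotient.
$$\text{Mean} = \frac{10+21+16+15+13+15+17+11}{8} = \frac{118}{8} = 14.75$$
On average, the dogs in the data set lived 14.75 years.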
Measures of center describe a typical value with a single number. In the same way, measures of spread use a single number to describe how much the values in a data set differ from each other. These measures summarize the spread of the data.
The range is a measure of spread equal to the difference between the maximum and minimum values of the data set.
$$\text{Range} = \text{Maximum} - \text{Minimum}$$
The interquartile range, or IQR, of a data set is a measure of spread equal to the difference between Q3 and Q1, the upper and lower quartiles.
$$\text{IQR} = Q_3 - Q_1$$
First, identify the median of the given data set. Since the number of values is even, the median is the mean of the two middle values.
The median of the data is 6.
The median divides the data into two halves, a lower half and an upper half. For this data, the lower half includes the first six values and the upper half includes the following six.
When there is an odd number of values in the data set, the middle value is excluded from both the lower and upper halves.
Next, find the first and third quartiles. The first quartile, Q1, is the median of the lower half, and the third quartile, Q3, is the median of the upper half. Both quartiles are found the same way the median was found.
Here, it is necessary to order the values from least to greatest. Then identify the median of the given data set. Since the number of values is an odd number, the median is the middle value.
The median of the data is 9. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
In this case, the data values are ordered from least to greatest and the number of values is an even number. This means that the median is the mean of the two middle values.
The median of the data is 33.5. Both the lower and upper halves contain five data values. Therefore, there is only one middle value in each half.
A five-number summary of a data set consists of the following five values: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
These values provide a summary of the central tendency and spread of the data set. The five-number summary is useful for understanding the variability in a data set. When the data set is written in numerical order, the median divides the data set into two halves. The median of the lower half is the first quartile Q1 and the median of the upper half is the third quartile Q3.
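The halves method described above can be sketched in Python. This is a minimal illustration using our own hypothetical helpers `median` and `five_number_summary`, not functions from any library. It is applied to the dog lifespans from earlier. Other tools, such as spreadsheet quartile functions, may follow different quartile conventions and can report slightly different values for Q1 and Q3.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    The data are split at the median; when the count is odd,
    the middle value is left out of both halves.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return s[0], median(lower), median(s), median(upper), s[-1]


dog_lifespans = [10, 21, 16, 15, 13, 15, 17, 11]
minimum, q1, med, q3, maximum = five_number_summary(dog_lifespans)
print(minimum, q1, med, q3, maximum)  # 10 12.0 15.0 16.5 21
print("range:", maximum - minimum)    # range: 11
print("IQR:", q3 - q1)                # IQR: 4.5
```

For the dog data, this sketch gives a range of 11 years and an IQR of 4.5 years.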
An outlier is a data point that is significantly different from the other values in the data set. It can be significantly larger or significantly smaller than the others.
Categorical data sometimes also have unusual elements; these can be called outliers as well.
What does "significantly different" mean?
For numerical data, the following definition is one of several approaches that can be used. A data value is an outlier if it is more than 1.5 times the interquartile range below the first quartile, Q1, or more than 1.5 times the interquartile range above the third quartile, Q3.
The factor of 1.5 was suggested by the American mathematician John Tukey.
After excluding the outlier, the number of values decreased by one. There are nine values now, so the median is the middle value.
The median of the data is 32. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
Data Set | Range | IQR |
---|---|---|
With Outlier | 68 | 15 |
Without Outlier | 44 | 17 |
After removing the outlier from the data, the range decreased from 68 to 44, while the IQR increased from 15 to 17. This example shows that outliers have a bigger impact on the range of values than on the IQR.
Measures of spread, such as the range and interquartile range, indicate how much data values vary, while outliers are values that deviate significantly from the rest. Practice calculating these measures for the given data.
The mean absolute deviation (MAD) is a measure of the spread of a data set that measures how much the data elements differ from the mean. The mean absolute deviation is the average distance between each data value and the mean.
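In symbols, let $x_1, x_2, \ldots, x_n$ stand for the $n$ data values and $\bar{x}$ for their mean (notation introduced here for convenience). The definition can then be written as:

$$\text{MAD} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$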
Calculating the MAD involves finding the absolute difference between every data point and the mean, and then averaging these absolute differences. For example, consider the data values 82, 85, 90, 75, 95, 85, 90, and 70. First, find the mean by substituting the values, adding the terms, and calculating the quotient.
$$\text{Mean} = \frac{82+85+90+75+95+85+90+70}{8} = \frac{672}{8} = 84$$
Next, calculate the absolute value of the differences between each data value and the mean.
Data Value | Absolute Value of Difference |
---|---|
82 | ∣82−84∣=2 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
75 | ∣75−84∣=9 |
95 | ∣95−84∣=11 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
70 | ∣70−84∣=14 |
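The table stops before the final step. Here is a minimal Python sketch that performs all three steps for this data set; the function name `mean_absolute_deviation` is our own. The absolute differences in the table sum to 50, so the MAD is 50 ÷ 8 = 6.25.

```python
def mean_absolute_deviation(values):
    """Average distance between each data value and the mean of the data."""
    if not values:
        raise ValueError("the MAD of an empty data set is undefined")
    m = sum(values) / len(values)             # step 1: find the mean
    distances = [abs(x - m) for x in values]  # step 2: |value - mean| for each value
    return sum(distances) / len(distances)    # step 3: average the distances


data = [82, 85, 90, 75, 95, 85, 90, 70]
print(sum(data) / len(data))          # 84.0
print(mean_absolute_deviation(data))  # 6.25
```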
Start by finding the mean. Then, calculate the distances between the mean and each data value. Finally, find the mean of these distances.
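Begin by recalling what the mean absolute deviation is.
Mean Absolute Deviation: an average of how much data values differ from the mean.
To find the mean absolute deviation, first find the mean of the data. The twelve values add up to 156, so the mean is 156 ÷ 12 = 13. Next, find the absolute value of the difference between each data value and the mean.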
Data Value | Absolute Value of Difference |
---|---|
9 | ∣9−13∣=4 |
9 | ∣9−13∣=4 |
12 | ∣12−13∣=1 |
15 | ∣15−13∣=2 |
14 | ∣14−13∣=1 |
11 | ∣11−13∣=2 |
14 | ∣14−13∣=1 |
15 | ∣15−13∣=2 |
15 | ∣15−13∣=2 |
10 | ∣10−13∣=3 |
16 | ∣16−13∣=3 |
16 | ∣16−13∣=3 |
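Finally, add the absolute differences, divide the sum by the number of values, and round to one decimal place.
$$\text{MAD} = \frac{4+4+1+2+1+2+1+2+2+3+3+3}{12} = \frac{28}{12} \approx 2.3$$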
The Greek letter σ (sigma) is commonly used to denote the standard deviation, which measures how much the data values vary from the mean. In many data sets, most of the values fall within one standard deviation of the mean.
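The lesson does not state a formula for σ. One common definition is the population standard deviation, shown below with $\mu$ for the mean and $n$ for the number of data values. This version divides by $n$ and agrees with the value of about 84 kWh used in the electricity example later in this lesson. Some texts instead divide by $n - 1$ (the sample standard deviation), which gives a slightly larger result.

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}}$$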
First find the mean of the given data set. Then, find the range of the values that are within one standard deviation from the mean.
The values that are less than 18.8 are 16 and 18, and the values that are greater than 27.2 are 28 and 30. That means the heights outside the range of one standard deviation from the mean are 16, 18, 28, and 30 inches.
In this lesson, the measures of center and measures of spread were discussed.
Measures of Center | Measures of Spread |
---|---|
Mean | Range |
Median | Interquartile Range |
Mode | Mean Absolute Deviation |
 | Standard Deviation |
The table shows the monthly salaries of employees at a company.
Monthly Salaries ($) | ||||
---|---|---|---|---|
4500 | 5200 | 4800 | 4600 | 5000 |
4700 | 4900 | 4400 | 4600 | 4300 |
We want to find the mean, median, and mode of the monthly salaries at a company.
Let's remember what the mean is.
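Mean: The mean of a data set is the sum of the data values divided by the number of data values.
Let's add all the values and divide the sum by 10, because there are 10 values.
$$\text{Mean} = \frac{4500+5200+4800+4600+5000+4700+4900+4400+4600+4300}{10} = \frac{47000}{10} = 4700$$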
Next, we can find the median. Let's remember what the median is.
Median: For a set with an odd number of values, the median is the middle value. For a set with an even number of values, the median is the mean of the two middle values.
There are ten values, which is an even number. Let's write the numbers in order from least to greatest: 4300, 4400, 4500, 4600, 4600, 4700, 4800, 4900, 5000, 5200. The median of the data set is the mean of the two middle values.
$$\text{Median} = \frac{4600+4700}{2} = 4650$$
The median is $4650. Let's remember what the mode is.
Mode: The mode of a data set is the value or values that occur most often.
Let's take a look at our data set.
We can see that 4600 occurs twice and each of the other values occurs only once. This means that the mode is $4600.
We know that each employee receives a $500 raise. Let's find the new salary of each employee.
Original Salary | $500 Raise | Salary After Raise |
---|---|---|
4500 | 4500+500 | 5000 |
5200 | 5200+500 | 5700 |
4800 | 4800+500 | 5300 |
4600 | 4600+500 | 5100 |
5000 | 5000+500 | 5500 |
4700 | 4700+500 | 5200 |
4900 | 4900+500 | 5400 |
4400 | 4400+500 | 4900 |
4600 | 4600+500 | 5100 |
4300 | 4300+500 | 4800 |
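Let's find the mean of the new data set. We will add all the new salaries and divide the sum by 10.
$$\text{Mean} = \frac{5000+5700+5300+5100+5500+5200+5400+4900+5100+4800}{10} = \frac{52000}{10} = 5200$$
Next, we will order the numbers from least to greatest: 4800, 4900, 5000, 5100, 5100, 5200, 5300, 5400, 5500, 5700. The median of the data is the mean of the two middle values.
$$\text{Median} = \frac{5100+5200}{2} = 5150$$
The median is 5150. Let's find the mode.
We can see that 5100 occurs twice and each of the other values occurs only once. This means that the mode is $5100.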
Let's make a table to compare our results.
Mean | Median | Mode | |
---|---|---|---|
Original Salaries | 4700 | 4650 | 4600 |
Salaries After Raise | 5200 | 5150 | 5100 |
We can see that the mean, median, and mode all increase. Let's find by how much.
Mean | Median | Mode | |
---|---|---|---|
Original Salaries | 4700 | 4650 | 4600 |
Salaries After Raise | 5200 | 5150 | 5100 |
Difference | 5200-4700=500 | 5150-4650=500 | 5100-4600=500 |
The new mean, median, and mode are 500 more than the original mean, median, and mode. This means that the mean, median, and mode increased by $500. The answer is C.
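The same comparison can be scripted as a quick check. This sketch uses Python's standard `statistics` module (`mean`, `median`, and `mode`) on the salary data. It prints how far each measure of center moves after the $500 raise.

```python
import statistics

salaries = [4500, 5200, 4800, 4600, 5000, 4700, 4900, 4400, 4600, 4300]
raised = [s + 500 for s in salaries]  # every employee receives a $500 raise

measures = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}
for name, measure in measures.items():
    before, after = measure(salaries), measure(raised)
    # adding the same amount to every value shifts each measure of center by that amount
    print(f"{name}: {before} -> {after} (change {after - before})")
```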
The table shows Dominika's cell phone data usage across six months.
Cell Phone Data Usage (Gigabytes) | |||||
---|---|---|---|---|---|
3.8 | 4.1 | 3.9 | 4.2 | 6.3 | 3.2 |
We are given a data set consisting of six observations.
Dominika's Data Usage (GB): 3.8, 4.1, 3.9, 4.2, 6.3, 3.2
Let's check whether we can identify any outliers. An outlier is a data value that is much greater or much less than the other values. One observation appears to be significantly greater than the rest of the values.
Ordered Data Set: 3.2, 3.8, 3.9, 4.1, 4.2, 6.3
Let's verify whether it differs significantly from the other values. We first need to find the interquartile range of the data. This is because outliers are values more than 1.5 times the interquartile range below the lower quartile or above the upper quartile.
Low Outliers | High Outliers |
---|---|
Outlier < Q1 − 1.5 × IQR | Outlier > Q3 + 1.5 × IQR |
The interquartile range is the difference of the third quartile and the first quartile.
The lower half of the ordered data is 3.2, 3.8, 3.9, and the upper half is 4.1, 4.2, 6.3. The first quartile, the median of the lower half, is 3.8 GB. The third quartile, the median of the upper half, is 4.2 GB. The interquartile range is, therefore, 4.2 − 3.8 = 0.4 GB. Let's find out whether the data value 6.3 is an outlier.
Step | Q1 − 1.5 × IQR | Q3 + 1.5 × IQR |
---|---|---|
Substitute Values | 3.8 − 1.5 × 0.4 | 4.2 + 1.5 × 0.4 |
Calculate | 3.2 | 4.8 |
This table tells us that values less than 3.2 GB or greater than 4.8 GB are outliers. In our data set, there are no values below 3.2 GB, and there is only one value above 4.8 GB, which is 6.3 GB. Therefore, 6.3 GB is an outlier. Statement IV is false. Now, we will find the mean, range, and interquartile range of the data with and without the outlier.
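The outlier test above can be sketched in Python. The helpers `median` and `tukey_outliers` are hypothetical names of our own. Quartiles follow this lesson's halves method, and `k = 1.5` is Tukey's factor. The rule uses strict inequalities, so a value lying exactly on a fence, like 3.2 here, is not an outlier. With floating-point numbers, values that sit exactly on a fence are worth double-checking by hand.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def tukey_outliers(values, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR.

    Q1 and Q3 are the medians of the lower and upper halves; for an odd
    count, the middle value belongs to neither half.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # strict inequalities: a value exactly on a fence is not an outlier
    return [x for x in s if x < low_fence or x > high_fence]


usage_gb = [3.8, 4.1, 3.9, 4.2, 6.3, 3.2]
print(tukey_outliers(usage_gb))  # [6.3]
```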
We will calculate two means, one considering all of the given values, and the other excluding the outlier. Recall that the mean of a data set is the sum of the data values divided by the number of data values.
Data Values | Mean |
---|---|
3.8, 4.1, 3.9, 4.2, 6.3, 3.2 (6 values) | (3.8 + 4.1 + 3.9 + 4.2 + 6.3 + 3.2)/6 = 25.5/6 = 4.25 |
3.8, 4.1, 3.9, 4.2, 3.2 (5 values) | (3.8 + 4.1 + 3.9 + 4.2 + 3.2)/5 = 19.2/5 = 3.84 |
The mean with the outlier is 4.25 GB, and the mean without the outlier is 3.84 GB. So, the outlier increases the value of the mean. When we exclude the outlier, the mean decreases from 4.25 GB to 3.84 GB. Therefore, Statement I is true.
We subtract the least value from the greatest to find the range.
Data Values | Range |
---|---|
3.8, 4.1, 3.9, 4.2, 6.3, 3.2 | 6.3 - 3.2 = 3.1 |
3.8, 4.1, 3.9, 4.2, 3.2 | 4.2- 3.2 = 1 |
The range with the outlier is 3.1 GB, and the range without the outlier is 1 GB. When we exclude the outlier, the range decreases from 3.1 GB to 1 GB. Statement II is false.
Finally, we will find the interquartile range of the data without the outlier.
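Without the outlier, the ordered data are 3.2, 3.8, 3.9, 4.1, 4.2. The median is 3.9, so the lower half is 3.2, 3.8 and the upper half is 4.1, 4.2. The first quartile is (3.2 + 3.8)/2 = 3.5 GB and the third quartile is (4.1 + 4.2)/2 = 4.15 GB. The interquartile range is, therefore, 4.15 − 3.5 = 0.65 GB.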
Data Values | Interquartile Range |
---|---|
3.2, 3.8, 3.9, 4.1, 4.2, 6.3 | 0.4 |
3.2, 3.8, 3.9, 4.1, 4.2 | 0.65 |
The interquartile range increases from 0.4 GB to 0.65 GB after we exclude the outlier. Therefore, Statement III is true. All in all, only I and III are true.
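To double-check the statements about the mean, range, and IQR, here is a minimal Python sketch. The functions `median` and `spread_summary` are our own hypothetical helpers, and quartiles use this lesson's halves method. Results are rounded to two decimal places to hide floating-point noise such as 3.0999999999999996.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def spread_summary(values):
    """Return (mean, range, IQR), with quartiles taken as medians of the halves."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return sum(s) / n, s[-1] - s[0], q3 - q1


usage_gb = [3.8, 4.1, 3.9, 4.2, 6.3, 3.2]
without_outlier = [x for x in usage_gb if x != 6.3]

for label, data in [("with outlier", usage_gb), ("without outlier", without_outlier)]:
    m, r, iqr = spread_summary(data)
    print(label, round(m, 2), round(r, 2), round(iqr, 2))
# with outlier 4.25 3.1 0.4
# without outlier 3.84 1.0 0.65
```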
The data represents the monthly temperature in degrees Celsius in a city over the course of a year.
Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
---|---|---|---|---|---|---|---|---|---|---|---|
5 °C | 7 °C | 12 °C | 18 °C | 24 °C | 25 °C | 30 °C | 27 °C | 26 °C | 15 °C | 9 °C | 6 °C |
We are given a data set representing the monthly temperature in a city.
We want to find the mean absolute deviation (MAD) of the data. First, we will determine the mean, then find the distance between each data point and the mean, and finally find the average of these distances. Let's start by calculating the mean of the temperatures.
The mean is (5 + 7 + 12 + 18 + 24 + 25 + 30 + 27 + 26 + 15 + 9 + 6)/12 = 204/12 = 17 °C. Next, we make a table to find the sum of the absolute values of the differences between each data value and the mean.
Value | Difference Between Each Value and Mean | Absolute Value of Difference |
---|---|---|
5 | 5 − 17 = −12 | ∣−12∣ = 12 |
7 | 7 − 17 = −10 | ∣−10∣ = 10 |
12 | 12 − 17 = −5 | ∣−5∣ = 5 |
18 | 18 − 17 = 1 | ∣1∣ = 1 |
24 | 24 − 17 = 7 | ∣7∣ = 7 |
25 | 25 − 17 = 8 | ∣8∣ = 8 |
30 | 30 − 17 = 13 | ∣13∣ = 13 |
27 | 27 − 17 = 10 | ∣10∣ = 10 |
26 | 26 − 17 = 9 | ∣9∣ = 9 |
15 | 15 − 17 = −2 | ∣−2∣ = 2 |
9 | 9 − 17 = −8 | ∣−8∣ = 8 |
6 | 6 − 17 = −11 | ∣−11∣ = 11 |
Finally, we add the absolute values found in the table and then divide the sum by the number of values, 12.
$$\text{MAD} = \frac{12+10+5+1+7+8+13+10+9+2+8+11}{12} = \frac{96}{12} = 8$$
The mean absolute deviation of the data is 8 °C. This number indicates that the data values are, on average, 8 °C away from the mean of 17 °C.
Let's first determine which data values are within one mean absolute deviation of the mean. In Part A, we found the mean and the mean absolute deviation (MAD).
Mean | MAD |
---|---|
17 °C | 8 °C |
We add one MAD to the mean and subtract one MAD from the mean.
Mean - MAD | Mean + MAD |
---|---|
17 - 8 = 9 | 17 + 8 = 25 |
The data values between 9 and 25, inclusive, are within one mean absolute deviation of the mean.
Ordered data: 5, 6, 7, 9, 12, 15, 18, 24, 25, 26, 27, 30
We see that the values 9, 12, 15, 18, 24, and 25 are between 9 and 25. There are 6 such data values, so 6 out of 12 values are within one MAD of the mean.
$$\frac{6}{12} = \frac{1}{2} = 0.5$$
In other words, half of the values are within one MAD of the mean.
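Counting the values inside an interval around the mean is easy to automate. In this minimal sketch, `mean_and_mad` and `share_within` are our own hypothetical helpers. The interval is closed, meaning it includes both endpoints, which matches the lesson counting 9 and 25 as within one MAD.

```python
def mean_and_mad(values):
    """Return the mean and the mean absolute deviation of the values."""
    m = sum(values) / len(values)
    mad = sum(abs(x - m) for x in values) / len(values)
    return m, mad


def share_within(values, center, spread, k=1):
    """Values in [center - k*spread, center + k*spread] and the fraction they make up."""
    low, high = center - k * spread, center + k * spread
    inside = [x for x in values if low <= x <= high]
    return inside, len(inside) / len(values)


# Monthly temperatures (°C), January to December
temps_c = [5, 7, 12, 18, 24, 25, 30, 27, 26, 15, 9, 6]
m, mad = mean_and_mad(temps_c)
inside, share = share_within(temps_c, m, mad)
print(m, mad)          # 17.0 8.0
print(sorted(inside))  # [9, 12, 15, 18, 24, 25]
print(share)           # 0.5
```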
The data set represents the monthly electricity consumption (in kilowatt-hours) of a residential building for a 6-month period.
Jan | Feb | Mar | Apr | May | Jun |
---|---|---|---|---|---|
560 kWh | 530 kWh | 510 kWh | 470 kWh | 450 kWh | 300 kWh |
We want to determine which of the data values are within two standard deviations of the mean.
Monthly Electricity Consumption (kWh): 560, 530, 510, 470, 450, 300
We first need to calculate the mean of the data set. We add all the values and divide the sum by the number of values.
$$\text{Mean} = \frac{560+530+510+470+450+300}{6} = \frac{2820}{6} = 470$$
The mean of the data set is 470 kWh. Now we can find the interval of values that are within two standard deviations of the mean. We know that the standard deviation of the data set is approximately 84 kWh, so we add 2 × 84 to the mean and subtract 2 × 84 from the mean.
Mean − 2 × Standard Deviation | Mean + 2 × Standard Deviation |
---|---|
470 − 2 × 84 = 302 | 470 + 2 × 84 = 638 |
The data values between 302 and 638 are within two standard deviations of the mean.
Ordered data: 300, 450, 470, 510, 530, 560
All values except 300 are in this interval. Therefore, the values within two standard deviations of the mean are 450, 470, 510, 530, and 560.
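This check can be sketched with Python's standard `statistics` module. `statistics.pstdev` computes the population standard deviation, which is about 84.26 kWh for this data set and matches the rounded 84 kWh used above. Using the unrounded value moves the interval endpoints to roughly 301.48 and 638.52, but every value is still classified the same way.

```python
import statistics

# Monthly electricity consumption (kWh), January to June
usage_kwh = [560, 530, 510, 470, 450, 300]

mu = statistics.mean(usage_kwh)       # mean: 470
sigma = statistics.pstdev(usage_kwh)  # population standard deviation
low, high = mu - 2 * sigma, mu + 2 * sigma

within = [x for x in usage_kwh if low <= x <= high]
outside = [x for x in usage_kwh if not low <= x <= high]
print(round(sigma, 2))  # 84.26
print(within)           # [560, 530, 510, 470, 450]
print(outside)          # [300]
```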
The table shows the lengths (in centimeters) of the fish Jordan caught.
Length of Fish (cm) | |||||
---|---|---|---|---|---|
21 | 22 | 25 | 22 | 48 | 24 |
We are given a table showing the lengths of the fish Jordan caught. We want to find the mean absolute deviation for the data set.
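We can follow these three steps to find the mean absolute deviation:
1. Find the mean.
2. Find the absolute value of the difference between each data value and the mean.
3. Find the mean of these absolute values.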
Let's start by finding the mean of the lengths.
The mean is (21 + 22 + 25 + 22 + 48 + 24)/6 = 162/6 = 27 cm. We can move on to finding the absolute values of the differences between the values and the mean.
Value | Difference Between Each Value and Mean | Absolute Value of Difference |
---|---|---|
21 | 21 − 27 = −6 | ∣−6∣ = 6 |
22 | 22 − 27 = −5 | ∣−5∣ = 5 |
25 | 25 − 27 = −2 | ∣−2∣ = 2 |
22 | 22 − 27 = −5 | ∣−5∣ = 5 |
48 | 48 − 27 = 21 | ∣21∣ = 21 |
24 | 24 − 27 = −3 | ∣−3∣ = 3 |
Finally, we add the absolute values found in the table and then divide the sum by the number of values.
$$\text{MAD} = \frac{6+5+2+5+21+3}{6} = \frac{42}{6} = 7$$
The mean absolute deviation of the data is 7 cm. This number indicates that the data values are, on average, 7 cm away from the mean of 27 cm.
We will repeat the process, this time excluding 48 from the data set. Notice that this value is an outlier as it is significantly different from the rest of the values.
21, 22, 25, 22, 24 (with 48 removed)
Let's find the average of these 5 values.
The mean is (21 + 22 + 25 + 22 + 24)/5 = 114/5 = 22.8 cm. Next, we will find the absolute values of the differences between the values and the mean.
Value | Difference Between Each Value and Mean | Absolute Value of Difference |
---|---|---|
21 | 21 − 22.8 = −1.8 | ∣−1.8∣ = 1.8 |
22 | 22 − 22.8 = −0.8 | ∣−0.8∣ = 0.8 |
25 | 25 − 22.8 = 2.2 | ∣2.2∣ = 2.2 |
22 | 22 − 22.8 = −0.8 | ∣−0.8∣ = 0.8 |
24 | 24 − 22.8 = 1.2 | ∣1.2∣ = 1.2 |
Finally, we add the absolute values found in the table and then divide the sum by the number of values.
$$\text{MAD} = \frac{1.8+0.8+2.2+0.8+1.2}{5} = \frac{6.8}{5} = 1.36$$
The mean absolute deviation of the data without 48 is 1.36 cm. This number indicates that, without the outlier, the data values are on average 1.36 cm away from the mean of 22.8 cm.
When we calculate the MAD of the data with and without the outlier, we can see that including 48 in the data set significantly increases the mean absolute deviation.
MAD Without Outlier: 1.36 cm
MAD With Outlier: 7 cm
This means that, without 48, the data values are much closer together. The value 48 lies far from the rest of the data values, which is why it causes such a large increase in the mean absolute deviation.
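The comparison can be reproduced with a short Python sketch. `mad` is our own helper name; it implements the three steps used in this exercise.

```python
def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


fish_cm = [21, 22, 25, 22, 48, 24]
without_48 = [x for x in fish_cm if x != 48]  # drop the outlier

print(mad(fish_cm))               # 7.0
print(round(mad(without_48), 2))  # 1.36
```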