| 24 Theory slides |
| 13 Exercises - Grade E - A |
| Each lesson is meant to take 1-2 classroom sessions |
Here are some recommended readings to review before getting started with this lesson.
Emily and Ignacio love learning about animals. They believe they can make meaningful discoveries by studying data about animals, beginning with cats. They decide to create a data set of seven data points showing the lifespans of cats in their neighborhood, which they collect by surveying their neighbors.
Lifespan of Cats (in years) | | | |
---|---|---|---|
15 | 11 | 14 | 15 |
14 | 17 | 13 | |
Answer the following questions using this data set.
A data set is a collection of values that provides information. These values can take various forms, such as numbers or categories, and are typically gathered through measurements, surveys, or experiments. Consider a data set that consists of the heights of a group of actors.
Actor | Height |
---|---|
Madzia | 5 ft 4 in. |
Magda | 5 ft 2 in. |
Ignacio | 6 ft 1.6 in. |
Henrik | 5 ft 10 in. |
Ali | 6 ft 1 in. |
Diego | 5 ft 2 in. |
Miłosz | 5 ft 2 in. |
Paulina | 5 ft 3 in. |
Aybuke | 5 ft 7 in. |
Mateusz | 6 ft 1.2 in. |
Gamze | 5 ft 3 in. |
Marcin | 5 ft 7 in. |
Marcial | 5 ft 8 in. |
Heichi | 5 ft 5 in. |
Arkadiusz | 5 ft 6 in. |
Enrique | 5 ft 10.5 in. |
Aleksandra | 5 ft 4 in. |
Mateusz | 5 ft 9 in. |
Jordan | 5 ft 5 in. |
Paula | 5 ft 2 in. |
MacKenzie | 5 ft 6 in. |
Joe | 6 ft 1 in. |
Flavio | 5 ft 10 in. |
Jeremy | 5 ft 4 in. |
Umut | 6 ft 1 in. |
The mean, or the average, of a numerical data set is one of the measures of center. It is defined as the sum of all of the data values in a set divided by the number of values in the set.
$$\text{Mean}=\frac{\text{Sum of Values}}{\text{Number of Values}}$$
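The mean formula translates directly into a few lines of Python. This is a minimal sketch; the function name `mean` is our own choice (Python's standard library also offers `statistics.mean`). It is applied here to the seven cat lifespans from the table above.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)

# Lifespans of the seven neighborhood cats (in years), from the table above.
cats = [15, 11, 14, 15, 14, 17, 13]
print(round(mean(cats), 2))  # 99 / 7, prints 14.14
```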
Ignacio volunteers at a dog shelter. He asks Emily to help him study a data set he compiled on the lifespans of some of the dogs. The information they gather will help the shelter!
This time, the data set consists of eight data points rather than seven.
Lifespan of Dogs (in years) | |||
---|---|---|---|
10 | 21 | 16 | 15 |
13 | 15 | 17 | 11 |
$$\text{Mean}=\frac{10+21+16+15+13+15+17+11}{8}=\frac{118}{8}=14.75$$
Just as measures of center summarize a data set with a single value, measures of spread use a single value to describe how much the values in a data set differ from each other.
The range is a measure of spread equal to the difference between the maximum and minimum values of the data set.
The interquartile range, or IQR, of a data set is a measure of spread equal to the difference between Q3 and Q1, the upper and lower quartiles.
$$\text{IQR}=Q_3-Q_1$$
First, identify the median of the given data set. Since the number of values is even, the median is the mean of the two middle values.
The median of the data is 6.
The median divides the data into two halves, a lower half and an upper half. For this data, the lower half includes the first six values and the upper half includes the last six.
When there is an odd number of values in the data set, the middle value is excluded from both the lower and upper halves.
Next, find the first and third quartiles. The first quartile, Q1, is the median of the lower half, while the third quartile, Q3, is the median of the upper half. Here, both quartiles are found the same way the median was found.
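The quartile procedure above (split the sorted data into halves, leaving out the middle value when the count is odd, then take the median of each half) can be sketched in Python as follows. The function names are hypothetical; other tools, such as NumPy's `percentile` with its default linear interpolation, use different conventions and may return slightly different quartiles.

```python
def median(values):
    """Middle value for an odd count; mean of the two middle values for an even count."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]             # lower half (middle value excluded when n is odd)
    upper = s[half + n % 2:]     # upper half (middle value excluded when n is odd)
    return median(lower), median(s), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [14, 15, 17, 19, 23, 25, 26]
print(quartiles(data))  # (15, 19, 25)
print(iqr(data))        # 10
```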
Here, it is necessary to order the values from least to greatest. Then identify the median of the given data set. Since the number of values is an odd number, the median is the middle value.
The median of the data is 9. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
In this case, the data values are ordered from least to greatest and the number of values is an even number. This means that the median is the mean of the two middle values.
The median of the data is 33.5. Both the lower and upper halves contain five data values. Therefore, there is only one middle value in each half.
A five-number summary of a data set consists of the following five values: the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum.
These values provide a summary of the central tendency and spread of the data set. The five-number summary is useful for understanding the variability in a data set. When the data set is written in numerical order, the median divides the data set into two halves. The median of the lower half is the first quartile Q1 and the median of the upper half is the third quartile Q3.
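A five-number summary (minimum, Q1, median, Q3, maximum) can be computed with the same median-of-halves convention. This is a small sketch with a hypothetical function name, applied to the dog lifespans from earlier in the lesson.

```python
def _median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = _median(s[:half])            # median of the lower half
    q3 = _median(s[half + n % 2:])    # median of the upper half
    return s[0], q1, _median(s), q3, s[-1]

dogs = [10, 21, 16, 15, 13, 15, 17, 11]
print(five_number_summary(dogs))  # (10, 12.0, 15.0, 16.5, 21)
```

The range and IQR follow directly from the summary: here the range is 21 - 10 = 11 and the IQR is 16.5 - 12 = 4.5.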
An outlier is a data point that is significantly different from the other values in the data set. It can be significantly larger or significantly smaller than the others.
Categorical data sometimes also have unusual elements; these can be called outliers as well.
What Does Significantly Different Mean?
For numerical data, the following definition is one of several approaches that can be used: a data value is an outlier if it is less than Q1 − 1.5 · IQR or greater than Q3 + 1.5 · IQR.
The factor 1.5 was suggested by the esteemed American mathematician John Tukey.
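The 1.5 · IQR rule (often called Tukey's fences) flags a value as an outlier when it is below Q1 − 1.5 · IQR or above Q3 + 1.5 · IQR. A minimal sketch, assuming the median-of-halves quartiles used in this lesson; the function name and the `k` parameter are hypothetical.

```python
def _median(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = _median(s[:half])
    q3 = _median(s[half + n % 2:])
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return low, high, [v for v in s if v < low or v > high]

print(tukey_outliers([14, 15, 17, 19, 23, 25, 42]))  # (0.0, 40.0, [42])
```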
After excluding the outlier, the number of values decreased by one. There are nine values now, so the median is the middle value.
The median of the data is 32. Both the lower and upper halves contain four data values. Therefore, there are two middle values in each half. The median of each half is the mean of the two middle values.
 | Range | IQR |
---|---|---|
With Outlier | 68 | 15 |
Without Outlier | 44 | 17 |
After removing the outlier from the data, the range decreased from 68 to 44, while the IQR increased from 15 to 17. This example shows that outliers have a bigger impact on the range than on the IQR.
Measures of spread, such as the range and interquartile range, indicate how much data values vary, while outliers are values that deviate significantly from the rest. Practice calculating these measures for the given data.
The mean absolute deviation (MAD) is a measure of the spread of a data set that measures how much the data elements differ from the mean. The mean absolute deviation is the average distance between each data value and the mean.
Calculating the MAD involves finding the absolute difference between each data point and the mean, then averaging these absolute differences. In the following example, the first step is to find the mean of the data values.
$$\text{Mean}=\frac{82+85+90+75+95+85+90+70}{8}=\frac{672}{8}=84$$
Next, calculate the absolute value of the differences between each data value and the mean.
Data Value | Absolute Value of Difference |
---|---|
82 | ∣82−84∣=2 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
75 | ∣75−84∣=9 |
95 | ∣95−84∣=11 |
85 | ∣85−84∣=1 |
90 | ∣90−84∣=6 |
70 | ∣70−84∣=14 |
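The remaining step is to average the absolute differences in the table. The whole MAD procedure can be sketched in Python as below; the function name is hypothetical and the data are the eight values from the table.

```python
def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    if not values:
        raise ValueError("need at least one value")
    m = sum(values) / len(values)                        # step 1: the mean
    distances = [abs(v - m) for v in values]             # step 2: |value - mean|
    return sum(distances) / len(distances)               # step 3: mean of the distances

data = [82, 85, 90, 75, 95, 85, 90, 70]
print(mean_absolute_deviation(data))  # distances sum to 50, and 50 / 8 = 6.25
```

The same function reproduces the later exercise: for 9, 9, 12, 15, 14, 11, 14, 15, 15, 10, 16, 16 it returns 28 / 12, which rounds to 2.3.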
Start by finding the mean. Then, calculate the distances between the mean and each data value. Finally, find the mean of these distances.
Begin by recalling what the mean absolute deviation is.
Mean Absolute Deviation |
An average of how much data values differ from the mean. |
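To find the mean absolute deviation, follow these steps: find the mean, calculate the absolute difference between each data value and the mean, and then find the mean of these absolute differences.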
Data Value | Absolute Value of Difference |
---|---|
9 | ∣9−13∣=4 |
9 | ∣9−13∣=4 |
12 | ∣12−13∣=1 |
15 | ∣15−13∣=2 |
14 | ∣14−13∣=1 |
11 | ∣11−13∣=2 |
14 | ∣14−13∣=1 |
15 | ∣15−13∣=2 |
15 | ∣15−13∣=2 |
10 | ∣10−13∣=3 |
16 | ∣16−13∣=3 |
16 | ∣16−13∣=3 |
$$\text{MAD}=\frac{4+4+1+2+1+2+1+2+2+3+3+3}{12}=\frac{28}{12}\approx 2.3$$
The Greek letter σ (sigma) is commonly used to denote the standard deviation. In many data sets, most of the values fall within one standard deviation of the mean.
Standard deviation shows the variation of data from the mean.
First, find the mean of the given data set. Then, find the interval of values that lie within one standard deviation of the mean.
The values that are less than 18.8 are 16 and 18, and the values that are greater than 27.2 are 28 and 30. That means the heights outside the range of one standard deviation from the mean are 16, 18, 28, and 30 inches.
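The check above (which values fall outside mean ± one standard deviation) can be sketched in Python. The lesson does not say whether it uses the population or sample standard deviation; this sketch assumes the population version (divide by n), and the data set is a hypothetical example, not the heights from the exercise.

```python
import math

def population_std(values):
    """Square root of the mean squared distance from the mean (divides by n)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)

def outside_one_std(values):
    """Values below mean - sigma or above mean + sigma."""
    m = sum(values) / len(values)
    sigma = population_std(values)
    low, high = m - sigma, m + sigma
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data: mean 5, sigma 2
print(population_std(data))        # 2.0
print(outside_one_std(data))       # [2, 9]
```

For the sample standard deviation (divide by n - 1), Python's `statistics.stdev` can be used instead of `population_std`.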
In this lesson, the measures of center and measures of spread were discussed.
Measures of Center | Measures of Spread |
---|---|
Mean | Range |
Mode | Interquartile Range |
Median | Mean Absolute Deviation |
 | Standard Deviation |
We are asked to determine how an outlier affects the range of a data set. First, let's recall that the range is the difference between the maximum and minimum values.
$$\text{Range}=\text{Max Value}-\text{Min Value}$$
Next, recall that a data value that is significantly different from the other values is an outlier. Algebraically, a value is an outlier if it is less than the first quartile minus 1.5 times the interquartile range, or greater than the third quartile plus 1.5 times the interquartile range.
$$x<Q_1-1.5(\text{IQR})\quad\text{or}\quad x>Q_3+1.5(\text{IQR})$$
This means that outliers are always among the least or the greatest data values, so they affect the range. Let's look at an example data set, 14, 15, 17, 19, 23, 25, 26, and find its median and both quartiles. Its median is 19, its first quartile is Q1 = 15, and its third quartile is Q3 = 25.
The interquartile range (IQR) for this data set is 25 - 15 = 10. Let's use this information to find which numbers would be outliers for this data set.
$$\begin{aligned}Q_1-1.5(\text{IQR}) &= 15-1.5(10)=0\\ Q_3+1.5(\text{IQR}) &= 25+1.5(10)=40\end{aligned}$$
For our example data set, any data value less than 0 or greater than 40 would be considered an outlier. Now, let's compare the range of the original data set with the ranges of data sets in which the least or greatest values are changed so that they become outliers.
Data Set | Outliers | Range |
---|---|---|
14, 15, 17, 19, 23, 25, 26 | No outliers | 26 - 14 = 12 |
-1, 15, 17, 19, 23, 25, 26 | -1 | 26 - (-1) = 27 |
14, 15, 17, 19, 23, 25, 42 | 42 | 42 - 14 = 28 |
-2, 15, 17, 19, 23, 25, 41 | -2 and 41 | 41 - (-2) = 43 |
We can see that each time we have outliers in our data set, the range is greater than the range of the original data set. In general, when we have an outlier in a data set, it increases the range of this set. The answer is A.
Vincenzo is assigned to create a set of data with 7 values that has a mean of 40, a median of 35, a range of 50, and an interquartile range of 35. The steps Vincenzo followed when creating this data set are shown below.
We know that Vincenzo wants to create a data set of seven values with the following specified measures.
Measure | Value |
---|---|
Mean | 40 |
Median | 35 |
Range | 50 |
Interquartile Range | 35 |
Vincenzo draws seven horizontal lines, each representing a data value. We can imagine that the lines represent the ordered data set.
We will examine each step one by one.
Let's remember what the median is.
Median |- For a set with an odd number of values, the median is the middle value. For a set with an even number of values, the median is the mean of the two middle values.
We have seven values, which is an odd number. This means that the median is the middle value. We also know that the median is equal to 35. Therefore, the middle value must be 35.
Vincenzo's first step is correct. Let's move on to the next step.
We know that the range of the data set is 50. This means that the greatest value is 50 more than the least value.
$$\text{Range}=\text{Max Value}-\text{Min Value}$$
Vincenzo set the least value of the data to 20. Then, the greatest value must be 50 more, which is 70.
There is no error in the second step either.
Next, let's remember what the interquartile range is.
Interquartile Range |- The interquartile range, or IQR, is the difference between the third quartile and the first quartile.
This means that the difference between the third quartile, Q3, and the first quartile, Q1, is 35.
$$Q_3-Q_1=35$$
Since there are seven values, the second value is the first quartile and the sixth value is the third quartile.
Vincenzo chose 25 as the value of Q1. Notice that Q1 is greater than the least value and less than the median. Since the interquartile range is 35, the third quartile must be 25 + 35 = 60.
There is no error in this step either.
The last condition is that the mean should be equal to 40. Let's remember what the mean is.
Mean |- The mean of the data set is the sum of the data divided by the number of data values.
Let's name the missing third and fifth values x and y.
Now, we can add all the values and divide the sum by 7. The result should be equal to 40.
$$\frac{20+25+x+35+y+60+70}{7}=40\quad\Rightarrow\quad 210+x+y=280\quad\Rightarrow\quad x+y=70$$
The sum of these two values must be 70. However, Vincenzo chose the values 30 and 50, which add up to 80.
$$\underbrace{30}_{\text{Third Value}}+\underbrace{50}_{\text{Fifth Value}}=80$$
Therefore, the last step is incorrect. The answer is IV. Vincenzo chose the third data value correctly because it is between 25 and 35. However, the value of y should be between 35 and 60, and x + y must be 70.
We can see that for x = 30, the value of y is 40.
The data set 20, 25, 30, 35, 40, 60, 70 meets all of the specified conditions. Note that many different data sets can meet these conditions.
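A quick way to double-check a constructed data set is to recompute every required measure. The sketch below (hypothetical function name, median-of-halves quartiles as in this lesson) checks the completed set 20, 25, 30, 35, 40, 60, 70, obtained with x = 30 and y = 40, and Vincenzo's original choice of 30 and 50.

```python
def _median(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def summary(values):
    """Mean, median, range, and IQR of a data set."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    q1 = _median(s[:half])            # median of the lower half
    q3 = _median(s[half + n % 2:])    # median of the upper half
    return {
        "mean": sum(s) / n,
        "median": _median(s),
        "range": s[-1] - s[0],
        "iqr": q3 - q1,
    }

print(summary([20, 25, 30, 35, 40, 60, 70]))
# {'mean': 40.0, 'median': 35, 'range': 50, 'iqr': 35}
print(summary([20, 25, 30, 35, 50, 60, 70])["mean"])  # 290 / 7, about 41.43, so the mean condition fails
```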