Glencoe Math: Course 3, Volume 2
6. Analyze Data Distributions

Exercise 1 Page 720


A line plot is a way of illustrating a data set in which each data point is represented by a mark above a number line. Marks representing the same value are stacked on top of each other. We want to identify the shape of the distribution on the given line plot. Let's do it!

Notice that the left side of the distribution does not look like the right side. This means that the distribution is non-symmetric. Now we will look for any clusters, gaps, peaks, or outliers on the plot. Let's begin by recalling the definitions of these attributes.

  • Cluster: a group of data points that lie close together.
  • Gap: an area of a graph that does not contain any data values.
  • Peak: the most frequently occurring value or interval of values.
  • Outlier: a data point that is significantly different from the other values in the data set.

With these definitions in mind, we can draw some conclusions about the characteristics of our graph.

  • There is a cluster from 23 to 27 nacho bowls.
  • There is a gap from 20 to 23 nacho bowls and from 27 to 29 nacho bowls.
  • There is a peak at 27 nacho bowls.
  • There do not appear to be any outliers.
Note that clusters and outliers are identified by inspecting a graph, so they are somewhat subjective. The answer above is just one possible solution.
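The peak and the empty spots on the number line can also be checked mechanically. Below is a minimal Python sketch, assuming the data values are the whole numbers read off the line plot (the same list written out in the Extra section). The helper names `frequency_table`, `peaks`, and `empty_values` are hypothetical. Because clusters are a judgment call, the sketch only reports the most frequent value and the whole numbers that have no marks.

```python
from collections import Counter

# Data values read off the line plot (number of nacho bowls).
data = [20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25,
        26, 27, 27, 27, 27, 27, 29, 30, 30, 31]

def frequency_table(values):
    """Map every whole number from min to max to how many marks it has."""
    counts = Counter(values)
    return {x: counts.get(x, 0) for x in range(min(values), max(values) + 1)}

def peaks(values):
    """Return the most frequently occurring value(s), in increasing order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)

def empty_values(values):
    """Return the whole numbers between min and max that have no data points."""
    return [x for x, c in frequency_table(values).items() if c == 0]

if __name__ == "__main__":
    # Text version of the line plot: one row per value, one 'x' per mark.
    for value, count in frequency_table(data).items():
        print(f"{value:>3} {'x' * count}")
    print("peak:", peaks(data))               # peak: [27]
    print("no data at:", empty_values(data))  # no data at: [21, 22, 28]
```

The empty values 21, 22, and 28 correspond to the gaps from 20 to 23 and from 27 to 29 nacho bowls.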

Extra

We can consider another approach for finding outliers in numerical data. A data value is an outlier if it lies more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile. Let's find the outliers in three steps!

  1. Find the IQR of the data set.
  2. Calculate the lower and upper boundaries beyond which a data value is an outlier.
  3. Determine whether the data set has outliers.
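First, let's write all the data values in numerical order.
20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31
Now, recall that the interquartile range is the difference between $Q_3$ and $Q_1$, the upper and lower quartiles. There are 21 values, so the median is the 11th value, 25. The lower quartile $Q_1$ is the median of the 10 values below the median, which is the mean of the 5th and 6th of those values, $\frac{24+24}{2} = 24$. The upper quartile $Q_3$ is the median of the 10 values above the median, $\frac{27+27}{2} = 27$.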
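The quartile calculation can be sketched in Python. This assumes the median-of-halves convention used in this solution, where the median itself is left out of both halves when the number of values is odd; software packages offer other quartile conventions that may give slightly different values for some data sets. The helper `quartiles` is a hypothetical name, while `statistics.median` comes from Python's standard library.

```python
from statistics import median

data = [20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25,
        26, 27, 27, 27, 27, 27, 29, 30, 30, 31]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    With an odd number of values, the median is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                      # values below the median
    upper = ordered[half + len(ordered) % 2:]   # values above the median
    return median(lower), median(upper)

q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # 24.0 27.0 3.0
```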
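The interquartile range is $Q_3 - Q_1 = 27 - 24 = 3$. A data value is an outlier if it satisfies one of the following conditions.
  • It is less than $Q_1 - 1.5 \cdot \text{IQR}$.
  • It is greater than $Q_3 + 1.5 \cdot \text{IQR}$.
Let's calculate these boundaries knowing that $Q_1 = 24$ and $Q_3 = 27$.
$$Q_1 - 1.5 \cdot \text{IQR} = 24 - 1.5 \cdot 3 = 24 - 4.5 = 19.5$$
$$Q_3 + 1.5 \cdot \text{IQR} = 27 + 1.5 \cdot 3 = 27 + 4.5 = 31.5$$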

All values of the data set are greater than 19.5 and less than 31.5. This means that our data set has no outliers.
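Steps 2 and 3 can be written as a short Python sketch. The helpers `outlier_fences` and `find_outliers` are hypothetical names, and the parameter `k = 1.5` is the multiplier used in this solution. The quartiles $Q_1 = 24$ and $Q_3 = 27$ are taken from the calculation above rather than recomputed.

```python
data = [20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25,
        26, 27, 27, 27, 27, 27, 29, 30, 30, 31]

def outlier_fences(q1, q3, k=1.5):
    """Return the boundaries (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    """Return the values strictly below the lower boundary or above the upper one."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]

# Quartiles of this data set, found above.
Q1, Q3 = 24, 27
print(outlier_fences(Q1, Q3))       # (19.5, 31.5)
print(find_outliers(data, Q1, Q3))  # []
```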

We want to describe the center and spread of the distribution. Let's start by recalling some facts about measures of center and spread.

  • Symmetric distribution: the mean best describes the center, and the mean absolute deviation best describes the spread.
  • Non-symmetric distribution: the median best describes the center, and the interquartile range best describes the spread.
We know from Part A that the distribution is non-symmetric. This means that we should use the median to describe the center and the interquartile range to describe the spread of the distribution. We will find these measures one at a time.

Center of Distribution

The median is the middle value of a data set whose values are arranged in numerical order, or the mean of the two middle values when the set has an even number of values. To find the median, we will start by looking at the given line plot.

There are 21 data values in the set, so the median is the 11th value. Let's write all the data values in numerical order.
20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25, 26, 27, 27, 27, 27, 27, 29, 30, 30, 31
The 11th value is 25, so the data values are centered around the median, 25.
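The median rule translates directly into code. This is a minimal Python sketch; `median_of` is a hypothetical helper name, and Python's standard library function `statistics.median` follows the same rule.

```python
data = [20, 20, 23, 23, 24, 24, 24, 24, 25, 25, 25,
        26, 27, 27, 27, 27, 27, 29, 30, 30, 31]

def median_of(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(len(data), median_of(data))  # 21 25
```

Because 21 is odd, the function returns the single middle value, which is the 11th value in the sorted list.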

Spread of Distribution

The interquartile range is a measure of spread equal to the difference between $Q_3$ and $Q_1$, the upper and lower quartiles. The lower quartile is the median of the 10 values below the median, $Q_1 = 24$, and the upper quartile is the median of the 10 values above the median, $Q_3 = 27$.
The interquartile range is $27 - 24 = 3$. This means that the middle half of the data spans 3 nacho bowls, from 24 to 27.