A line plot is a way of illustrating a data set in which each data point is represented with a mark above a number line. Marks representing the same value are stacked on top of each other. We want to identify the shape of the distribution of the given line plot. Let's do it!
Notice that the left side of the distribution is taller than the right side. This means that the distribution is non-symmetric. Now we will look for any clusters, gaps, peaks, or outliers on the plot. Let's begin by recalling the definitions of these attributes.
Name | Definition |
---|---|
Cluster | Group of points that lie close together |
Gap | Area of a graph that does not contain any data values |
Peak | Most frequently occurring value or interval of values |
Outlier | Data point that is significantly different from the other values in the data set |
With these definitions in mind, we can draw some conclusions about the characteristics of our graph.
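The definitions of peak and gap above can be checked mechanically. The following is a minimal sketch, assuming whole-number tick marks on the number line. The function name `peaks_and_gaps` and the sample values are hypothetical and are not the data from the exercise.

```python
from collections import Counter

def peaks_and_gaps(values):
    """Return (peaks, gaps) for a line plot with whole-number tick marks.

    peaks: the value(s) with the tallest stack of marks
    gaps:  ticks between the smallest and largest value that have no marks
    """
    counts = Counter(values)
    tallest = max(counts.values())
    peaks = sorted(v for v, c in counts.items() if c == tallest)

    # A Counter reports 0 for any tick that never appears in the data.
    ticks = range(min(values), max(values) + 1)
    gaps = [t for t in ticks if counts[t] == 0]
    return peaks, gaps

# Hypothetical data, one entry per mark on the plot.
sample = [1, 1, 2, 2, 2, 3, 4, 7]
peaks, gaps = peaks_and_gaps(sample)
print(peaks)  # [2]
print(gaps)   # [5, 6]
```

In this made-up sample, the value 7 sits beyond a gap, which is the kind of point worth testing as a possible outlier.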
We want to confirm whether 3.6 is an outlier of our data set. Let's do so using another approach for finding an outlier when given numerical data. In this case, a data value is an outlier if it is farther away from the closest quartile than 1.5 times the interquartile range (IQR). Let's find the outliers in three steps!
Q1−1.5⋅IQR | Q3+1.5⋅IQR |
---|---|
1.5−1.5⋅0.9=1.5−1.35 | 2.4+1.5⋅0.9=2.4+1.35 |
0.15 | 3.75 |
All values of the data set are greater than 0.15 and less than 3.75. This means that 3.6 is not actually an outlier — in fact, our data set has no outliers!
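The same check can be written as a short sketch. It uses the quartiles from the table above (Q1 = 1.5 and Q3 = 2.4); the function names `outlier_fences` and `is_outlier` are assumptions made for illustration.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper bounds of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is an outlier if it lies outside the two fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper

lower, upper = outlier_fences(1.5, 2.4)
# Rounding hides tiny floating-point errors in the decimal arithmetic.
print(round(lower, 2), round(upper, 2))  # 0.15 3.75
print(is_outlier(3.6, 1.5, 2.4))         # False
```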
We want to describe the center and spread of the distribution. Let's start by recalling some facts about measures of center and spread.
Distribution | Best Describes the Center | Best Describes the Spread |
---|---|---|
Symmetric | Mean | Mean absolute deviation |
Non-symmetric | Median | Interquartile range |
We know from Part A that the distribution is non-symmetric. This means that we should use the median to describe the center and the interquartile range to describe the spread of the distribution. Let's find these measures one at a time.
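The rule for choosing measures can be sketched in code. This is an illustration only: the helper names and sample data are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is one common classroom convention and an assumption here.

```python
from statistics import mean, median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two data points. When the count is odd, the middle
    value is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

def center_and_spread(data, symmetric):
    """Pick the measures that best describe the distribution."""
    if symmetric:
        m = mean(data)
        mad = mean([abs(x - m) for x in data])  # mean absolute deviation
        return m, mad
    q1, q3 = quartiles(data)
    return median(data), q3 - q1  # median and interquartile range

# Hypothetical non-symmetric data with a long right tail.
skewed = [1, 2, 2, 3, 3, 3, 4, 9]
print(center_and_spread(skewed, symmetric=False))  # (3.0, 1.5)
```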
When the data are arranged in numerical order, the median is the middle value — or the mean of the two middle values — in a set of data. To find the median, we will start by looking at the given line plot.
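Reading a median off a line plot amounts to listing each value once per mark and taking the middle of that ordered list. A minimal sketch follows, assuming the plot is given as a mapping from each value to its number of marks. The name `median_from_line_plot` and the counts are hypothetical, not the exercise's data.

```python
from statistics import median

def median_from_line_plot(marks):
    """marks maps each value on the number line to its number of marks."""
    ordered = []
    for value in sorted(marks):
        # Repeat each value once per mark stacked above it.
        ordered.extend([value] * marks[value])
    return median(ordered)

# Hypothetical plot: two marks above 1, three above 2, one above 3, two above 5.
hypothetical_plot = {1: 2, 2: 3, 3: 1, 5: 2}
print(median_from_line_plot(hypothetical_plot))  # 2.0
```

With eight marks, the median is the mean of the fourth and fifth values in order, which are both 2 in this example.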