

Statistical Displays

Statistical displays are essential tools for visually representing data and interpreting its distribution, trends, and patterns. These visual methods help simplify complex datasets, making it easier to analyze and communicate findings. Dot plots, histograms, and box plots each serve unique purposes in summarizing data, illustrating its central tendencies, spread, and overall shape. Understanding these displays allows for more effective comparisons and conclusions in various fields, such as education, research, and business. By mastering these tools, interpreting data and making informed decisions becomes more efficient and reliable.

Lesson Settings & Tools

	19 Theory slides
	10 Exercises - Grade E - A
	Each lesson is meant to take 1-2 classroom sessions


Data sets appear in many everyday situations. Statistical displays are used to analyze these data sets and draw conclusions from them. This lesson introduces different types of statistical displays and explains what each is best used for.

Catch-Up and Review

Here are a few recommended readings before getting started with this lesson.

Numerical Data
Categorical Data
Number Line
Discrete Data
Statistical Measures

Challenge

Which Display to Choose?

Consider four different situations that contain a data set.

Situation $1$	A company wants to compare the monthly sales performance of its Product A over the course of a year.
Situation $2$	A survey is conducted to gather data on the distribution of ages among participants in a community event.
Situation $3$	An analysis is conducted to compare the distribution of exam scores in a class.
Situation $4$	A study tracks the temperature variations over the course of a week in a particular city.

Several statistical displays can be used to show data from these and similar situations. Which display fits each of the given situations best?

Discussion

Dot Plot

A dot plot, also known as a line plot, is a way to represent numerical data in which each data point is represented with a dot above a horizontal number line. Dots representing the same measurement are stacked above each other. Consider the following data set.

{4, 4, 3, 1, 4, 4, 1, 4}

There are two $1$ s in this data set, so on the corresponding dot plot two dots are stacked above the number $1$ of the number line.
There is one $3$ in this data set, so on the corresponding dot plot a single dot is drawn above the number $3$ of the number line.
There are five $4$ s in this data set, so on the corresponding dot plot five dots are stacked above the number $4$ of the number line.

A dot plot illustrating the data set given in the text.

For data sets containing more than $20$ data points, dot plots are often inconvenient and other representations are preferred.
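The stacking described above can be sketched in a few lines of Python. This is only an illustration: the function name `text_dot_plot` and the text layout are assumptions made for this sketch, and the columns only line up for single-digit whole numbers.

```python
from collections import Counter

def text_dot_plot(data):
    """Return a text dot plot with one column per integer on the number line."""
    counts = Counter(data)
    values = list(range(min(data), max(data) + 1))  # consecutive labels, gaps included
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):  # build the top row of dots first
        row = " ".join("o" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in values))  # the number line itself
    return "\n".join(rows)

data = [4, 4, 3, 1, 4, 4, 1, 4]
plot = text_dot_plot(data)
print(plot)
```

The printed result has two dots above $1,$ none above $2,$ one above $3,$ and five above $4,$ matching the description of the data set.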

Example

Math Competition Results

Tadeo bought a new scientific magazine Imagineer, which includes a lot of different graphs and math-related discussions on real-life topics. The first article discussed the results of a recent math competition among high-school students.

The article illustrated the grades of the participants obtained on the scale from $0$ to $10 .$ Interpret the dot plot.

For comparison, the article also listed the grades of the last year's participants. Construct a dot plot from the given data set.

Last Year's Results: 1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9

Answer

a Median: $6$

Range: $8$

Shape: Symmetric

b Dot Plot

Hint

a Determine the median of the data set, find the maximum and minimum values and the range of the data. Analyze the overall shape of the graph and any distinctive features.

b Draw a horizontal line, title the dot plot and choose the labels for the line, either categories or sequential consecutive numbers. Find the frequency of each value in the data set and place the corresponding number of dots over it on the line.

Solution

a Interpreting a dot plot consists of three steps.

Step $1$	Determine the center (median) by finding the middle data point.
Step $2$	Find the maximum and minimum values on the graph. Use these values to calculate the spread (range) of the data.
Step $3$	Analyze the overall shape of the graph. Note any other features of interest on the graph.

Complete each step one at a time.

Step $1$

First, the total number of data points should be found. To do so, begin by counting all the dots in the given dot plot.

There are $15$ data points. This means that $15$ participants in the math competition obtained a grade. The median of the data set is, therefore, the $8$th data point, as it divides the set into two halves of $7$ data points each. Find its value on the dot plot.

The $8$th data point is $6,$ which means that the median or the center of the data set is $6.$ This measure indicates the middle value of all the grades obtained by the participants.

Step $2$

Now, find the maximum and minimum values on the dot plot.

The minimum grade obtained by a participant is $2$ and the maximum grade is $10.$ To find the range, calculate the difference between these values.

$\text{Range} = 10 - 2 = 8$

This means that the range of grades obtained by the participants of the math competition is $8.$

Step $3$

The next step is to analyze the overall shape of the graph.

The overall shape of this graph appears to be the bell shape of a normal distribution, meaning the grades are overall normally distributed, and the plot is symmetric in shape.

b A dot plot can be constructed by following these three steps.

Step $1$	Draw a horizontal line to begin the dot plot. Title the dot plot based on the problem, and label the plot with the categories/numbers. When labeling the line with numbers, the numbers must be sequential and in a consecutive order.
Step $2$	Determine the frequency for each piece of data provided in the problem.
Step $3$	Place dots over each category or number on the horizontal line that corresponds to the frequency for each piece of data as depicted in the table.

Complete each step one at a time.

Step $1$

First, a horizontal line should be drawn. The given data set includes the grades of the participants from last year, so the title of the dot plot can be Grades From Last Year.

1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9

Also, the grades vary from $1$ to $9,$ so the dot plot can be labeled with the consecutive integers from $1$ to $9.$

Step $2$

The next step is to determine the frequency of each grade obtained by the participants from last year's math competition. To do so, count how many times each grade from $1$ to $9$ appears and write that number in a table next to the grade.

Grade	Frequency
$1$	$1$
$2$	$0$
$3$	$1$
$4$	$2$
$5$	$0$
$6$	$3$
$7$	$3$
$8$	$2$
$9$	$1$
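As a cross-check, the same frequency table can be produced with a short Python sketch. The variable names are illustrative assumptions. Grades that never occur are kept with a frequency of $0$ so that the number line has no missing labels.

```python
from collections import Counter

last_year = [1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9]
counts = Counter(last_year)

# Walk over every consecutive grade from the minimum to the maximum,
# so grades with no data points still appear with frequency 0.
frequency_table = {
    grade: counts[grade]
    for grade in range(min(last_year), max(last_year) + 1)
}

print("Grade\tFrequency")
for grade, frequency in frequency_table.items():
    print(f"{grade}\t{frequency}")
```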

Step $3$

Lastly, place dots over each grade from $1$ to $9$ the number of times it appears in the data set. Use the values from the frequency table.

This way the dot plot for the given data set of values was formed.

Discussion

Frequency Distribution

A frequency distribution, sometimes called a histogram distribution, is a representation that displays the number of observations within a given interval. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.

The most common types of distributions are symmetric frequency distribution and skewed frequency distribution.

Discussion

Histogram

A histogram is a graphical illustration of a frequency distribution of a data set that contains numerical data. Histograms have several defining characteristics.

The data is grouped into specific ranges of values known as intervals.
All intervals in a histogram must be the same size.
Interval data is marked in groups along the horizontal axis.
A histogram is a collection of rectangles drawn above intervals.
The height of each rectangle is proportional to the frequency of the data in the corresponding interval.

Consider an example situation. A fruit store wants to examine the weights of the apples they sell. To see the distribution, it is not necessary to show each apple's weight individually. Instead, the apples can be grouped by their weights in intervals of $10:$ $70 - 79$ g, $80 - 89$ g, and so on.

A histogram showing the distribution of the weight of apples

A histogram looks similar to a bar graph. The difference is that a histogram has numbers on the horizontal axis and the bars cannot have a space between each other because the data is continuous.

Discussion

Drawing a Histogram

A data set can be illustrated with a histogram. Consider the following data set.

13, 11, 4, 11, 21, 25, 37, 17, 8, 19, 26, 15

To draw a histogram for this data set, there are four steps to follow.

Choose the Number of Intervals

The first step to drawing a histogram is deciding what intervals of numbers it will have. Remember that each interval must have the same length and all data points must lie in an interval. First, count the numbers in the data set.

13 (1), 11 (2), 4 (3), 11 (4), 21 (5), 25 (6), 37 (7), 17 (8), 8 (9), 19 (10), 26 (11), 15 (12)

One method to find a suitable number of intervals is to take the square root of the number of data points. Since there are $12$ numbers in the data set, calculate the square root of $12.$

$\sqrt{12} = 3.464101 \dots$

This means that the histogram can have either three or four intervals. In this case, it will have four intervals.
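The square-root guideline is easy to check numerically. In this minimal sketch, the name `candidates` is an assumption. It holds the two whole numbers on either side of the square root.

```python
import math

data = [13, 11, 4, 11, 21, 25, 37, 17, 8, 19, 26, 15]

root = math.sqrt(len(data))                       # square root of the number of data points
candidates = (math.floor(root), math.ceil(root))  # nearest whole numbers below and above

print(f"sqrt({len(data)}) = {root:.6f}")
print(f"Reasonable numbers of intervals: {candidates[0]} or {candidates[1]}")
```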

Determine the Size of the Intervals

Next, it is necessary to determine the size of the intervals. This can be done by identifying the lowest and highest data value in the set.

Lowest: $4$
Highest: $37$

Since the lowest data value in the set is $4$ and the highest is $37,$ using four intervals with a range of $10$ will cover numbers from $1$ to $40$ and, therefore, will encompass all data points. The intervals of the histogram will be the following.

1 - 10, 11 - 20, 21 - 30, 31 - 40

Make a Frequency Table

The next step is to make a frequency table showing how many data points lie in each interval.

Interval	Data Points	Frequency
$1 - 10$	$4,$ $8$	$2$
$11 - 20$	$11,$ $11,$ $13,$ $15,$ $17,$ $19$	$6$
$21 - 30$	$21,$ $25,$ $26$	$3$
$31 - 40$	$37$	$1$
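The frequency table above can be reproduced with a small Python sketch. The helper `interval_frequencies` and its parameters are hypothetical names chosen for this illustration. It assumes whole-number data, so that an interval such as $1 - 10$ ends right before $11 - 20$ begins.

```python
def interval_frequencies(data, start, width, count):
    """Group whole-number data into `count` equal intervals of size `width`."""
    table = {}
    for i in range(count):
        low = start + i * width
        high = low + width - 1  # e.g. 1 - 10, then 11 - 20
        table[(low, high)] = sorted(x for x in data if low <= x <= high)
    return table

data = [13, 11, 4, 11, 21, 25, 37, 17, 8, 19, 26, 15]
table = interval_frequencies(data, start=1, width=10, count=4)

print("Interval\tData Points\tFrequency")
for (low, high), points in table.items():
    print(f"{low} - {high}\t{points}\t{len(points)}")
```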

Draw a Histogram

From the frequency table, the histogram can be constructed by drawing a bar over each interval with a height corresponding to the found frequency.

Example

Analyzing Ticket Sales

Another article in Imagineer focuses on the upcoming opening of a new screening room in the local cinema Movieton. The theater has had $2$ screens for the last $10$ years. The bar graph in the article illustrates the distribution of ticket sales for a fiscal week in the year $2023.$

Bar graph showing the ticket sales from Monday to Sunday

Interpret the bar graph by describing its shape, center, and any extreme values if they exist. Use the bar graph to determine what day tends to have the most ticket sales, and what the average amount of ticket sales is on that day.

The article also examined the distribution of ages of people attending the cinema theater. As a reference, it listed the ages of cinema visitors on a random evening. Draw a histogram for this data set.

24, 20, 23, 24, 9, 11, 42, 15, 17, 60, 18, 25, 26, 7, 28, 30, 32, 70, 45, 35, 37, 19, 13, 22, 49, 27, 21, 55, 39, 24

Answer

a Independent Variable: The days of the week

Dependent Variable: The number of tickets sold
Distribution: Left-Skewed

Most Ticket Sales: Saturday

b Histogram:

Hint

a Identify the independent and dependent variables. List the frequencies of every bin and use the values to interpret the data. Then describe the bar graph's shape.

b Choose the number of intervals and determine their sizes. Make a frequency table and use the values to draw a histogram.

Solution

a Interpreting a bar graph involves three steps.

Step $1$	Identify the independent and dependent variable.
Step $2$	List the frequency in each bin.
Step $3$	Interpret the data and describe the bar graph's shape. Use the interpretation to answer any questions about the data.

Complete each step one at a time.

Step $1$

First, the independent and dependent variables need to be identified. The horizontal line lists days of a week while the vertical line represents the number of tickets sold.

The article analyzes how many tickets are sold on different days of a week and which day has the most sales. This means that the independent variable is the days of the week and the dependent variable is the number of tickets sold on each day.

Step $2$

Next, the frequency in each bin should be listed and interpreted. Use the values given in the bar graph to indicate the height of each bin. Remember that the vertical line represents the number of tickets sold on each day.

Day	Frequency
Monday	$64$ tickets were sold.
Tuesday	$70$ tickets were sold.
Wednesday	$62$ tickets were sold.
Thursday	$137$ tickets were sold.
Friday	$295$ tickets were sold.
Saturday	$342$ tickets were sold.
Sunday	$260$ tickets were sold.

Step $3$

Lastly, interpret the data and describe the bar graph's shape. The bar graph shows that the distribution of ticket sales is left-skewed.

Friday and Saturday are the days with the greatest numbers of tickets sold, $295$ and $342,$ respectively. The largest number of tickets, $342,$ tends to be sold on Saturday.
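The same conclusion can be read off programmatically. The sketch below stores the values from the frequency list in a dictionary and looks up the day with the most tickets. The variable names are assumptions made for this example.

```python
ticket_sales = {
    "Monday": 64,
    "Tuesday": 70,
    "Wednesday": 62,
    "Thursday": 137,
    "Friday": 295,
    "Saturday": 342,
    "Sunday": 260,
}

# The day whose bar is tallest.
busiest_day = max(ticket_sales, key=ticket_sales.get)

# Days ordered from the most to the fewest tickets sold.
ranked_days = sorted(ticket_sales, key=ticket_sales.get, reverse=True)

print(busiest_day, ticket_sales[busiest_day])
print(ranked_days[:2])
```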

b To draw a histogram, follow these four steps.

Step $1$	Choose the number of intervals
Step $2$	Determine the size of the intervals
Step $3$	Make a frequency table
Step $4$	Draw a histogram

Complete each step one at a time.

Step $1$

The first step is determining what intervals of numbers it will have. Remember that each interval must have the same length and all data points must lie in an interval. First, count the numbers in the data set.

24 (1), 20 (2), 23 (3), 24 (4), 9 (5), 11 (6), 42 (7), 15 (8), 17 (9), 60 (10), 18 (11), 25 (12), 26 (13), 7 (14), 28 (15), 30 (16), 32 (17), 70 (18), 45 (19), 35 (20), 37 (21), 19 (22), 13 (23), 22 (24), 49 (25), 27 (26), 21 (27), 55 (28), 39 (29), 24 (30)

To find a suitable number of intervals, take the square root of the number of data points. In this case, there are $30$ data points.

$\sqrt{30} = 5.477225 \dots$

The square root suggests about five intervals. It is only a guideline, though, and a nearby number that gives convenient interval boundaries works as well. In this case, the histogram will have four intervals.

Step $2$

Next, the size of the intervals needs to be determined. This can be done by identifying the lowest and highest data value in the set.

Lowest: $7$
Highest: $70$

Since the lowest data value in the set is $7$ and the highest is $70,$ using four intervals with a range of $20$ will cover numbers from $1$ to $80$ and, therefore, will encompass all data points. The intervals of the histogram will be the following.

1 - 20, 21 - 40, 41 - 60, 61 - 80

Step $3$

The next step is to make a frequency table showing how many data points lie in each interval.

Interval	Data Points	Frequency
$1 - 20$	$7,$ $9,$ $11,$ $13,$ $15,$ $17,$ $18,$ $19,$ $20$	$9$
$21 - 40$	$21,$ $22,$ $23,$ $24,$ $24,$ $24,$ $25,$ $26,$ $27,$ $28,$ $30,$ $32,$ $35,$ $37,$ $39$	$15$
$41 - 60$	$42,$ $45,$ $49,$ $55,$ $60$	$5$
$61 - 80$	$70$	$1$

Step $4$

From the frequency table, the histogram can be constructed by drawing a bar over each interval with a height corresponding to the found frequency.
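To get a rough feel for the result without graph paper, the histogram can be sketched sideways as text, with one `#` per data point. This is only an illustrative rendering. The bars of a real histogram are drawn vertically and touch each other.

```python
ages = [24, 20, 23, 24, 9, 11, 42, 15, 17, 60, 18, 25, 26, 7, 28,
        30, 32, 70, 45, 35, 37, 19, 13, 22, 49, 27, 21, 55, 39, 24]
intervals = [(1, 20), (21, 40), (41, 60), (61, 80)]

# Count how many ages fall inside each interval (both ends included).
frequencies = [sum(low <= age <= high for age in ages) for low, high in intervals]

for (low, high), frequency in zip(intervals, frequencies):
    print(f"{low:>2} - {high:<2} | {'#' * frequency} {frequency}")
```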

Discussion

Box Plot

A box plot or box and whisker plot can be used to illustrate the distribution of a data set. A box plot has three parts.

A rectangular box that extends from the first to the third quartiles $(Q_{1}$ and $Q_{3})$ with a line between $Q_{1}$ and $Q_{3}$ indicating the position of the median.
A segment attached to the left of the box that extends from the first quartile to the minimum of the data set.
A segment attached to the right of the box that extends from the third quartile to the maximum of the data set.

A box plot is a scaled figure, usually presented above a number line. The set of numbers used to draw the box plot is called the five-number summary of the data set. Each of the five numbers is labeled accordingly.

Boxplot shown above a number line with a five-number summary from left to right as 1, 3, 5, 8, 10.

A box plot provides a visual illustration of the distribution of a data set. Each segment of the chart contains one quarter, or $25\%,$ of the data, and the middle $50\%$ of the data lies inside the box. The further apart the segments are, the greater the spread is for that quarter of the data.

Discussion

Drawing a Box Plot

A box plot can be used to display any data set of numbers. To draw it, the minimum, maximum, median, and first and third quartiles of the data set need to be identified. The following data set gives the test scores for a grade.

8.5, 11, 16, 12.5, 11, 15.5, 12, 7, 13, 10.5, 5, 15, 8, 9, 8, 8.5, 6, 12, 15, 15.5, 13.5, 7.5, 13, 10.5, 11.5, 13.5

There are four steps to follow to draw a box plot for the data set.

Find the Minimum and Maximum

Sometimes, the data is given in ascending order. When it is not, it is necessary to begin by ordering the data points from least to greatest.

5, 6, 7, 7.5, 8, 8, 8.5, 8.5, 9, 10.5, 10.5, 11, 11, 11.5, 12, 12, 12.5, 13, 13, 13.5, 13.5, 15, 15, 15.5, 15.5, 16

Now the minimum and maximum are easily identifiable in this ordered data set. Here, the minimum is $5$ and the maximum is $16.$

These values are marked above a number line with a line segment, indicating the range of the box plot.

The number line and the beginning of the box plot showing the minimum value at 5 and the maximum value at 16

Determine the Median

Counting all the data points gives the conclusion that there are $26$ values in the set.

5 (1), 6 (2), 7 (3), 7.5 (4), 8 (5), 8 (6), 8.5 (7), 8.5 (8), 9 (9), 10.5 (10), 10.5 (11), 11 (12), 11 (13), 11.5 (14), 12 (15), 12 (16), 12.5 (17), 13 (18), 13 (19), 13.5 (20), 13.5 (21), 15 (22), 15 (23), 15.5 (24), 15.5 (25), 16 (26)

To find the median, recall that it is the value that lies in the middle of a data set. Since there are $26$ values, the median is the mean of the numbers at the $13$th and $14$th positions.

5 (1), 6 (2), 7 (3), 7.5 (4), 8 (5), 8 (6), 8.5 (7), 8.5 (8), 9 (9), 10.5 (10), 10.5 (11), 11 (12), [11 (13), 11.5 (14)], 12 (15), 12 (16), 12.5 (17), 13 (18), 13 (19), 13.5 (20), 13.5 (21), 15 (22), 15 (23), 15.5 (24), 15.5 (25), 16 (26)

Now, determine the median by calculating the mean of $11$ and $11.5.$

$\text{Median} = \frac{11 + 11.5}{2} = 11.25$

The median is $11.25.$

Mark this value as a vertical line segment in the range above the number line. Remember that the line for the median falls inside the box.

The unfinished box plot with the minimum value at 5, maximum value at 16, and median at 11.25

Determine the Quartiles

The next step is to find the first and third quartiles of the data set. The median divides the set into two smaller sets, each with $13$ values.

Set 1: 5 (1), 6 (2), 7 (3), 7.5 (4), 8 (5), 8 (6), 8.5 (7), 8.5 (8), 9 (9), 10.5 (10), 10.5 (11), 11 (12), 11 (13)
Median: 11.25
Set 2: 11.5 (1), 12 (2), 12 (3), 12.5 (4), 13 (5), 13 (6), 13.5 (7), 13.5 (8), 15 (9), 15 (10), 15.5 (11), 15.5 (12), 16 (13)

The first quartile is the middle, $7$th value in the first set with smaller values. It equals $8.5.$

Set 1: 5 (1), 6 (2), 7 (3), 7.5 (4), 8 (5), 8 (6), [8.5 (7)], 8.5 (8), 9 (9), 10.5 (10), 10.5 (11), 11 (12), 11 (13)

The third quartile is the median of the second set with greater values. It equals the $7$th value, $13.5.$

Set 2: 11.5 (1), 12 (2), 12 (3), 12.5 (4), 13 (5), 13 (6), [13.5 (7)], 13.5 (8), 15 (9), 15 (10), 15.5 (11), 15.5 (12), 16 (13)

Draw the Box Plot

The first and third quartiles are marked as the left and right sides of the box plot. The box plot can be completed by drawing a box between the quartiles.

Boxplot shown above a number line with a five-number summary from left to right as 5, 8.5, 11.25, 13.5, 16.
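The five-number summary can be double-checked with a short Python sketch that follows the same split-at-the-median method. The function name `five_number_summary` is an assumption for this illustration. Leaving the middle value out of both halves when the number of values is odd is also an assumption of this sketch, since the lesson only works with even-sized sets. Statistical software may use other quartile conventions and give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the split-at-the-median method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                  # the half with smaller values
    upper = ordered[len(ordered) - half:]   # the half with greater values
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

scores = [8.5, 11, 16, 12.5, 11, 15.5, 12, 7, 13, 10.5, 5, 15, 8,
          9, 8, 8.5, 6, 12, 15, 15.5, 13.5, 7.5, 13, 10.5, 11.5, 13.5]
summary = five_number_summary(scores)
print(summary)
```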

Example

Air Pollution Levels

The next article in Imagineer focused on an analysis of air quality data collected from different urban and rural areas around the world. It included a box plot to visualize variations in pollution levels.

The box plot with the numbers 24, 39, 63.5, 83, 98

The box plot presented in the article is based on the scores from $0$ to $100,$ where $100$ is the greatest level of pollution, given to different areas by experts according to various factors of air pollution. Interpret the box plot.

For comparison, the article also listed the air pollution levels in each of the considered areas $10$ years ago. Construct a box plot from the given data set.

14, 26, 63, 42, 36, 22, 71, 18, 45, 55, 32, 9, 60, 48, 12, 69, 52, 25, 38, 7

Answer

a Minimum: $24$
Maximum: $98$
Median: $63.5$
First Quartile: $39$
Third Quartile: $83$

b Box Plot

Hint

a Determine the minimum, maximum, median, first quartile and third quartile of the data set. Use the definitions of these concepts to find the meanings of the values.

b Draw a line segment between the minimum and maximum above a number line. Mark the median as a vertical line segment inside the box. Use the first and third quartiles to find the left and right borders of the box.

Solution

a Interpreting a box plot means identifying its minimum, maximum, median, first and third quartiles.

Minimum = ?
Maximum = ?
Median = ?
First Quartile = ?
Third Quartile = ?

Recall what each part of a box plot represents.

Now, consider the given box plot.

By comparing the general box plot with this one, the minimum, maximum, median, first and third quartiles can be determined.

Minimum = $24$
Maximum = $98$
Median = $63.5$
First Quartile = $39$
Third Quartile = $83$

Next, use the definition of each concept to see what each of these values means.

Concept	Value	Meaning
Minimum	$24$	The least level of pollution in the analyzed areas is $24$ out of $100 .$
Maximum	$98$	The greatest level of pollution in the analyzed areas is $98$ out of $100 .$
Median	$63.5$	Half of the analyzed areas have a pollution level of $63.5$ or less, and half have $63.5$ or more.
First Quartile	$39$	$25\%$ of the analyzed areas have the level of pollution at $39$ or less.
Third Quartile	$83$	$25\%$ of the analyzed areas have the level of pollution at $83$ or more.

b A box plot can be constructed by following these four steps.

Step $1$	Order the data set from least to greatest value. Find the minimum and maximum.
Step $2$	Determine the median.
Step $3$	Determine the first and third quartiles.
Step $4$	Draw a box plot.

Complete each step one at a time.

Step $1$

Begin by ordering the given data set from least to greatest value.

7, 9, 12, 14, 18, 22, 25, 26, 32, 36, 38, 42, 45, 48, 52, 55, 60, 63, 69, 71

Now the minimum and maximum are easily identifiable in this ordered data set. Here, the minimum is $7$ and the maximum is $71.$

Step $2$

To find the median of the data set, the number of values in the set should first be determined. Begin by counting all the data points.

7 (1), 9 (2), 12 (3), 14 (4), 18 (5), 22 (6), 25 (7), 26 (8), 32 (9), 36 (10), 38 (11), 42 (12), 45 (13), 48 (14), 52 (15), 55 (16), 60 (17), 63 (18), 69 (19), 71 (20)

To find the median, look for the value that lies in the middle of a data set. Since there are $20$ values, the median is the mean of the numbers at the $10$th and $11$th positions.

7 (1), 9 (2), 12 (3), 14 (4), 18 (5), 22 (6), 25 (7), 26 (8), 32 (9), [36 (10), 38 (11)], 42 (12), 45 (13), 48 (14), 52 (15), 55 (16), 60 (17), 63 (18), 69 (19), 71 (20)

Now, determine the median by calculating the mean of $36$ and $38.$

$\text{Median} = \frac{36 + 38}{2} = 37$

The median is $37.$

Step $3$

The next step is to find the first and third quartiles of the data set. The median divides the set into two smaller sets, each with $10$ values.

Set 1: 7 (1), 9 (2), 12 (3), 14 (4), 18 (5), 22 (6), 25 (7), 26 (8), 32 (9), 36 (10)
Median: 37
Set 2: 38 (11), 42 (12), 45 (13), 48 (14), 52 (15), 55 (16), 60 (17), 63 (18), 69 (19), 71 (20)

The first quartile is the mean of the $5$th and $6$th values in the first set with smaller values.

Set 1: 7 (1), 9 (2), 12 (3), 14 (4), [18 (5), 22 (6)], 25 (7), 26 (8), 32 (9), 36 (10)

It equals $20.$

$Q_{1} = \frac{18 + 22}{2} = 20$

The third quartile is the median of the second set with greater values. It can be calculated as the mean of the $15$th and $16$th values.

Set 2: 38 (11), 42 (12), 45 (13), 48 (14), [52 (15), 55 (16)], 60 (17), 63 (18), 69 (19), 71 (20)

Its value is $53.5.$

$Q_{3} = \frac{52 + 55}{2} = 53.5$

Step $4$

To draw a box plot, recall the found values of the minimum, maximum, median, first and third quartiles of the data set.

Minimum = $7$
Maximum = $71$
Median = $37$
$Q_{1} = 20$
$Q_{3} = 53.5$

First, mark the values of minimum and maximum above a number line with a line segment, indicating the range of the box plot.

Next, mark the median as a vertical line segment in the range above the number line. Remember that the line for the median falls inside the box.

The first and third quartiles are marked as the left and right sides of the box plot. The box plot can be completed by drawing a box between the quartiles.
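The drawing steps can be imitated with a rough text rendering. The sketch below recomputes the five values from the ordered data set in Step $1$ and places them on a number line from $0$ to $80$ with one character per unit. The function `box_plot_row` and the symbols it uses are assumptions of this illustration. A value such as $53.5$ is drawn at the whole-number position just below it.

```python
from statistics import median

ordered = [7, 9, 12, 14, 18, 22, 25, 26, 32, 36,
           38, 42, 45, 48, 52, 55, 60, 63, 69, 71]
lower, upper = ordered[:10], ordered[10:]  # the two halves split at the median
minimum, maximum = ordered[0], ordered[-1]
q1, med, q3 = median(lower), median(ordered), median(upper)

def box_plot_row(minimum, q1, med, q3, maximum, length=81):
    """Text box plot with one character per unit on a number line from 0 to length - 1."""
    row = [" "] * length
    for x in range(int(minimum), int(maximum) + 1):
        row[x] = "-"                        # segment from the minimum to the maximum
    for x in range(int(q1), int(q3) + 1):
        row[x] = "="                        # the box between the quartiles
    row[int(minimum)] = row[int(maximum)] = "|"
    row[int(q1)], row[int(q3)] = "[", "]"
    row[int(med)] = "M"                     # the median line inside the box
    return "".join(row)

row = box_plot_row(minimum, q1, med, q3, maximum)
print(row)
```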

Discussion

Describing the Shape of a Distribution

The distribution of a data set shows the arrangement of data values. Here are a few concepts that can be used to describe a distribution.

Concept	Definition
Cluster	Data that is grouped closely together.
Gap	The numbers that have no data value.
Peak	The most frequently occurring values, or mode.
Symmetry	The left side of the distribution looks like the right side.

Consider a distribution displayed with the following dot plot.

Since the data is evenly distributed between the left and right side, it is a symmetric distribution. It has a cluster of several data values within the interval $5 - 9 .$ There are gaps at $4$ and $10$ because there are no data values. The value $7$ is a peak because it is the most frequently occurring value.

A dot plot illustrating a data set with the cluster, gap, peak marked
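These concepts can also be checked mechanically. Since the data behind the dot plot is not listed in the text, the sketch below uses a hypothetical data set chosen to match the description (gaps at $4$ and $10,$ a peak at $7,$ and a symmetric shape). The function `describe_shape` is likewise an assumption. It tests for exact mirror symmetry, which is stricter than the visual judgment normally used with real data.

```python
from collections import Counter

def describe_shape(data):
    """Find gaps, peaks, and exact mirror symmetry for whole-number data."""
    counts = Counter(data)
    lo, hi = min(data), max(data)
    gaps = [v for v in range(lo, hi + 1) if counts[v] == 0]  # numbers with no data value
    top = max(counts.values())
    peaks = [v for v in sorted(counts) if counts[v] == top]  # most frequent value(s)
    # Symmetric if each column of dots matches its mirror image.
    symmetric = all(counts[lo + k] == counts[hi - k] for k in range(hi - lo + 1))
    return {"gaps": gaps, "peaks": peaks, "symmetric": symmetric}

# Hypothetical data, not taken from the lesson's figure.
hypothetical = [3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11]
shape = describe_shape(hypothetical)
print(shape)
```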

Discussion

Measures of Center and Spread

There are different measures of center and spread available for describing a data distribution. For example, measures of center are mean and median, while measures of spread are interquartile range and mean absolute deviation. To determine which measures to use, consider the following diagram.

Flow chart that says: Is the data distribution symmetric? If yes, then use the mean to describe the center. Use the mean absolute deviation to describe the spread. If no, then use the median to describe the center. Use the interquartile range to describe the spread.

Note that if there is an outlier in the data distribution, the distribution is usually not symmetric.
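The decision in the flow chart can be written as a small Python sketch. The function names and the two sample data sets are assumptions made for illustration. The interquartile range is computed with the split-at-the-median method used for box plots in this lesson.

```python
from statistics import mean, median

def mean_absolute_deviation(data):
    """Average distance of the data values from the mean."""
    center = mean(data)
    return mean([abs(x - center) for x in data])

def interquartile_range(data):
    """Q3 - Q1, with the quartiles taken as the medians of the two halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[len(ordered) - half:]) - median(ordered[:half])

def describe(data, symmetric):
    """Pick the measures of center and spread suggested by the flow chart."""
    if symmetric:
        return ("mean", mean(data)), ("mean absolute deviation", mean_absolute_deviation(data))
    return ("median", median(data)), ("interquartile range", interquartile_range(data))

# Hypothetical data sets: the first is symmetric, the second has an outlier.
symmetric_result = describe([1, 2, 3, 4, 5], symmetric=True)
skewed_result = describe([1, 2, 2, 3, 10], symmetric=False)
print(symmetric_result)
print(skewed_result)
```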

Example

Internet Usage Levels

Tadeo got especially interested in the article about the Internet usage among teenagers. Curious to learn more, he decided to check out the website referenced in the article.

The dot plot for the numbers of hours spent on the internet

Describe the shape of the distribution. Choose the appropriate measures to describe the center and spread of the distribution.

The website also had a dot plot for the number of text messages sent by different teenagers in one day.

A dot plot illustrating the number of text messages sent on one day

Choose the appropriate measures to describe the center and spread of the distribution. Describe the shape of the distribution.

Answer

a Distribution: Symmetric

Measure of Center: Mean

Measure of Spread: Mean absolute deviation

b Distribution: Not symmetric

Measure of Center: Median

Measure of Spread: Interquartile range

Hint

a The distribution is symmetric when the left side of the data is similar to the right side. Decide which measures of center and spread are the most appropriate based on the symmetry of the distribution.

b Determine if the distribution is symmetric or not by analyzing the shape of the dot plot. Find the most appropriate measures of center and spread based on the symmetry of the distribution.

Solution

a Begin by examining closely the dot plot showing the number of hours spent on the Internet by teenagers.

The data is evenly distributed between the left and right side, so it is a symmetric distribution. It only has one cluster of data values within the interval $1 - 8$ and there are no gaps. The value $5$ is a peak because it is the most frequently occurring value.

Next, decide which measures of center and spread are the most appropriate based on the symmetry of the distribution. Remember what measures can be used in this situation.

Symmetric Distribution?	Measure of Center	Measure of Spread
Yes	Mean	Mean absolute deviation
No	Median	Interquartile range

Since this distribution is symmetric, it is best to use the mean as a measure of center and the mean absolute deviation as the measure of spread.

b Consider the dot plot that shows the number of text messages sent by different teenagers in one day.

Here, the left side of the data is different from the right side, so the distribution is not symmetric. Also, there are two clusters of data values within the intervals $17 - 20$ and $22 - 24,$ separated by a gap at $21.$ The peak of the data set is at $23.$

The most appropriate measures of center and spread can be determined by looking at the symmetry of the distribution. When the distribution is not symmetric, as in this case, it is best to use the median as a measure of center and the interquartile range as the measure of spread.

Discussion

Line Graph

A line graph is used to show how a set of data changes over a period of time. To make a line graph, a scale and interval should be chosen. Then the pairs of data should be graphed and a line connecting the points should be drawn. Consider a table of values that represents the growth of a plant over several weeks.

	Plant Growth
Week	$1$	$2$	$3$	$4$	$5$
Height (in)	$1.5$	$2.3$	$4$	$6.2$	$8$

The height data includes values from $1.5$ to $8,$ so a scale from $0$ to $10$ inches with an interval of $1$ inch is reasonable. The horizontal axis can represent time in weeks and the vertical axis can represent the plant height in inches. Now, the points can be plotted on a coordinate plane and connected.

By observing the upward and downward slant of lines connecting the points, the trends in the data can be described and future events can be predicted.
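The slant of each segment corresponds to the change between consecutive data points. A minimal sketch using the plant growth table is shown below. The variable names and the three trend labels are assumptions of this example.

```python
weeks = [1, 2, 3, 4, 5]
heights_in = [1.5, 2.3, 4, 6.2, 8]

# Change in height from each week to the next, rounded to one decimal place.
changes = [round(later - earlier, 1) for earlier, later in zip(heights_in, heights_in[1:])]

if all(change > 0 for change in changes):
    trend = "upward"      # every segment slants up
elif all(change < 0 for change in changes):
    trend = "downward"    # every segment slants down
else:
    trend = "mixed"

print(changes, trend)
```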

Example

A Trip to Grandparents

Tadeo got so inspired by analyzing all the graphs in the magazine that he decided to make his own diagram. Luckily, the next day he and his parents planned to drive to Tadeo's grandparents, who live $420$ miles from them. On their way, Tadeo recorded how far they traveled after each hour of driving.

A car traveling and a table of values gets filled with times and distances

Make a line graph using the table of values.

Interpret the line graph. In about how many hours will Tadeo and his parents reach the grandparents?

Answer

a Line Graph:

b About $6$ hours

Hint

a Choose a scale and intervals for the axes and define what they will represent. Plot the points from the table and connect them with segments.

b Examine whether the graph shows an upward or downward trend. Extend the graph to predict when Tadeo's family will have traveled the distance of $420$ miles.

Solution

a The line graph representing the given situation should be graphed. First, consider the table of values made by Tadeo.

Hour	Distance Traveled (mi)
$1$	$70$
$2$	$135$
$3$	$203$
$4$	$278$
$5$	$348$

The distance data includes values from $70$ to $348,$ so a scale from $0$ to $420$ miles with an interval of $70$ miles is reasonable. The horizontal axis can represent time in hours and the vertical axis can represent the distance traveled in miles. Now, the points can be plotted on a coordinate plane and connected.

This way the line graph representing the distance traveled by Tadeo's family to his grandparents was made.

b It is time to analyze the graph from Part A to interpret the line graph.

Notice that the graph shows an upward slant of the line with a steady increase from hours $1$ to $5 .$ To predict in how many hours Tadeo will reach his grandparents, extend the graph to the point where the distance is $420$ miles following the trend from the graph.

It can be predicted that Tadeo's family will reach their destination after about $6$ hours of traveling.
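
The prediction can also be checked numerically. The following Python sketch assumes that the family keeps driving at the average speed observed between hours $1$ and $5;$ the function name `predict_arrival_hour` is illustrative only and is not part of the lesson.

```python
# Tadeo's table of values: hours driven and miles traveled.
hours = [1, 2, 3, 4, 5]
miles = [70, 135, 203, 278, 348]


def predict_arrival_hour(hours, miles, target):
    """Extend the trend from the last recorded point to the target distance.

    Assumption: the average speed between the first and last recorded
    points stays the same for the rest of the trip.
    """
    average_speed = (miles[-1] - miles[0]) / (hours[-1] - hours[0])
    remaining = target - miles[-1]
    return hours[-1] + remaining / average_speed


estimate = predict_arrival_hour(hours, miles, 420)
print(f"Estimated arrival after about {estimate:.1f} hours")
```

The average speed works out to $69.5$ miles per hour, so the remaining $72$ miles take a little over one more hour, which agrees with the estimate of about $6$ hours read from the extended graph.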

Discussion

Different Types of Displays

There are several different statistical displays that can be used to represent a data set. To determine which display to choose, the following facts can be considered.

Type of Display	Best Used to...
Bar Graph	...show the number of items in specific categories.
Box Plot	...show measures of variation for a data set.
Histogram	...show frequency of data divided into equal intervals.
Line Graph	...show change over a period of time.
Line Plot	...show how many times each number occurs.

Example

Life Expectancy in Different Countries

Tadeo's grandparents are $83$ and $85$ years old. They told Tadeo numerous fascinating stories about their lives. He marveled that his grandparents got to live such long and wonderful lives. Later he started wondering about life expectancy in different countries, so he did a little investigation.

Country	Life Expectancy (years)
United States	$76.3$
Japan	$84.5$
Germany	$80.9$
Brazil	$77.3$
China	$78.2$
India	$68.3$
Australia	$83.3$
South Africa	$62.4$

Select an appropriate display for the given data set. Justify the answer.

Construct the display chosen in Part A.

Answer

a Bar graph

Hint

a Recall the available types of displays and what they are best used for.

b Let the horizontal axis represent countries and the vertical axis represent life expectancy. For each country, draw a bar whose height equals its life expectancy.

Solution

a The most appropriate display needs to be selected for the given data set. Begin by recalling the types of display and when to use them.

Type of Display	Best Used to
Bar Graph	It shows the number of items in specific categories.
Box Plot	It shows measures of variation for a set of data. It is also useful for very large data sets.
Histogram	It shows the frequency of data divided into equal intervals.
Line Graph	It shows the change over a period of time.
Line Plot	It shows how many times each number occurs.

The given data set lists the countries and the corresponding life expectancy. After analyzing the available types of displays, a bar graph looks like the best choice because it compares a numerical value, life expectancy, across separate categories, the countries.

b A bar graph for the given data set should be constructed.

Country	Life Expectancy (years)
United States	$76.3$
Japan	$84.5$
Germany	$80.9$
Brazil	$77.3$
China	$78.2$
India	$68.3$
Australia	$83.3$
South Africa	$62.4$

The countries can be marked on the horizontal axis and the life expectancy can be marked on the vertical axis. Next, draw bars with the height equal to the life expectancy.
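
As a rough illustration, the same comparison can be sketched in plain Python as a text bar chart. This sketch draws the bars horizontally rather than vertically, and the function name `text_bar_chart` and the choice of one `#` per $2$ years are assumptions made only for this example.

```python
# Life expectancy data from the table above (in years).
life_expectancy = {
    "United States": 76.3,
    "Japan": 84.5,
    "Germany": 80.9,
    "Brazil": 77.3,
    "China": 78.2,
    "India": 68.3,
    "Australia": 83.3,
    "South Africa": 62.4,
}


def text_bar_chart(data, years_per_symbol=2):
    """Return one line per category; each '#' stands for `years_per_symbol` years."""
    width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        bar = "#" * round(value / years_per_symbol)
        lines.append(f"{name:<{width}} | {bar} {value}")
    return lines


chart = text_bar_chart(life_expectancy)
print("\n".join(chart))
```

The longest bar belongs to Japan and the shortest to South Africa, which matches the greatest and least values in the table.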

Closure

Choosing the Best Fitting Displays

Recall the four situations mentioned earlier.

Situation $1$	A company wants to compare the monthly sales performance of its Product A over the course of a year.
Situation $2$	A survey is conducted to gather data on the distribution of ages among participants in a community event.
Situation $3$	An analysis is conducted for the distribution of exam scores in a class.
Situation $4$	A study tracks the temperature variations over the course of a week in a particular city.

Now, consider the available statistical displays and their graphs.

To match each situation to a graph, analyze each situation one at a time.

Situation $1$

In the first situation, the monthly sales performance of Product A is compared. This means that the horizontal axis should show the twelve months and the vertical axis should show the sales numbers for each month. This description fits the dot plot.

A dot plot illustrating the monthly sales of Product A

Situation $2$

In the second situation, data on the distribution of ages among participants in a community event is collected. This data can be illustrated by showing the ages in different intervals on the horizontal axis and the number of people in the corresponding age interval on the vertical axis. This situation can be represented by a histogram.

A histogram showing the ages of the participants in an event

Situation $3$

The third situation involves analyzing the distribution of exam scores in a class. The two remaining statistical displays are a box plot and a line graph. Recall what they are best used for.

Type of Display	Best used to...
Box Plot	...show measures of variation for a data set.
Line Graph	...show change over a period of time.

Since the distribution of the exam scores should be illustrated and no period of time is involved, a box plot seems like the best fitting display for the situation.

Situation $4$

The last situation includes tracking temperature variations over the course of a week. This data can be demonstrated with a line graph where the horizontal axis represents the days of the week and the vertical axis represents the temperature.

Exercises

Consider a dot plot showing the average number of hours a group of surveyed people slept.

Find the median and mode.

Find the range and outliers. If there are no outliers, leave the space empty.

Let's begin by recalling the definition of the median.

Median |- The median of a list of values is the value appearing at the center of a sorted version of the list, or the mean of the two central values, if the list contains an even number of values.

First, use the given dot plot to write the ordered data set.

5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9

There are 15 values in the data set. This means that the median is the middle value, which is the 8th data point. Counting from the left, the 8th value is 6, so the median is 6.

Now, we will find the mode of the data set.

Mode |- The mode of a data set is the number or numbers that occur most often.

We can use the dot plot to identify such values in our data set. We simply need to check which value has the most dots marked above it.

The value occurring the most often in our data set is 6. Therefore, the mode is also 6.

Let's find the range of the given data values.

Range |- The range of the data set is the difference between the greatest and least data values.

To find the range, we need to find the difference between the largest and the smallest value.

Ordered Data Set	5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9
Range	9-5 = 4

Our data values are between 5 and 9. Their range is 4. Now, remember the definition of outliers.

Outlier |- An outlier is a data point that is significantly different from the other values in the data set. It can be significantly larger or significantly smaller than the others.

From the dot plot, we can see that no single value lies significantly farther from the rest of the data set. This means that there are no outliers in the data set.
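
As a quick check, the same measures can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are chosen for this example.

```python
from statistics import median, multimode

# Ordered data set read from the dot plot (hours slept).
hours_slept = [5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9]

middle = median(hours_slept)          # middle (8th) value of the 15 values
modes = multimode(hours_slept)        # list of the most frequent value(s)
data_range = max(hours_slept) - min(hours_slept)

print(middle, modes, data_range)  # prints: 6 [6] 4
```

The results agree with the values we found by hand: a median of 6, a mode of 6, and a range of 4.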

Consider a histogram illustrating Olympic women's cycling time trials.

Describe the histogram and identify the interval that has 5 cyclists.

How many cyclists had a time less than 75 minutes?

Start by describing the given histogram.

The height of each bar represents the number of contestants that finished the race in the corresponding time interval. We can find the total number of contestants by adding the heights of the bars.

9 + 7 + 5 + 3 = 24

There are 24 contestants. We can see that there is no bar for the times between 55 and 59 minutes. This means that no women finished the race in a time between 55 and 59 minutes. The tallest bar shows that more cyclists finished in a time between 60 and 64 minutes than in any other interval. To find the interval that has 5 cyclists, we look for the bar that has a height of 5.

We can see that the third bar has a height of 5. The corresponding interval is 70 - 74.

We want to use the given histogram to find how many cyclists had a time less than 75 minutes. Let's look at the intervals on the histogram. We will consider the bars that have times less than 75 minutes.

We can see that no cyclists finished in a time between 55 and 59 minutes. We can add the heights of the first three bars to find the number of cyclists who finished the race in less than 75 minutes.

9 + 7 + 5 = 21

We found that 21 cyclists had a time less than 75 minutes.
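
The same counts can be reproduced with a short Python sketch. The bar heights come from the solution above, but the interval labels 65 - 69 and 75 - 79 are assumptions, since only the intervals 55 - 59, 60 - 64, and 70 - 74 are named in the text.

```python
# Number of cyclists per time interval, in minutes.
# Assumption: the second and fourth bars cover 65-69 and 75-79 minutes.
frequencies = {
    (55, 59): 0,
    (60, 64): 9,
    (65, 69): 7,
    (70, 74): 5,
    (75, 79): 3,
}

# Total number of contestants: the sum of all bar heights.
total = sum(frequencies.values())

# Cyclists with a time under 75 minutes: intervals that end before 75.
under_75 = sum(count for (low, high), count in frequencies.items() if high < 75)

# Interval(s) whose bar has a height of 5.
intervals_with_five = [interval for interval, count in frequencies.items() if count == 5]

print(total, under_75, intervals_with_five)  # prints: 24 21 [(70, 74)]
```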

The number of calories in a serving of certain foods that Ali eats is displayed in the box plot.

Find the median of the box plot.

Find the first and third quartiles.

Find the interquartile range and the range.

Are there any outliers? If so, what are they?

We want to find the median of the given box plot. Recall that the median is represented by the vertical segment inside the box.

The vertical segment is directly above 100, so the median is 100.

To identify the quartiles, we look at the beginning and the end of the box. The beginning is the first quartile and the end is the third quartile.

The box begins at 50, so the first quartile is 50. The box ends about halfway between 150 and 200, so the third quartile is 175.

The interquartile range is the length of the box. We can find it by subtracting the first quartile from the third quartile.

IQR = 175 - 50 = 125

The interquartile range is 125. Finally, let's find the range of the data set. To do so, we start by identifying the least and the greatest values on the box plot.

The lowest value is 25 and the greatest value is 375. Let's subtract the lowest value from the greatest to find the range.

375 - 25 = 350

Therefore, the range is 350.

Remember that an outlier is a data point that is significantly different from the other values in the data set.

There is one value that lies far apart from the rest of the data points, and it is marked with an asterisk. This means that there is an outlier and it equals 375.
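
One common way to make "significantly different" precise is the 1.5 × IQR rule, which is not stated in this exercise and is used here only as an assumption. Under that rule, a value is an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. A minimal Python sketch with the quartiles read from the box plot:

```python
# Quartiles read from the box plot.
q1, q3 = 50, 175
iqr = q3 - q1  # interquartile range: 125

# Assumption: the 1.5 * IQR rule defines the outlier boundaries (fences).
lower_fence = q1 - 1.5 * iqr  # -137.5
upper_fence = q3 + 1.5 * iqr  # 362.5


def is_outlier(value):
    """Return True if the value lies outside the fences."""
    return value < lower_fence or value > upper_fence


print(is_outlier(375))  # greatest value on the box plot
print(is_outlier(25))   # least value on the box plot
```

Since 375 is greater than the upper fence of 362.5, this rule also identifies 375 as an outlier, while 25 is not one.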

Consider the line graph representing the world's tropical rainforests across decades.

Describe the trend in the world's remaining tropical rainforests.

Predict how many millions of acres of tropical rainforests there will be left in 2030.

Let's analyze the given line graph to describe the trend in the remaining rainforests.

We can see that each year the area of tropical rainforests is smaller. The line graph is slanted downward. At first, the amount of tropical rainforest was decreasing slowly. In the years from 1980 to 2020, the remaining rainforests decreased dramatically.

From Part A, we know that each year the area of tropical rainforests decreases. We can predict that the remaining tropical rainforests will also decrease from 2020 to 2030. Let's extend the line graph to make this prediction.

We can predict that in 2030 there will be about 200 million acres of rainforests.