A data set has little real meaning until it is shown in a data display that reveals trends and patterns. Many kinds of data displays can be used to explore data. This lesson shows how to use histograms and box plots to determine the *frequency distribution* of a data set and to decide which measures best describe its center and variation.

### Catch-Up and Review

**Here are a few recommended readings before getting started with this lesson.**

Challenge

Tadeo and Ramsha are foodies and love to explore the restaurants that pop up in their neighborhood. They have recorded data about the average price of the main dishes in each restaurant using a table of values.

Average Main Dish Price (Dollars) | | | | |
---|---|---|---|---|
$10.12$ | $9.29$ | $8.29$ | $9.78$ | $10.69$ |
$9.68$ | $12.09$ | $8.94$ | $10.81$ | $8.62$ |
$11.39$ | $12.62$ | $8.71$ | $10.74$ | $10.52$ |
$10.77$ | $10.15$ | $9.18$ | $8.45$ | $9.52$ |
$11.89$ | $9.77$ | $9.44$ | $13.24$ | $11.01$ |
$10.62$ | $9.38$ | $12.15$ | $9.68$ | $9.60$ |
$10.32$ | $11.31$ | $11.41$ | $8.62$ | $9.27$ |
$10.96$ | $9.18$ | $10.28$ | $10.71$ | $10.02$ |

They would like to draw some conclusions from this data. However, they are not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms. Which one represents the data?

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **C**

b Select the option that describes the distribution of the data.

- The data is skewed right.
- The data is skewed left.
- The data is symmetric.

**Answer:** The data is skewed right.

c Which measures of center and variation best describe the data?

- Mean
- Median
- Standard Deviation
- Five-Number Summary

**Answer:** Median and Five-Number Summary

Discussion

A frequency distribution, sometimes called a **histogram distribution**, is a representation that displays the number of observations within each interval of values. It shows the empirical or theoretical frequency of each possible value or interval in a data set, and it is often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.

In the case of numerical data, the graphical representation of a frequency distribution is called a histogram.

Depending on how a data set is distributed, its histogram can have different shapes. The most common types are symmetric frequency distributions and skewed frequency distributions.

Concept

A symmetric frequency distribution is a distribution in which the data are evenly distributed around the mean and the bars on each side of the middle bar are approximately the same height.

The mean and median are approximately equal in a symmetric frequency distribution. Therefore, the measures of center and spread that best describe a symmetric distribution are the mean and the standard deviation, respectively.

Concept

A skewed frequency distribution is a distribution in which the data is not spread evenly; instead, the data is clustered toward one end, with a long tail stretching toward the other. As a result, the mean and the median are typically not equal. A skewed distribution is neither symmetric nor normal. In general, there are two types of skewed frequency distributions.

Skewed Distribution | Description |
---|---|
Skewed Left / Negatively Skewed | The distribution has a long left tail, and the median is greater than the mean. |
Skewed Right / Positively Skewed | The distribution has a long right tail, and the median is less than the mean. |

The measures of center and spread that best describe a skewed distribution are the median and the five-number summary, respectively. The median is preferred because it is less affected by outliers and extreme values in the tail, whereas the mean is pulled in the direction of the tail.
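A quick numerical check makes the pull of the tail visible. The two data sets below are hypothetical, invented only for illustration, and the helper name `center_summary` is likewise introduced just for this sketch, which uses Python's standard `statistics` module.

```python
from statistics import mean, median


def center_summary(data):
    """Return (mean, median) so the two measures of center can be compared."""
    return mean(data), median(data)


# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]  # the value 20 forms a long right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    m, med = center_summary(data)
    print(f"{name}: mean = {m}, median = {med}")
```

The symmetric set has a mean and a median of $3,$ while in the right-skewed set the single value $20$ pulls the mean up to $5.6$ even though the median stays at $4.$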

Example

Tadeo and Ramsha are having fun learning about frequency distributions. Last weekend, two of the most exciting cricket games this season took place. Tadeo and Ramsha recorded the runs scored by the $22$ players in each match. The data for Games $1$ and $2$ are shown in the table.

Game $1$ | | | | Game $2$ | | | |
---|---|---|---|---|---|---|---|
$32$ | $21$ | $27$ | $46$ | $114$ | $87$ | $96$ | $92$ |
$9$ | $16$ | $19$ | $19$ | $101$ | $111$ | $80$ | $106$ |
$40$ | $28$ | $42$ | $36$ | $85$ | $112$ | $117$ | $94$ |
$11$ | $38$ | $23$ | $28$ | $62$ | $43$ | $106$ | $66$ |
$8$ | $18$ | $26$ | $59$ | $104$ | $51$ | $76$ | $91$ |
$62$ | $40$ | | | $111$ | $78$ | | |

a Observe the following histograms. Which one represents the data of Game $1$?

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **A**

b Which of the following statements are true about the data of Game $1$?

- The data is symmetric.
- The data is skewed right.
- The data is skewed left.
- The median and the five-number summary best describe the data.
- The mean and the standard deviation best describe the data.

**Answer:** The data is skewed right; the median and the five-number summary best describe the data.

c Consider the following histograms. Which one represents the data of Game $2$?

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **D**

d Which of the following statements are true about the data of Game $2$?

- The data is symmetric.
- The data is skewed right.
- The data is skewed left.
- The median and the five-number summary best describe the data.
- The mean and the standard deviation best describe the data.

**Answer:** The data is skewed left; the median and the five-number summary best describe the data.

a Begin by making a frequency table of the data.

b Observe where the tail of the distribution extends.

c Make a histogram of the data using a frequency table.

d Which measures of center and variation should be used when the distribution of the data is skewed?

a Notice that the given histograms consist of seven intervals. Considering this information, create a frequency table with the seven intervals, beginning with $0−9,$ to display the data for Game $1.$

Cricket Runs Scored in Game $1$

Number of Runs Scored | Frequency |
---|---|
$0−9$ | $2$ |
$10−19$ | $5$ |
$20−29$ | $6$ |
$30−39$ | $3$ |
$40−49$ | $4$ |
$50−59$ | $1$ |
$60−69$ | $1$ |

Next, draw the histogram. The horizontal axis represents the **Number of Runs Scored** and the vertical axis the **Frequency**. Then the bars are drawn to represent the frequency of each interval.

Note that this corresponds to option **A**.
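The frequency table in Part A can be double-checked with a short Python sketch. The helper `frequency_table` is a hypothetical name introduced here for illustration; it assumes whole-number data and equal-width intervals of $10$ runs starting at $0,$ as in the table. It also prints a row of `#` characters per interval as a rough text stand-in for the histogram bars, and compares the mean with the median, which anticipates Part B.

```python
from collections import Counter
from statistics import mean, median

# Runs scored by the 22 players in Game 1, copied from the data table.
game1 = [32, 21, 27, 46, 9, 16, 19, 19, 40, 28, 42, 36,
         11, 38, 23, 28, 8, 18, 26, 59, 62, 40]


def frequency_table(values, start, width, bins):
    """Count whole-number values in equal-width intervals beginning at `start`.

    Values that fall outside the `bins` intervals are not counted.
    """
    counts = Counter((v - start) // width for v in values)
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(bins)]


for low, high, freq in frequency_table(game1, start=0, width=10, bins=7):
    # One '#' per player: a rough text version of each histogram bar.
    print(f"{low:>3}-{high:<3} | {freq} {'#' * freq}")

print("mean:", round(mean(game1), 2), "median:", median(game1))
```

The counts match the table ($2, 5, 6, 3, 4, 1, 1$), and the mean (about $29.45$) is greater than the median ($27.5$), which is consistent with a tail stretching to the right.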

b The tail of the histogram extends to the right and most of the data is on the left. Therefore, the distribution of the data is skewed right. In a skewed distribution, the median and five-number summary best describe the center and variation of the data, respectively. As such, two statements apply to the data set of Game $1.$

- The data is skewed right.
- The median and the five-number summary best describe the data.

c As in Part A, to identify the histogram that describes the data of Game $2,$ create a frequency table of the data, this time using eight intervals beginning with $40−49.$

Cricket Runs Scored in Game $2$

Number of Runs Scored | Frequency |
---|---|
$40−49$ | $1$ |
$50−59$ | $1$ |
$60−69$ | $2$ |
$70−79$ | $2$ |
$80−89$ | $3$ |
$90−99$ | $4$ |
$100−109$ | $4$ |
$110−119$ | $5$ |

Note that this corresponds to option **D**.
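The same kind of check works for Game $2.$ As before, `frequency_table` is a hypothetical helper written only for this sketch, assuming whole-number data and intervals of $10$ runs, this time starting at $40.$

```python
from collections import Counter
from statistics import mean, median

# Runs scored by the 22 players in Game 2, copied from the data table.
game2 = [114, 87, 96, 92, 101, 111, 80, 106, 85, 112, 117, 94,
         62, 43, 106, 66, 104, 51, 76, 91, 111, 78]


def frequency_table(values, start, width, bins):
    """Count whole-number values in equal-width intervals beginning at `start`.

    Values that fall outside the `bins` intervals are not counted.
    """
    counts = Counter((v - start) // width for v in values)
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(bins)]


for low, high, freq in frequency_table(game2, start=40, width=10, bins=8):
    # One '#' per player: a rough text version of each histogram bar.
    print(f"{low:>3}-{high:<3} | {freq} {'#' * freq}")

print("mean:", round(mean(game2), 2), "median:", median(game2))
```

Here the mean (about $90.14$) is less than the median ($93$), which fits a tail that extends to the left.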

d In this case, the tail of the histogram extends to the left and most of the data is on the right. Therefore, the data is skewed left. Additionally, the median and five-number summary best describe the center and variation of the data, respectively.

- The data is skewed left.
- The median and the five-number summary best describe the data.

Example

Tadeo and Ramsha are amazed by how data is presented everywhere and how knowing the distribution of the data helps to interpret that data. Besides cricket, they also love watching NFL games and are fans of Peyton Manning, the famous quarterback who retired at age $39.$ Now they want to analyze the retirement ages of NFL players by collecting some data.

Retirement Age of NFL Players

Age | Frequency |
---|---|
$25−26$ | $33$ |
$27−28$ | $67$ |
$29−30$ | $93$ |
$31−32$ | $109$ |
$33−34$ | $127$ |
$35−36$ | $114$ |
$37−38$ | $80$ |
$39−40$ | $59$ |
$41−42$ | $43$ |

Based on this frequency table, Tadeo and Ramsha created a histogram in order to draw some conclusions about the data.

a Consider the following histograms. Which one represents the data?

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **B**

b Which measures of center and variation best represent the data?

- Five-Number Summary
- Standard Deviation
- Mean
- Median
- Mode

**Answer:** Mean and Standard Deviation

c Which of the following statements is most likely true about the retirement age of NFL players?

- A typical NFL player is much more likely to retire at around age $41$ or $42.$
- A typical NFL player is much more likely to retire at around age $43$ or $44.$
- A typical NFL player is much more likely to retire at around age $23$ or $24.$
- A typical NFL player is much more likely to retire at around age $33$ or $34.$

**Answer:** A typical NFL player is much more likely to retire at around age $33$ or $34.$

a Use the frequency table to display the data in a histogram.

b Which measures of center and variation best describe a symmetric distribution?

c Which interval has the highest bar?

a The data can be displayed in a histogram by using the given frequency table. The vertical axis will be the **Frequency** and the horizontal axis the **Age**. Next, bars will be plotted to represent the frequency of data points falling in each interval.

Note that this histogram corresponds to option **B**.

b The data on the left side of the distribution is nearly a mirror image of the data on the right side of the distribution, which means that the distribution is symmetric.
In a symmetric distribution, the mean and standard deviation best describe the center and variation of the data.

c In a symmetric frequency distribution, data are distributed evenly around the mean and the bars on each side of the middle bar are about the same height.

Here the middle bar, which is also the tallest, corresponds to the interval $33−34,$ so the mean of the distribution lies in this interval. This means that a typical NFL player is much more likely to retire at around age $33$ or $34.$
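The lesson reads the center off the graph. As a numerical cross-check, the sketch below uses a standard estimate for grouped data that the lesson itself does not use: every age in an interval is treated as the interval's midpoint (for example, $25.5$ for $25−26$). The helper name `grouped_mean` is hypothetical.

```python
# Retirement-age frequency table: (lowest age, highest age, frequency).
table = [(25, 26, 33), (27, 28, 67), (29, 30, 93), (31, 32, 109), (33, 34, 127),
         (35, 36, 114), (37, 38, 80), (39, 40, 59), (41, 42, 43)]


def grouped_mean(rows):
    """Estimate the mean of grouped data by treating each value as its interval's midpoint."""
    total = sum(freq for _, _, freq in rows)
    weighted = sum((low + high) / 2 * freq for low, high, freq in rows)
    return weighted / total


tallest = max(table, key=lambda row: row[2])  # the interval with the highest bar
print("tallest bar:", tallest[:2])
print("estimated mean:", round(grouped_mean(table), 2))
```

For this table the estimate comes out to about $33.49,$ which falls inside the $33−34$ interval, the same interval as the tallest bar.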

Discussion

Some special distributions are less common than skewed and symmetric distributions. They may appear in situations such as an experiment in which every outcome has the same probability, or a sample taken from two separate populations. These are the *uniform* and *bimodal* distributions.

Concept

A uniform frequency distribution, sometimes called a **flat distribution**, is a type of distribution where all the bars are about the same height. This type of distribution arises in scenarios where all the possible outcomes are equally likely. A uniform distribution is also symmetric.

It should be noted that even though each outcome is theoretically equally likely, the frequencies of the outcomes can actually be unequal when collecting data from a real experiment.
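The gap between theory and practice can be explored with a quick simulation. This Python sketch is not part of the original lesson: it spins a hypothetical spinner with six equally likely colored sections using the standard `random` module. The helper name `spin` and the seed value are arbitrary choices; the seed only makes the run repeatable.

```python
import random
from collections import Counter


def spin(times, sections, seed=None):
    """Simulate `times` spins of a spinner whose sections are all equally likely."""
    rng = random.Random(seed)  # a private generator, so the seed only affects this run
    return Counter(rng.choice(sections) for _ in range(times))


colors = ["Red", "Blue", "Yellow", "Pink", "Green", "Orange"]
counts = spin(1000, colors, seed=2024)
for color in colors:
    print(color, counts[color])
```

Each color is expected to appear about $1000 \div 6 \approx 167$ times, but the printed counts will generally not all be equal; changing the seed gives a different, equally valid set of frequencies.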

Concept

A bimodal distribution is a distribution in which the data cluster around two separate values or intervals, splitting the data into two groups. This causes the histogram of the data to have two *peaks*. When the two clusters are about the same size, the mean and the median of a bimodal distribution lie near the center of the distribution, between the two peaks.

A bimodal distribution often indicates that the data were sampled from two different populations. The term *bimodal* refers to the two peaks of the distribution; when intervals are used to make the data display, these peaks are not necessarily the same as the mode of the individual data values. It is worth mentioning that a bimodal distribution whose two halves are approximately mirror images of each other is also symmetric.

Consider a histogram that shows the attendance per hour at a local restaurant.

The peaks represent typical lunch and dinner hours. Since the histogram has two distinct peaks, it has a bimodal distribution. Traffic patterns, heights, and test scores are other examples of data that can show a bimodal distribution. Although histograms are the display most commonly used to show bimodality, other representations such as dot plots and stem-and-leaf plots can also show it.

Example

Tadeo and Ramsha want to explore different types of data sets to see what kind of distributions they can draw. The first data set was collected from an experiment by spinning a spinner with six equal sections $1000$ times.

Ramsha recorded the results of the experiment in a frequency table.

Color | Frequency |
---|---|
Red | $164$ |
Blue | $168$ |
Yellow | $168$ |
Pink | $166$ |
Green | $165$ |
Orange | $169$ |

Tadeo collected the other data set from a survey about the exam scores of their classmates.

Exam Score | Frequency |
---|---|
$70−71$ | $3$ |
$72−73$ | $8$ |
$74−75$ | $11$ |
$76−77$ | $9$ |
$78−79$ | $4$ |
$80−81$ | $2$ |
$82−83$ | $2$ |
$84−85$ | $4$ |
$86−87$ | $8$ |
$88−89$ | $11$ |
$90−91$ | $9$ |
$92−93$ | $7$ |
$94−95$ | $2$ |

They now have two data sets and would like to display the data in histograms to analyze them.

a Consider the following histograms. Select the histogram that represents the data set for the spinner experiment.

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **B**

b Select the option(s) that best describe the distribution of the spinner data.

- The data is symmetric.
- The data is skewed.
- The data is uniform.
- The data is bimodal.

**Answer:** The data is symmetric; the data is uniform.

c Consider the following histograms representing the exam scores. Which histogram represents the given data for the exam scores?

Options: **A**, **B**, **C**, **D** (the histogram images are not reproduced here).

**Answer:** histogram **C**

d Select the option(s) that best describe the distribution of the exam score data.

- The data is symmetric.
- The data is skewed.
- The data is uniform.
- The data is bimodal.

**Answer:** The data is symmetric; the data is bimodal.

a Begin by drawing a histogram of the data.

b How different are the bars of the histogram of the data?

c Use the frequency table to draw a histogram of the data.

d How many *peaks* does the histogram have? Draw a vertical line through the middle of the distribution.

a The data set of the spinner experiment can be displayed in a histogram. Label the horizontal axis **Color** and the vertical axis **Frequency**. Then draw the bars to represent the frequency of each color outcome.

Notice that this corresponds to option **B**.

b Looking at the histogram, it can be seen that the bars are all approximately the same height.
This means that the data has a uniform distribution. Moreover, a uniform distribution is also symmetric. Therefore, it can be said that the data follows a symmetric and uniform distribution.
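To put a number on "approximately the same height," the sketch below measures how far the tallest or shortest bar strays from the average bar height. The helper name `max_relative_deviation` is hypothetical, and the lesson does not specify any numerical cutoff for calling a distribution uniform.

```python
# Ramsha's spinner results, copied from the frequency table.
spinner = {"Red": 164, "Blue": 168, "Yellow": 168, "Pink": 166, "Green": 165, "Orange": 169}


def max_relative_deviation(freqs):
    """Largest gap between a bar's height and the average bar height, as a fraction of that average."""
    expected = sum(freqs.values()) / len(freqs)
    return max(abs(f - expected) for f in freqs.values()) / expected


print(f"largest deviation from the average bar: {max_relative_deviation(spinner):.1%}")
```

For Ramsha's data the average bar height is $1000 \div 6 \approx 166.7,$ and no bar is more than $1.6\%$ away from it, so the bars really are nearly level.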

c As in Part A, the class's exam scores can be displayed in a histogram. The horizontal axis will be the **Exam Score** and the vertical axis will be the **Frequency**.

Once the histogram of the exam scores is drawn, it matches option **C**.

d Notice that the histogram has two *peaks*, which split the data into two clusters. This means that the data follows a bimodal distribution. Moreover, if a vertical line is drawn through the middle of the distribution, the data on the left is an approximate mirror image of the data on the right.

This means that the distribution is bimodal and symmetric. A possible explanation for this data is that it comes from two groups: one group of students who did not study for the exam (the first peak, on the left) and one group that did study for the exam (the second peak, on the right).
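Counting peaks can also be done mechanically. In this minimal sketch, a "peak" is assumed to be any interval whose bar is strictly taller than both of its neighbors; that definition, and the helper name `find_peaks`, are assumptions made for illustration. The rule ignores the first and last bars and flat-topped peaks, and real data with small random bumps may need a looser rule.

```python
# Exam-score frequency table: (interval label, frequency).
exam = [("70-71", 3), ("72-73", 8), ("74-75", 11), ("76-77", 9), ("78-79", 4),
        ("80-81", 2), ("82-83", 2), ("84-85", 4), ("86-87", 8), ("88-89", 11),
        ("90-91", 9), ("92-93", 7), ("94-95", 2)]


def find_peaks(rows):
    """Return the intervals whose bar is strictly taller than both neighboring bars."""
    peaks = []
    for i in range(1, len(rows) - 1):
        if rows[i - 1][1] < rows[i][1] > rows[i + 1][1]:
            peaks.append(rows[i][0])
    return peaks


print(find_peaks(exam))
```

For the exam scores it reports two peaks, at $74−75$ and $88−89,$ matching the two clusters seen in the histogram.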

Discussion

A box plot is another data display that shows the shape of a frequency distribution. The lengths of the whiskers and the position of the median tell whether the distribution is skewed or symmetric.

- **Skewed Left:** The left whisker is longer than the right, and the median is closer to the right whisker.
- **Symmetric Distribution:** The left whisker is about the same length as the right. The plot to the left of the median is an approximate mirror image of the plot to the right.
- **Skewed Right:** The right whisker is longer than the left, and the median is closer to the left whisker.

Keep in mind that to make a box plot, the five-number summary of the data must first be found. This means that box plots can only represent quantitative data.
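The whisker rule can be written as a small function. This is a sketch under stated assumptions: it only compares whisker lengths (not the median's position), and the `tolerance` parameter, which decides when two whiskers count as "about the same length," is a hypothetical cutoff chosen for illustration. The function name `box_plot_shape` and the five-number summaries in the usage lines are also made up.

```python
def box_plot_shape(minimum, q1, q3, maximum, tolerance=0.25):
    """Describe a box plot's shape by comparing its whisker lengths.

    `tolerance` is a hypothetical cutoff: whiskers whose lengths differ by at most
    this fraction of the longer whisker are treated as about the same length.
    """
    left = q1 - minimum   # length of the left whisker
    right = maximum - q3  # length of the right whisker
    longer = max(left, right)
    if longer == 0 or abs(left - right) <= tolerance * longer:
        return "symmetric"
    return "skewed left" if left > right else "skewed right"


# Hypothetical summaries (minimum, Q1, Q3, maximum), for illustration only.
print(box_plot_shape(0, 25, 75, 100))
print(box_plot_shape(0, 5, 20, 60))
```

The first summary has two whiskers of length $25,$ so it is reported as symmetric; the second has a right whisker of $40$ against a left whisker of $5,$ so it is reported as skewed right.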

Example

During their fantastic journey exploring the restaurants in their neighborhood, Tadeo and Ramsha found a fabulous Italian restaurant. While eating their food, they observed the people who entered the restaurant.

The two are curious about the average age of people eating at this restaurant. Therefore, they decide to collect data on the ages of people who enter the restaurant during a typical day.

Ages of People Who Enter the Italian Restaurant on a Typical Day | | | | | |
---|---|---|---|---|---|
$15$ | $53$ | $55$ | $60$ | $38$ | $56$ |
$62$ | $14$ | $44$ | $24$ | $32$ | $10$ |
$42$ | $54$ | $47$ | $67$ | $60$ | $50$ |
$61$ | $30$ | $30$ | $62$ | $62$ | $65$ |
$56$ | $52$ | $35$ | $25$ | $34$ | $32$ |

They now want to draw some conclusions from this data set by displaying it in a box plot.

a Find the five-number summary and match each description with its corresponding value.

Descriptions: Minimum Value, First Quartile, Median, Third Quartile, Maximum Value

Values: $10,$ $32,$ $48.5,$ $60,$ $67$

**Answer:** Minimum Value $=10,$ First Quartile $=32,$ Median $=48.5,$ Third Quartile $=60,$ Maximum Value $=67$

b Consider the following box plots.

c Which measures of center and variation best represent the data?

- Five-Number Summary
- Standard Deviation
- Mean
- Mode

**Answer:** Five-Number Summary

d Which of the following statements is most likely true about the people who enter the Italian restaurant?

- $25\%$ of the people who enter the Italian restaurant on a typical day are between $10$ and $48$ years old.
- $25\%$ of the people who enter the Italian restaurant on a typical day are between $48$ and $67$ years old.
- $50\%$ of the people who enter the Italian restaurant on a typical day are between $32$ and $60$ years old.

**Answer:** $50\%$ of the people who enter the Italian restaurant on a typical day are between $32$ and $60$ years old.

a Begin by ordering the data values from least to greatest.

b Use the five-number summary to make the box plot.

c Decide whether the data is skewed or symmetric.

d Each whisker represents $25\%$ of the data. The box represents $50\%$ of the data.

a To find the five-number summary of the data set, begin by ordering the data values from least to greatest.

$10,\ 14,\ 15,\ 24,\ 25,\ 30,\ 30,\ 32,\ 32,\ 34,\ 35,\ 38,\ 42,\ 44,\ 47,\ 50,\ 52,\ 53,\ 54,\ 55,\ 56,\ 56,\ 60,\ 60,\ 61,\ 62,\ 62,\ 62,\ 65,\ 67$

Notice that the minimum value is $10$ and the maximum value is $67.$ Additionally, there are $30$ data points, an even number. Therefore, the median of the data set is the mean of the two middle values, $47$ and $50.$

$\text{Median} = \dfrac{47+50}{2} = 48.5$

Finally, the first quartile, or the median of the lower half, is $32,$ and the third quartile, or the median of the upper half, is $60.$ The following table summarizes this information.

Five-Number Summary | |
---|---|
Minimum Value | $10$ |
First Quartile | $32$ |
Median | $48.5$ |
Third Quartile | $60$ |
Maximum Value | $67$ |
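The steps above can be checked with a short Python sketch. It follows the "median of each half" rule used here; when the number of values is odd, it leaves the middle value out of both halves, which is one common textbook convention and an assumption of this sketch. The function name `five_number_summary` is hypothetical. Library quartile functions such as Python's `statistics.quantiles` interpolate between values and may report slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a list of numbers.

    Quartiles use the "median of each half" rule. Assumption: when the
    number of values is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]            # values before the middle
    upper_half = data[half + n % 2:]    # skip the middle value when n is odd
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


# Ages of the people entering the Italian restaurant, already ordered.
ages = [10, 14, 15, 24, 25, 30, 30, 32, 32, 34, 35, 38, 42, 44, 47,
        50, 52, 53, 54, 55, 56, 56, 60, 60, 61, 62, 62, 62, 65, 67]

print(five_number_summary(ages))  # (10, 32, 48.5, 60, 67)
```

For the ages data, the sketch reproduces the table: a minimum of $10,$ quartiles of $32$ and $60,$ a median of $48.5,$ and a maximum of $67.$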

b The five-number summary found in Part A can be used to draw the box plot of the data set. Start by drawing a number line that includes the minimum and maximum values of the data. Next, graph points above the number line for the five-number summary.

Now, draw a box from the first quartile to the third quartile. Then, draw a line through the median and the whiskers from the box to the minimum and maximum values.

Notice that this plot corresponds to option **A**.

c In the box plot drawn in the previous part, notice that the left whisker is longer than the right one and that the median is closer to the right end of the box than to the left end.

This means that the data is skewed left. In a skewed distribution, the five-number summary best describes the center and spread of the data.
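The reasoning about whisker lengths and the position of the median can be turned into a rough rule of thumb. The sketch below is a heuristic, not a formal test: it compares the distance from the median to the minimum (left whisker plus the left part of the box) with the distance from the median to the maximum. The function name `describe_shape` and the $10\%$ tolerance are assumptions chosen for illustration.

```python
def describe_shape(summary, tolerance=0.10):
    """Label a distribution from its five-number summary (min, Q1, median, Q3, max).

    Heuristic only: a longer left side suggests a left skew, a longer right
    side suggests a right skew. `tolerance` (an assumed value) is the fraction
    of the overall range within which both sides count as "about equal".
    """
    low, _q1, med, _q3, high = summary
    left_side = med - low     # left whisker + left part of the box
    right_side = high - med   # right part of the box + right whisker
    if abs(left_side - right_side) <= tolerance * (high - low):
        return "symmetric"
    return "skewed left" if left_side > right_side else "skewed right"


print(describe_shape((10, 32, 48.5, 60, 67)))  # skewed left
```

Applied to the clothing survey summaries later in the lesson, the same rule labels the women's data skewed right and the men's data symmetric.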

d In a box plot, each whisker represents $25\%$ of the data and the box represents the middle $50\%$ of the data. With this in mind, the following facts about the distribution of the data set can be determined.

- $25\%$ of the people who enter the Italian restaurant on a regular day are between $10$ and $32$ years old.
- $50\%$ of the people who enter the Italian restaurant on a regular day are between $32$ and $60$ years old.
- $25\%$ of the people who enter the Italian restaurant on a regular day are between $60$ and $67$ years old.

Therefore, the correct option is the one stating that $50\%$ of the people who enter the Italian restaurant on a regular day are between $32$ and $60$ years old.

Example

In addition to exploring restaurants, Tadeo and Ramsha spend a lot of time playing video games together. While hanging out one weekend, they discussed whether boys or girls spend more time playing video games on weekends. To investigate, they collected data from their classmates at North High School and displayed it in a double histogram.

a Select the statements that are true about the data set of responses from the girls.

- The data is symmetric.
- The mean and standard deviation best describe the data.
- The data is skewed left.
- The data is skewed right.
- The five-number summary best describes the data.

b Which of the following statements are true about the data set of responses from the boys?

- The data is symmetric.
- The mean and standard deviation best describe the data.
- The data is skewed left.
- The data is skewed right.
- The five-number summary best describes the data.

c If one student is randomly selected from each group, which is more likely to spend more time on video games on weekends?

- The boy
- The girl

a Begin by identifying the shape of the distribution.

b Identify the shape of the distribution.

c Use the distribution of each data set to identify a typical value.

a Describe the shape of the distribution to determine which of the given statements are true about the data set showing how much time the girls spend playing video games.

Notice that the left side of the distribution is almost a mirror image of the data on the right side of the distribution. This means that the distribution is symmetric and that the mean and standard deviation best describe the center and variation of the data.

- The distribution is symmetric.
- The mean and standard deviation best describe the center and variation of the data.

This gives the two correct statements about the girls' data set.

b Now, consider the histogram of the data set that represents the responses given by the boys.

Notice that in this case, the tail of the distribution extends to the left and most of the data is on the right side of the histogram. Therefore, the distribution of the data is skewed left, so the five-number summary best describes the center and spread of the data.

- The distribution is skewed left.
- The five-number summary best describes the center and spread of the data.

c It was previously identified that the mean best describes the center of the data set for the girls and the median for the data set for the boys.

Data Set | Appropriate Measure of Center |
---|---|
Girls | Mean |
Boys | Median |

To identify who is more likely to spend more time on video games, compare these measures of center. Since the girls' distribution is symmetric, its mean is probably in the $6.1$ to $8$ hour interval at the center of the distribution.

Conversely, the boys' distribution is skewed left, so its median lies to the right of the center of the histogram, closer to the peak of the distribution. It is probably in the $8.1$ to $10$ hour interval.

Comparing these values, notice that the median for the boys is greater than the mean for the girls.

$\text{Girls' mean: } 6.1 \text{ to } 8 \qquad \text{Boys' median: } 8.1 \text{ to } 10$

This means it is more likely that a boy spends more time playing video games on the weekends than a girl does.
Example

Ramsha and Tadeo took a survey of their classmates about the number of hours spent playing video games on the weekends. These results made Ramsha more interested in finding other everyday situations with remarkable differences due to gender. She decided to search the web for similar examples.

During her investigation, Ramsha found a peculiar table showing the results of a survey comparing the amount of money that men and women usually spend on clothes per month.

Statistic | Women | Men |
---|---|---|
Survey Size | $100$ | $100$ |
Minimum | $\$18$ | $\$8$ |
Maximum | $\$60$ | $\$28$ |
First Quartile | $\$30$ | $\$14$ |
Median | $\$34$ | $\$18$ |
Third Quartile | $\$40$ | $\$22$ |
Mean | $\$36$ | $\$18$ |
Standard Deviation | $\$8$ | $\$4$ |

a Consider the following double box plots.

b Which statement is true about the given data?

- The median for men is twice the median for women. There is more variability in the amount of money spent by men.
- The median for women is almost twice the mean for men. There is more variability in the amount of money spent by women.
- The median for women equals the mean for men. The data sets have the same variability.

c How many of the women surveyed are expected to spend between $\$30$ and $\$40$ on clothes per month?


a Use the five-number summary of each data set to make a double box plot.

b Begin by identifying the shape of each data set's distribution.

c Each whisker represents $25\%$ of the data. The box represents $50\%$ of the data.

a Represent the data with a double box plot to identify which of the given graphs is accurate. First, draw a number line that includes the minimum and maximum values of each gender's data set. Next, plot points above the number line for the values of each five-number summary.

Next, draw the box for each plot using the first and third quartiles. Finally, draw a line through the median and the whiskers from the box to the minimum and maximum values of each data set.

Notice that this corresponds to the box plot in option **D**.

b To identify which of the given statements is correct, compare the center and spread of the data sets. Note that for the women's data set, the right whisker is longer than the left one and the median is closer to the left end of the box than to the right end. This means that the data is skewed right, and the median best describes the center of the data.

$\text{Median of women's data set: } \$34$

Conversely, for the men's data set, the whiskers are the same length and the median falls in the middle of the box. Therefore, this data follows a symmetric distribution, and the mean best describes the center of the data.

$\text{Mean of men's data set: } \$18$

Notice that the median amount of money spent by women on clothes each month, $\$34,$ is almost twice the mean amount spent by men, $\$18.$ Recall that the range of a data set is the difference between its maximum and minimum values. Using this information, compare the standard deviations and ranges of the data sets.

Data Set | Standard Deviation | Range |
---|---|---|
Women | $\$8$ | $\$60-\$18=\$42$ |
Men | $\$4$ | $\$28-\$8=\$20$ |

Both the standard deviation and the range are greater for women. This means that there is more variability in the amount of money spent by women.
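The spread comparison can be reproduced from the five-number summaries in the survey table. This sketch computes the range used above and, for comparison, the interquartile range ($Q_3 - Q_1$); the helper name `spread` is hypothetical.

```python
# Five-number summaries from the survey table, in dollars:
# (minimum, Q1, median, Q3, maximum)
women = (18, 30, 34, 40, 60)
men = (8, 14, 18, 22, 28)


def spread(summary):
    """Return (range, interquartile range) for a five-number summary."""
    low, q1, _median, q3, high = summary
    return high - low, q3 - q1


for label, summary in (("Women", women), ("Men", men)):
    data_range, iqr = spread(summary)
    print(f"{label}: range = ${data_range}, IQR = ${iqr}")
```

It reports a range of $\$42$ and an interquartile range of $\$10$ for women, and a range of $\$20$ and an interquartile range of $\$8$ for men, so both measures point to more variability in the women's spending.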

c To calculate how many of the women surveyed are expected to spend between $\$30$ and $\$40$ on clothes per month, consider the structure of a box plot. Each whisker represents $25\%$ of the data, and the box represents the middle $50\%.$ With this in mind, the following statements are true.

- $25\%$ of the women surveyed are expected to spend between $\$18$ and $\$30.$
- $50\%$ of the women surveyed are expected to spend between $\$30$ and $\$40.$
- $25\%$ of the women surveyed are expected to spend between $\$40$ and $\$60.$

To find the number of women, multiply the survey size by $50\%.$

$100 \cdot 0.5 = 50$

Therefore, $50$ out of the $100$ women surveyed are expected to spend between $\$30$ and $\$40$ on clothes per month.
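The same arithmetic gives the expected count for every part of the women's box plot. This is a minimal sketch: the shares are the nominal $25\%$, $50\%$, and $25\%$ split of a box plot, so actual counts in a real survey may differ slightly, especially when values repeat.

```python
survey_size = 100  # number of women in the survey

# Nominal share of the data in each part of the women's box plot
# (monthly spending on clothes, in dollars).
parts = [
    ("$18 to $30 (left whisker)", 0.25),
    ("$30 to $40 (box)", 0.50),
    ("$40 to $60 (right whisker)", 0.25),
]

expected = {label: survey_size * share for label, share in parts}

for label, count in expected.items():
    print(f"{label}: about {count:.0f} women")
```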
Closure

All the pieces needed to analyze data using histograms have been covered. This way of displaying data makes it easier to identify the shape of the distribution and to choose the best measures of center and variation for the data set. Recall the data that Tadeo and Ramsha recorded at the beginning of the lesson about the average price of the main dishes at the restaurants in their neighborhood.

Average Main Dish Price (Dollars) | | | | |
---|---|---|---|---|
$10.12$ | $9.29$ | $8.29$ | $9.78$ | $10.69$ |
$9.68$ | $12.09$ | $8.94$ | $10.81$ | $8.62$ |
$11.39$ | $12.62$ | $8.71$ | $10.74$ | $10.52$ |
$10.77$ | $10.15$ | $9.18$ | $8.45$ | $9.52$ |
$11.89$ | $9.77$ | $9.44$ | $13.24$ | $11.01$ |
$10.62$ | $9.38$ | $12.15$ | $9.68$ | $9.60$ |
$10.32$ | $11.31$ | $11.41$ | $8.62$ | $9.27$ |
$10.96$ | $9.18$ | $10.28$ | $10.71$ | $10.02$ |

The two students wanted to draw some insights and conclusions based on this data. However, they were not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms.

b Select the option that describes the distribution of the data.

c Which measures of center and variation will best describe the data?

- Mean
- Median
- Standard deviation
- Five-number summary

a Begin by making a frequency table of the data.

b Is there a tail to the distribution? Which way does it extend?

c Which measures of center and variation should be used when the distribution of the data is skewed?

a Note that the given histograms consist of six intervals. With this in mind, make a frequency table using six intervals, starting with $8.00$–$8.99.$

Price Range (Dollars) | Frequency |
---|---|
$8.00$–$8.99$ | $6$ |
$9.00$–$9.99$ | $12$ |
$10.00$–$10.99$ | $13$ |
$11.00$–$11.99$ | $5$ |
$12.00$–$12.99$ | $3$ |
$13.00$–$13.99$ | $1$ |
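The tallying step can be checked with a short Python sketch that bins each price by its whole-dollar part, so $8.29$ falls in the $8.00$–$8.99$ interval. The list is the lesson's data set typed in row by row; binning by `math.floor` is one convenient way to do this, not the lesson's prescribed method.

```python
import math
from collections import Counter

# Average main dish prices (dollars), entered row by row from the lesson's table.
prices = [
    10.12, 9.29, 8.29, 9.78, 10.69,
    9.68, 12.09, 8.94, 10.81, 8.62,
    11.39, 12.62, 8.71, 10.74, 10.52,
    10.77, 10.15, 9.18, 8.45, 9.52,
    11.89, 9.77, 9.44, 13.24, 11.01,
    10.62, 9.38, 12.15, 9.68, 9.60,
    10.32, 11.31, 11.41, 8.62, 9.27,
    10.96, 9.18, 10.28, 10.71, 10.02,
]

# Each price belongs to the $1-wide interval given by its whole-dollar part,
# e.g. 8.29 -> 8, which stands for the interval 8.00-8.99.
counts = Counter(math.floor(price) for price in prices)

for start in range(8, 14):
    print(f"{start}.00-{start}.99: {counts[start]}")
```

The printed counts match the frequency table, and they add up to all $40$ restaurants.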

Next, draw the histogram. Label the horizontal axis Price Range and the vertical axis Frequency. Then, draw the bars to represent the frequency of each interval.

Notice that this corresponds to option

b It can be seen in the histogram that the tail of the distribution extends to the right and that most of the data is on the left.

Therefore, the distribution is skewed right.

c Consider the appropriate measures of center and variation for a skewed and a symmetric distribution.

Distribution | Measure of Center | Measure of Variation |
---|---|---|
Symmetric | Mean | Standard deviation |
Skewed | Median | Five-number summary |

Because in this situation the distribution of the data is skewed, the median and the five-number summary best describe the center and variation of the data, respectively.
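A quick numerical check can support this choice. As a rule of thumb, not a proof, the mean of a right-skewed distribution is pulled toward the long right tail, so it tends to be larger than the median. The sketch below computes both, along with the five-number summary, using the same "median of each half" quartile rule as earlier in the lesson; that convention is an assumption here, and other software may report slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
from statistics import mean, median

# Average main dish prices (dollars), entered row by row from the lesson's table.
prices = [
    10.12, 9.29, 8.29, 9.78, 10.69,
    9.68, 12.09, 8.94, 10.81, 8.62,
    11.39, 12.62, 8.71, 10.74, 10.52,
    10.77, 10.15, 9.18, 8.45, 9.52,
    11.89, 9.77, 9.44, 13.24, 11.01,
    10.62, 9.38, 12.15, 9.68, 9.60,
    10.32, 11.31, 11.41, 8.62, 9.27,
    10.96, 9.18, 10.28, 10.71, 10.02,
]


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using the median-of-each-half rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half, upper_half = data[:half], data[half + n % 2:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


avg, middle = mean(prices), median(prices)
print(f"mean = {avg:.4f}, median = {middle:.4f}")
# Heuristic: a mean above the median is consistent with a right skew.
print("mean > median:", avg > middle)
print("five-number summary:", [round(v, 3) for v in five_number_summary(prices)])
```

For this data set, the mean is $\$10.2305$ and the median is $\$10.135,$ so the mean sits above the median, as expected for a right-skewed distribution. Under this quartile rule, the five-number summary is $\$8.29,$ $\$9.335,$ $\$10.135,$ $\$10.885,$ and $\$13.24.$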
