A data set has little real meaning until it is shown in a data display to visualize trends and patterns. There are lots of data displays that can be used to explore data. This lesson will show how to use histograms and box plots to determine the frequency distribution of the data and which measures are used to describe the center and variation.


## Giving Meaning to a Data Set

Tadeo and Ramsha are foodies and love to explore the restaurants that pop up in their neighborhood. They have recorded data about the average price of the main dishes in each restaurant using a table of values.

Average Main Dish Price (Dollars)

They would like to draw some conclusions from this data. However, they are not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms. Which histogram best describes the data?
b Select the option that describes the distribution of the data.
c Which measures of center and variation best describe the data?

## Frequency Distribution

A frequency distribution, sometimes called a histogram distribution, is a representation that displays the number of observations within a given interval. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph. In the case of numerical data, the graphical representation of a frequency distribution is called a histogram. Depending on how a data set is distributed, its histogram can have different shapes. The most common types of distributions are symmetric frequency distribution and skewed frequency distribution.
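
As a rough illustration of how a frequency table for numerical data can be built, the following Python sketch counts how many values fall into equal-width intervals. The function name `frequency_table`, its parameters, and the price values are assumptions made for this example; they are not taken from the lesson.

```python
def frequency_table(values, start, width, num_intervals):
    """Count how many values fall in each interval [lo, lo + width)."""
    counts = [0] * num_intervals
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < num_intervals:  # values outside the intervals are ignored
            counts[index] += 1
    return [
        ((start + i * width, start + (i + 1) * width), counts[i])
        for i in range(num_intervals)
    ]


# Hypothetical whole-number data, invented for illustration only.
prices = [8, 9, 11, 12, 12, 14, 15, 17, 21, 26]
table = frequency_table(prices, start=5, width=5, num_intervals=5)

# Print a simple text histogram: one '#' per observation in the interval.
for (low, high), count in table:
    print(f"{low}-{high - 1}: {'#' * count} ({count})")
```

The interval labels assume whole-number data, so an interval such as 5 up to (but not including) 10 is printed as 5-9.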

### Symmetric Frequency Distribution

In a symmetric frequency distribution, data are distributed evenly around the mean and the bars on each side of the middle bar are about the same height. Additionally, the mean and median are approximately equal to each other in this type of frequency distribution.

### Skewed Frequency Distribution

Not all data sets have a symmetric frequency distribution. When the data trail off in a longer tail on one side, the distribution is skewed, and the mean and median are usually not equal. In general, there are two types of skewed frequency distributions.

| Skewed Distribution | Description |
|---|---|
| Skewed Left / Negatively Skewed | The distribution has a long left tail and the median is greater than the mean. |
| Skewed Right / Positively Skewed | The distribution has a long right tail and the median is less than the mean. |

The difference between these three basic types of frequency distributions can be visualized in the following applet.

### Distributions and Measures of Center and Variation

The measures of center and variation that best describe a given data set can be chosen by looking at the shape of its distribution.

• Symmetric Distribution: In this type of distribution, the mean and the standard deviation will best describe the center and variation of the data, respectively.
• Skewed Distribution: In this case, use the median to describe the center and the five-number summary to describe the spread of the data.
This comes from the fact that the mean and the median are about the same in a symmetric distribution. Moreover, in a skewed distribution, the median is preferred because it is less affected by outliers, while the mean is pulled in the direction of the tail of the distribution.
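
The relationship between the mean, the median, and the tail can be checked numerically. The sketch below is only a rule of thumb, not a formal test of skewness: the function name `describe_shape`, the `tolerance` value, and all three data sets are assumptions invented for this example.

```python
from statistics import mean, median


def describe_shape(values, tolerance=0.05):
    """Guess the shape of a distribution by comparing its mean and median.

    The mean is pulled toward the tail, so mean > median suggests a right
    tail and mean < median suggests a left tail. The tolerance (a fraction
    of the range) is an arbitrary choice for this sketch.
    """
    center_mean = mean(values)
    center_median = median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(center_mean - center_median) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if center_mean > center_median else "skewed left"


# Hypothetical data sets, invented for illustration only.
samples = {
    "balanced": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "long right tail": [1, 1, 2, 2, 3, 4, 10, 20],
    "long left tail": [1, 11, 17, 18, 19, 19, 20, 20],
}

for name, data in samples.items():
    print(name, "->", describe_shape(data), "| mean:", mean(data), "| median:", median(data))
```

In the two skewed samples, a single extreme value moves the mean noticeably while the median stays near the bulk of the data, which is why the median is the preferred measure of center there.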

## Finding the Distribution of Runs Scored in Cricket

Tadeo and Ramsha are having fun learning about frequency distributions. Last weekend, two of the most exciting cricket games this season took place. Tadeo and Ramsha recorded the runs scored by the players in each match. The data for Games and are shown in the table.

Game Game
a Observe the following histograms. Which histogram represents the data of Game
b Which of the following statements are true about the data of Game
c Consider the following histograms. Which histogram represents the data of Game
d Which of the following statements are true about the data of Game

### Hint

a Begin by making a frequency table of the data.
b Observe where the tail of the distribution extends.
c Make a histogram of the data using a frequency table.
d Which measures of center and variation should be used when the distribution of the data is skewed?

### Solution

a Notice that the given histograms consist of seven intervals. Considering this information, create a frequency table with the seven intervals, beginning with to display the data for Game
Cricket Runs Scored in Game
Number of Runs Scored Frequency
The data can be displayed in a histogram by using this frequency table. The horizontal axis will be the Number of Runs Scored and the vertical axis the Frequency. Then the bars will be drawn to represent the frequency of each interval. Note that this corresponds to option A.
b The tail of the histogram extends to the right and most of the data is on the left. Therefore, the distribution of the data is skewed right. In a skewed distribution, the median and five-number summary best describe the center and variation of the data, respectively. As such, two statements apply to the data set of Game
• The data is skewed right.
• The median and the five-number summary best describe the data.
c Similar to Part A, in order to identify which histogram is the one that describes the data of Game a frequency table of the data will be created, this time using eight intervals.
Cricket Runs Scored in Game
Number of Runs Scored Frequency
Using this table, the histogram of the data can be now created. Note that this corresponds to option D.
d In this case, the tail of the histogram extends to the left and most of the data is on the right. Therefore, the data is skewed left. Additionally, the median and five-number summary best describe the center and variation of the data, respectively.
• The data is skewed left.
• The median and the five-number summary best describe the data.

## Analyzing the Retirement Ages of NFL Players

Tadeo and Ramsha are amazed by how data is presented everywhere and how knowing the distribution of the data helps to interpret that data. Besides cricket, they also love watching NFL games and are fans of Peyton Manning, the famous quarterback who retired at age Now they want to analyze the retirement ages of NFL players by collecting some data.

Retirement Age of NFL Players
Age Frequency

Based on this frequency table, Tadeo and Ramsha created a histogram in order to draw some conclusions about the data.

a Consider the following histograms. Which of these histograms could represent the given data set?
b Which measures of center and variation best represent the data?
c Which of the following statements is most likely true about the retirement age of NFL players?

### Hint

a Use the frequency table to display the data in a histogram.
b Which measures of center and variation best describe a symmetric distribution?
c Which interval has the highest bar?

### Solution

a The data can be displayed in a histogram by using the given frequency table. The vertical axis will be the Frequency and the horizontal axis the Age. Next, bars will be plotted to represent the frequency of data points falling in each interval. Note that this histogram corresponds to option B.
b The data on the left side of the distribution is nearly a mirror image of the data on the right side of the distribution, which means that the distribution is symmetric. In a symmetric distribution, the mean and standard deviation best describe the center and variation of the data.
c In a symmetric frequency distribution, data are distributed evenly around the mean and the bars on each side of the middle bar are about the same height. It can be seen that the mean of the distribution is in the interval of This means that a typical NFL player is much more likely to retire at around age or

## Uniform and Bimodal Distributions

There are special distributions that are less common than skewed and symmetric distributions. These distributions may appear in situations such as an experiment where each event has the same probability or a sample taken from two separate populations. These are the uniform and bimodal distributions.

### Uniform Distribution

A histogram is said to have a uniform distribution when its bars are all about the same height. This type of distribution arises in scenarios where all the possible outcomes are equally likely. A uniform distribution is also symmetric. This distribution is sometimes also called a flat distribution.

#### Example

The six possible outcomes of rolling a fair six-sided die all have an equal probability of occurring. A six-sided die is thrown a number of times and the frequency of each outcome is written down. The distribution of the frequency of each outcome is shown in the following histogram. It can be noted that even though each outcome is mathematically equally likely, the frequencies of the outcomes can actually be unequal when collecting data from a real scenario.
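
A die-rolling experiment of this kind can be simulated with a short Python sketch. The number of rolls (60), the seed, and the function name `roll_frequencies` are assumptions chosen for this example and do not come from the lesson.

```python
import random
from collections import Counter


def roll_frequencies(num_rolls, seed=0):
    """Simulate rolling a fair six-sided die and count each outcome."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(num_rolls)]
    counts = Counter(rolls)
    # Include every face, even one that never came up.
    return {face: counts.get(face, 0) for face in range(1, 7)}


# Hypothetical experiment size, invented for illustration only.
frequencies = roll_frequencies(60)

# Text histogram: one '#' per time the face was rolled.
for face, count in frequencies.items():
    print(face, "#" * count, f"({count})")
```

With a small number of rolls the bars are usually only roughly equal; increasing `num_rolls` tends to make the histogram look flatter.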

### Bimodal Distribution

A bimodal distribution is a data distribution in which the values cluster around two separate values or intervals, splitting the data into two groups. This causes the histogram of the data to have two peaks. Additionally, the mean and the median are near the center of the distribution. A bimodal distribution often indicates that the sample was taken from two different populations. The term bimodal refers to the peaks of the distribution, which can differ from the mode when intervals are used to make the data display. It is worth mentioning that a bimodal distribution whose bars are about the same height on each side of the peaks is also symmetric.

#### Example

Consider a histogram that shows the attendance per hour of a local restaurant. The peaks represent typical lunch and dinner hours. Since the histogram has two distinct peaks, it has a bimodal distribution. Salaries, heights, and test scores are other kinds of data that can have bimodal distributions. Although histograms are mainly used to show bimodality, other representations such as dot plots and stem-and-leaf plots can also show bimodality.
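
One simple way to check a histogram for two peaks is to count the bars that are taller than both of their neighbors. The sketch below is a deliberately naive approach: the function name `count_peaks` and the hourly attendance figures are assumptions invented for this example, and small random bumps in real data would also be counted as peaks.

```python
def count_peaks(frequencies):
    """Count bars that are strictly taller than both neighboring bars.

    The list is padded with zeros so that the first and last bars can
    also be peaks. Flat plateaus are not counted by this simple rule.
    """
    padded = [0] + list(frequencies) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical hourly attendance at a restaurant, invented for illustration.
attendance = [2, 5, 9, 6, 3, 4, 8, 10, 5, 1]

print(count_peaks(attendance))        # two peaks: lunch and dinner
print(count_peaks([1, 3, 6, 3, 1]))   # a single peak
```

A result of two suggests a bimodal shape, but the histogram itself should still be inspected before drawing that conclusion.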

## Frequency Distributions of Data From Different Sources

Tadeo and Ramsha want to explore different types of data sets to see what kind of distributions they can draw. The first data set was collected from an experiment by spinning a spinner with six equal sections times. Ramsha recorded the results of the experiment in a frequency table.
Color Frequency
Red
Blue
Yellow
Pink
Green
Orange

Tadeo collected the other data set from a survey about the exam scores of their classmates.

Exam Score Frequency

They now have two data sets and would like to display the data in histograms to analyze them.

a Consider the following histograms. Select the histogram that represents the data set for the spinner experiment.
b Select the option(s) that best describes the distribution of the data of the spinner.
c Consider the following histograms representing test scores. Which histogram represents the given data for the exam scores?
d Select the option(s) that best describes the distribution of the data about the exam scores.

### Hint

a Begin by drawing a histogram of the data.
b How different are the bars of the histogram of the data?
c Use the frequency table to draw a histogram of the data.
d How many peaks does the histogram have? Draw a vertical line through the middle of the distribution.

### Solution

a The data set of the spinner experiment can be displayed in a histogram. Label the horizontal axis Color and the vertical axis Frequency. Then draw the bars to represent the frequency of each color outcome. Notice that this corresponds to option B.
b Looking at the histogram, it can be seen that the bars are all approximately the same height. This means that the data has a uniform distribution. Moreover, a uniform distribution is also symmetric. Therefore, it can be said that the data follows a symmetric and uniform distribution.
c Similar to Part A, the data modeling the class's exam scores can be displayed in a histogram. The horizontal axis will be the Test Score and the vertical axis will be the Frequency. The histogram of the exam scores has been drawn. The correct graph is given in option C.
d Notice that the histogram has two peaks. Additionally, these two peaks split the data into two clusters. This means that the data follows a bimodal distribution. Moreover, suppose a vertical line is drawn through the middle of the distribution. The data on the left is an approximate mirror image of the data on the right. This means that the distribution is bimodal and symmetric. A possible explanation for this data is that it comes from two groups, one group of students who did not study for the exam (the first peak on the left) and one group that did study for the exam (the second peak on the right).

## Box-and-Whisker Plots as Distributions

A box plot is another data display that allows one to see the shape of a frequency distribution. The length of the whiskers and the position of the median tell whether the distribution is skewed or symmetric.

• Skewed Left: The left whisker is longer than the right and the median is closer to the right whisker.
• Symmetric Distribution: The left whisker is about the same length as the right. The plot to the left of the median is an approximate mirror image of the plot on the right.
• Skewed Right: The right whisker is longer than the left and the median is closer to the left whisker.
The following applet shows each of these three frequency distributions using box plots. Keep in mind that to make a box plot, the five-number summary of the data must first be found. This means that box plots can only represent quantitative data.
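
The whisker comparison described above can be sketched in Python. The quartiles are computed as the medians of the lower and upper halves of the ordered data, which is an assumption about the method. The function names, the `tolerance` value, and the ages are likewise invented for this example. For simplicity, the sketch compares only the whisker lengths and ignores the position of the median inside the box.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the data;
    with an odd number of values the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)


def shape_from_box_plot(summary, tolerance=0.1):
    """Classify the shape by comparing the lengths of the two whiskers."""
    minimum, q1, _, q3, maximum = summary
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    total = maximum - minimum
    if total == 0 or abs(left_whisker - right_whisker) <= tolerance * total:
        return "symmetric"
    return "skewed left" if left_whisker > right_whisker else "skewed right"


# Hypothetical ages, invented for illustration only.
ages = [5, 18, 25, 30, 32, 35, 36, 38, 40, 42]
summary = five_number_summary(ages)
print(summary)
print(shape_from_box_plot(summary))
```

Here the left whisker is much longer than the right one, so the sketch reports a left-skewed shape.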

## Analyzing the Ages of Customers at an Italian Restaurant

During their fantastic journey exploring the restaurants in their neighborhood, Tadeo and Ramsha found a fabulous Italian restaurant. While eating their food, they observed the people who entered the restaurant.

The two are curious about the average age of people eating at this restaurant. Therefore, they decide to collect data on the ages of people who enter the restaurant during a typical day.

Ages of People Who Enter the Italian Restaurant on a Typical Day

They now want to draw some conclusions from this data set by displaying it in a box plot.

a Find the five-number summary and match each description with its corresponding value.
b Consider the following box plots. Which of the box plots represents the given data set?
c Which measures of center and variation best represent the data?
d Which of the following statements is most likely true about the people who enter the Italian restaurant?

### Hint

a Begin by ordering the data values from least to greatest.
b Use the five-number summary to make the box plot.
c Decide whether the data is skewed or symmetric.
d Each whisker represents of the data. The box represents of the data.

### Solution

a To find the five-number summary of the data set, begin by ordering the data values from least to greatest.
Notice that the minimum value is and the maximum value is Additionally, the number of data points is an even number. Therefore, the median of the data set will be given by the mean of the middle numbers and
Finally, the first quartile, or the median of the lower half, is and the third quartile is The following table summarizes this information. Each description is matched with its corresponding value.
Five Number Summary
Minimum Value
First Quartile
Median
Third Quartile
Maximum Value
b The five-number summary found in Part A can be used to draw the box plot of the data set. Start by drawing a number line that includes the minimum and maximum values of the data. Next, graph points above the number line for the five-number summary. Now, draw a box from the first quartile to the third quartile. Then, draw a line through the median and the whiskers from the box to the minimum and maximum values. Notice that this plot corresponds to option A.

c In the box plot drawn in the previous part, notice that the left whisker is longer than the right and that the median is closer to the right whisker than it is to the left. This means that the data is skewed left. In a skewed distribution, the five-number summary best describes the center and spread of the data.
d In a box plot, each whisker represents of the data and the box represents the middle of the data. With this in mind, the following facts about the distribution of the data set can be determined.
• of the people who enter the Italian restaurant on a regular day are between and years old.
• of the people who enter the Italian restaurant on a regular day are between and years old.
• of the people who enter the Italian restaurant on a regular day are between and years old.

Therefore, the option that states that of the people who enter the Italian restaurant on a regular day are between and years old is the correct one.

## Comparing Data From Two Groups

In addition to exploring restaurants, Tadeo and Ramsha spend a lot of time playing video games together. While hanging out one weekend, they discussed whether boys or girls spend more time on video games on weekends. To investigate this situation, the two collected some data from their classmates at North High School and displayed it in a double histogram.

a Select the statements that are true about the data set of responses from the girls.
b Which of the following statements are true about the data set of responses from the boys?
c If one student is randomly selected from each group, which is more likely to spend more time on video games on weekends?

### Hint

a Begin by identifying the distribution shape of the data.
b Identify the shape of the distribution.
c Use the distribution of each data set to identify a typical value.

### Solution

a To determine which of the given statements are true about the data set showing how much time the girls spend playing video games, begin by describing the shape of the distribution. Notice that the left side of the distribution is almost a mirror image of the right side. This means that the distribution is symmetric and that the mean and standard deviation best describe the center and variation of the data.
• The distribution is symmetric.
• The mean and standard deviation best describe the center and variation of the data.

This gives the two correct statements about the girls' data set.

b Now, consider the histogram of the data set that represents the responses given by the boys. Notice that in this case, the tail of the distribution extends to the left and that most of the data is on the right side of the histogram. Therefore, the distribution of the data is skewed left, so the five-number summary best describes the center and spread of the data.
• The distribution is skewed left.
• The five-number summary best describes the center and spread of the data.
c It was previously identified that the mean best describes the center of the data set for the girls and the median for the data set for the boys.
Appropriate Measures of Center

| Girls' Data Set | Boys' Data Set |
|---|---|
| Mean | Median |

To identify who is more likely to spend more time on video games, compare these measures of center. Since the girls' distribution is symmetric, its mean is probably in the interval at the center of the distribution. Conversely, because the median of a skewed distribution lies close to the peak of the distribution, the boys' median is probably in an interval near the peak. Comparing these values, notice that the median of the boys is greater than the mean of the girls.
This means it is more likely that a boy spends more time playing video games on the weekends than a girl does.

## Using Box Plots to Interpret Results From a Survey

Ramsha and Tadeo took a survey of their classmates about the number of hours spent playing video games on the weekends. These results made Ramsha more interested in finding other everyday situations with remarkable differences due to gender. She decided to search the web for similar examples. During her investigation, Ramsha found a peculiar table showing the results of a survey comparing the amount of money that men and women usually spend on clothes per month.

Women Men
Survey Size
Minimum
Maximum
First Quartile
Median
Third Quartile
Mean
Standard Deviation
a Consider the following double box plots. Which of the given double box plots shows the results of the survey Ramsha found online?
b Which statement is true about the given data?
c How many of the women surveyed are expected to spend between and on clothes per month?

### Hint

a Use the five-number summary of each data set to make a double box plot.
b Begin by identifying each data set distribution.
c Each whisker represents of the data. The box represents of the data.

### Solution

a This data can be represented with a double box plot to identify which of the given graphs accurately represents the data set. First, draw a number line that includes the minimum and maximum values of each gender's data set. Next, plot points above the number line for the given values of the five-number summary. Then, draw the box for each plot using the first and third quartiles. Finally, draw a line through the median and the whiskers from the box to the minimum and maximum values of each data set. Notice that this corresponds to the box plot in option D.

b In order to identify which of the given statements is correct, compare the center and spread of the data sets. Note that for the women's data set, the right whisker is longer than the left one and that the median is closer to the left whisker. This means that the data is skewed right, and the median best describes the center of the data.
Conversely, for the men's data set, the whiskers are approximately equal and the median falls in the middle of the box. Therefore, this data is modeled by a symmetric distribution, and the mean best describes the center of the data.
Notice that the median amount of money spent by women on clothes each month is almost twice the mean amount of money spent on clothes by men. Recall that the interquartile range of a data set is given by the difference between the third and first quartiles. Using this information, compare the interquartile range and standard deviation of the data sets.
Standard Deviation Interquartile Range
Women
Men

Both the standard deviation and the interquartile range are greater for women. This means that there is more variability in the amount of money spent by women.
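
The interquartile range used in this comparison can be read directly from a five-number summary. The sketch below shows the arithmetic for two groups. The function name `spread_from_summary` and all numeric values are assumptions invented for this example; they are not the survey results from the table.

```python
def spread_from_summary(minimum, q1, median, q3, maximum):
    """Return the range and interquartile range from a five-number summary."""
    return {
        "range": maximum - minimum,  # distance between the whisker ends
        "iqr": q3 - q1,              # width of the box
    }


# Hypothetical five-number summaries, invented for illustration only.
group_a = spread_from_summary(minimum=10, q1=40, median=60, q3=110, maximum=250)
group_b = spread_from_summary(minimum=5, q1=20, median=30, q3=40, maximum=55)

print("Group A:", group_a)
print("Group B:", group_b)
print("Group A varies more:", group_a["iqr"] > group_b["iqr"])
```

Because the interquartile range depends only on the quartiles, it is less affected by extreme values than the range or the standard deviation.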

c To calculate how many of the women surveyed are expected to spend between and on clothes per month, consider the structure of a box plot. Each whisker represents of the data, and the box represents the middle With this information in mind, the following statements are true.
• of the women surveyed are expected to spend between and
• of the women surveyed are expected to spend between and
• of the women surveyed are expected to spend between and
This means that the of the survey size needs to be calculated in order to determine the number of women who are expected to spend between and on clothes. Recall that women participated in the survey.
Therefore, out of the women surveyed are expected to spend between and on clothes per month.
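
The calculation in Part C can be written in general form. Here $n$ stands for the survey size and $k$ for the number of quarter-sections of the box plot (a whisker, or one half of the box) that the chosen range covers. Both symbols are introduced for this sketch, which relies on the standard fact that the quartiles split ordered data into four parts of roughly equal size.

```latex
\text{expected count} \approx \frac{k}{4} \cdot n,
\qquad k \in \{1, 2, 3, 4\}
```

For instance, a range that runs from the first quartile to the third quartile covers the whole box, so $k = 2$ and about half of the people surveyed are expected to fall in it.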

## Finding Insights and Drawing Conclusions From Histograms

All the pieces needed to analyze data using histograms have been covered. This method of displaying data makes it easier to find the data distribution and determine the best measures of center and variation to describe the data set. Recall the data Tadeo and Ramsha recorded about the main dishes of the restaurants in their neighborhood at the beginning of the lesson.

Average Main Dish Price (Dollars)

The two students wanted to draw some insights and conclusions based on this data. However, they were not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms. Which histogram best describes the data?
b Select the option that describes the distribution of the data.
c Which measures of center and variation will best describe the data?

### Hint

a Begin by making a frequency table of the data.
b Is there a tail to the distribution? Which way does it extend?
c Which measures of center and variation should be used when the distribution of the data is skewed?

### Solution

a Note that the given histograms consist of six intervals. With this in mind, make a frequency table using six intervals, starting with
Average Main Dish Price (Dollars)
Price Range Frequency
Next, use this frequency table to display the data in a histogram. The horizontal axis will be the Price Range and the vertical axis the Frequency. Then, draw the bars to represent the frequency of each interval. Notice that this corresponds to option C.
b It can be seen in the histogram that the tail of the distribution extends to the right and that most of the data is on the left. Therefore, the distribution is skewed right.
c Consider the appropriate measures of center and variation for a skewed and a symmetric distribution.
| Distribution | Measure of Center | Measure of Variation |
|---|---|---|
| Symmetric | Mean | Standard deviation |
| Skewed | Median | Five-number summary |

Because in this situation the distribution of the data is skewed, the median and the five-number summary best describe the center and variation of the data, respectively.
