A data set has little real meaning until it is shown in a data display to visualize trends and patterns. There are lots of data displays that can be used to explore data. This lesson will show how to use histograms and box plots to determine the frequency distribution of the data and which measures are used to describe the center and variation.

Catch-Up and Review

Here are a few recommended readings before getting started with this lesson.


Challenge

Giving Meaning to a Data Set

Tadeo and Ramsha are foodies and love to explore the restaurants that pop up in their neighborhood. They have recorded data about the average price of the main dishes in each restaurant using a table of values.

Average Main Dish Price (Dollars)

They would like to draw some conclusions from this data. However, they are not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms.
Four possible histograms. Only one correctly represents the data.
Which histogram best describes the data?
b Select the option that describes the distribution of the data.
c Which measures of center and variation best describe the data?

Example

Finding the Distribution of Runs Scored in Cricket

Tadeo and Ramsha are having fun learning about frequency distributions. Last weekend, two of the most exciting cricket games this season took place. Tadeo and Ramsha recorded the runs scored by the players in each match. The data for Games 1 and 2 are shown in the table.

Game 1 | Game 2
a Observe the following histograms.
Four possible histograms. Only one correctly represents the data.
Which histogram represents the data of Game 1?
b Which of the following statements are true about the data of Game 1?
c Consider the following histograms.
Four possible histograms. Only one correctly represents the data.
Which histogram represents the data of Game 2?
d Which of the following statements are true about the data of Game 2?

Hint

a Begin by making a frequency table of the data.
b Observe where the tail of the distribution extends.
c Make a histogram of the data using a frequency table.
d Which measures of center and variation should be used when the distribution of the data is skewed?

Solution

a Notice that the given histograms consist of seven intervals. Considering this information, create a frequency table with the same seven intervals to display the data for Game 1.
Cricket Runs Scored in Game 1
Number of Runs Scored | Frequency
The data can be displayed in a histogram by using this frequency table. The horizontal axis will be the Number of Runs Scored and the vertical axis the Frequency. Then the bars will be drawn to represent the frequency of each interval.
Histogram of the data of Game 1
Note that this corresponds to option A.
b The tail of the histogram extends to the right and most of the data is on the left. Therefore, the distribution of the data is skewed right. In a skewed distribution, the median and five-number summary best describe the center and variation of the data, respectively. As such, two statements apply to the data set of Game 1.
  • The data is skewed right.
  • The median and the five-number summary best describe the data.
c As in Part A, a frequency table will be created to identify which histogram describes the data of Game 2, this time using eight intervals.
Cricket Runs Scored in Game 2
Number of Runs Scored | Frequency
Using this table, the histogram of the data can now be created.
Histogram of the data of Game 2
Note that this corresponds to option D.
d In this case, the tail of the histogram extends to the left and most of the data is on the right. Therefore, the data is skewed left. Additionally, the median and five-number summary best describe the center and variation of the data, respectively.
  • The data is skewed left.
  • The median and the five-number summary best describe the data.
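
The frequency-table step used in this solution can be sketched in Python. This is only an illustration under stated assumptions: the run totals below are hypothetical, not the lesson's data, and the function names `frequency_table` and `tail_direction` are made up for this sketch. The skew check is a rough heuristic that compares how many bars lie on each side of the tallest bar.

```python
def frequency_table(values, start, width, count):
    """Count how many values fall in each of `count` equal-width intervals.

    Intervals are inclusive integer ranges: start..start+width-1, and so on.
    """
    table = []
    for i in range(count):
        low = start + i * width
        high = low + width - 1
        freq = sum(1 for v in values if low <= v <= high)
        table.append((low, high, freq))
    return table


def tail_direction(table):
    """Rough heuristic: the tail is on the side with more bars past the peak."""
    freqs = [freq for _, _, freq in table]
    peak = freqs.index(max(freqs))
    left_bars = peak                    # bars to the left of the tallest bar
    right_bars = len(freqs) - 1 - peak  # bars to the right of the tallest bar
    if right_bars > left_bars:
        return "skewed right"
    if left_bars > right_bars:
        return "skewed left"
    return "roughly symmetric"


# Hypothetical run totals, NOT the data from the lesson.
runs = [3, 5, 8, 12, 14, 15, 21, 24, 33, 47, 68]
table = frequency_table(runs, start=0, width=10, count=7)

# Print a text histogram: one '#' per data value in the interval.
for low, high, freq in table:
    print(f"{low}-{high}: {'#' * freq}")
print(tail_direction(table))
```

With these made-up values, most of the data sits in the first few intervals and a single large value stretches the tail to the right, which is the pattern described for a right-skewed histogram.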

Discussion

Uniform and Bimodal Distributions

There are special distributions that are less common than skewed and symmetric distributions. These distributions may appear in situations such as an experiment where each event has the same probability or a sample taken from two separate populations. These are the uniform and bimodal distributions.
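
A small Python sketch can make the two shapes concrete. The data sets below are hypothetical and chosen only to produce a flat histogram and a two-peaked histogram, and the helper names `bar_heights` and `count_peaks` are assumptions of this sketch rather than standard functions.

```python
from collections import Counter


def bar_heights(values, start, width, count):
    """Return the frequency of each of `count` equal-width intervals."""
    counts = Counter((v - start) // width for v in values)
    return [counts.get(i, 0) for i in range(count)]


def count_peaks(freqs):
    """Count bars that are strictly taller than both neighbors.

    Bars at the edges are compared against an imaginary bar of height 0.
    """
    padded = [0] + list(freqs) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


# Hypothetical: every face of a die comes up equally often.
die_rolls = [1, 2, 3, 4, 5, 6] * 5
die_bars = bar_heights(die_rolls, start=1, width=1, count=6)

# Hypothetical: heights in cm sampled from two separate groups.
heights = [120, 121, 121, 122, 122, 122, 123, 123, 124,
           170, 171, 171, 172, 172, 172, 173, 173, 174]
height_bars = bar_heights(heights, start=110, width=10, count=7)

print(die_bars, "uniform" if len(set(die_bars)) == 1 else "not uniform")
print(height_bars, "peaks:", count_peaks(height_bars))
```

In the first data set every bar has the same height, which is the uniform shape. In the second, the two groups produce two separate peaks, which is the bimodal shape.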

Discussion

Box-and-Whisker Plots as Distributions

A box plot is another data display that allows one to see the shape of a frequency distribution. The length of the whiskers and the position of the median tell whether the distribution is skewed or symmetric.

  • Skewed Left: The left whisker is longer than the right and the median is closer to the right whisker.
  • Symmetric Distribution: The left whisker is about the same length as the right. The plot to the left of the median is an approximate mirror image of the plot on the right.
  • Skewed Right: The right whisker is longer than the left and the median is closer to the left whisker.
The following applet shows each of these three frequency distributions using box plots.
normal and skewed distribution using box plots
Keep in mind that to make a box plot, the five-number summary of the data must first be found. This means that box plots can only represent quantitative data.
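
The whisker and median rules listed above can be written as a short Python check. This is a sketch, not a formal test of skewness: the function name `box_plot_shape` and the `tolerance` value (how different the whiskers must be, as a fraction of the full range, before the plot is called skewed) are assumptions made for illustration, and the five-number summaries in the usage lines are hypothetical.

```python
def box_plot_shape(minimum, q1, median, q3, maximum, tolerance=0.1):
    """Classify a box plot from its five-number summary.

    `tolerance` is an assumed cutoff, expressed as a fraction of the range.
    """
    span = maximum - minimum
    if span == 0:
        return "symmetric"  # all values are identical

    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    whisker_gap = (right_whisker - left_whisker) / span

    # Positive when the median sits closer to the left end of the plot.
    median_gap = ((maximum - median) - (median - minimum)) / span

    if abs(whisker_gap) <= tolerance:
        return "symmetric"
    if whisker_gap > 0 and median_gap > 0:
        return "skewed right"
    if whisker_gap < 0 and median_gap < 0:
        return "skewed left"
    return "unclear"  # whiskers and median disagree


# Hypothetical five-number summaries.
print(box_plot_shape(10, 40, 50, 55, 60))  # long left whisker
print(box_plot_shape(10, 20, 30, 40, 50))  # equal whiskers
print(box_plot_shape(10, 15, 20, 35, 60))  # long right whisker
```

The function returns "unclear" when the whisker lengths and the median position point in different directions, since the rules above assume the two agree.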

Example

Analyzing the Ages of Customers at an Italian Restaurant

During their fantastic journey exploring the restaurants in their neighborhood, Tadeo and Ramsha found a fabulous Italian restaurant. While eating their food, they observed the people who entered the restaurant.


The two are curious about the average age of people eating at this restaurant. Therefore, they decide to collect data on the ages of people who enter the restaurant during a typical day.

Ages of People Who Enter the Italian Restaurant on a Typical Day

They now want to draw some conclusions from this data set by displaying it in a box plot.

a Find the five-number summary and match each description with its corresponding value.
b Consider the following box plots.
Four box plots that possibly represent the given data set.
Which of the box plots represents the given data set?
c Which measures of center and variation best represent the data?
d Which of the following statements is most likely true about the people who enter the Italian restaurant?

Hint

a Begin by ordering the data values from least to greatest.
b Use the five-number summary to make the box plot.
c Decide whether the data is skewed or symmetric.
d Each whisker represents 25% of the data. The box represents 50% of the data.

Solution

a To find the five-number summary of the data set, begin by ordering the data values from least to greatest.
Notice that the minimum value is the first number in the ordered list and the maximum value is the last. Additionally, the number of data points is even. Therefore, the median of the data set is the mean of the two middle numbers.
Finally, the first quartile is the median of the lower half of the data and the third quartile is the median of the upper half. The following table summarizes this information. Each description is matched with its corresponding value.
Five Number Summary
Minimum Value
First Quartile
Median
Third Quartile
Maximum Value
b The five-number summary found in Part A can be used to draw the box plot of the data set. Start by drawing a number line that includes the minimum and maximum values of the data. Next, graph points above the number line for the five-number summary.
Points above a number line representing the five-number summary of the data set.

Now, draw a box from the first quartile to the third quartile. Then, draw a line through the median and the whiskers from the box to the minimum and maximum values.

The box plot of the data about the ages of attendants of an Italian Restaurant.

Notice that this plot corresponds to option A.

c In the box plot drawn in the previous part, notice that the left whisker is longer than the right and that the median is closer to the right whisker than it is to the left.
Comparison of the distance of the median from the whiskers.
This means that the data is skewed left. In a skewed distribution, the median and the five-number summary best describe the center and variation of the data, respectively.
d In a box plot, each whisker represents 25% of the data and the box represents the middle 50% of the data. With this in mind, the following facts about the distribution of the data set can be determined.
  • of the people who enter the Italian restaurant in a regular day are between and years old.
  • of the people who enter the Italian restaurant in a regular day are between and years old.
  • of the people who enter the Italian restaurant in a regular day are between and years old.

Therefore, the option that states that of the people who enter the Italian restaurant in a regular day are between and years old is the right one.
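
The procedure in Part A (order the data, take the median, then take the median of each half) can be sketched in Python as follows. The ages below are hypothetical and are not the data collected by Tadeo and Ramsha. The function name `five_number_summary` is an assumption of this sketch, and it uses the median-of-halves method for the quartiles, which is one of several conventions in use.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves.

    Assumes at least two values. When the number of values is odd, the
    middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical ages, NOT the data from the lesson.
ages = [8, 15, 22, 31, 35, 38, 40, 42, 45, 47, 50, 52]
minimum, q1, med, q3, maximum = five_number_summary(ages)
print(minimum, q1, med, q3, maximum)

# Share of the data that falls inside the box (between Q1 and Q3).
inside_box = sum(1 for age in ages if q1 <= age <= q3)
print(f"{inside_box / len(ages):.0%} of the values lie in the box")
```

For this made-up list, half of the values fall between the first and third quartiles, which matches the idea that the box covers the middle half of the data while each whisker covers a quarter.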

Closure

Finding Insights and Drawing Conclusions From Histograms

All the pieces needed to analyze data using histograms have been covered. This method of displaying data makes it easier to find the data distribution and determine the best measures of center and variation to describe the data set. Recall the data Tadeo and Ramsha recorded about the main dishes of the restaurants in their neighborhood at the beginning of the lesson.

Average Main Dish Price (Dollars)

The two students wanted to draw some insights and conclusions based on this data. However, they were not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!

a Consider the following histograms.
Four possible histograms. Only one correctly represents the data.
Which histogram best describes the data?
b Select the option that describes the distribution of the data.
c Which measures of center and variation will best describe the data?

Hint

a Begin by making a frequency table of the data.
b Is there a tail to the distribution? Which way does it extend?
c Which measures of center and variation should be used when the distribution of the data is skewed?

Solution

a Note that the given histograms consist of six intervals. With this in mind, make a frequency table using the same six intervals.
Average Main Dish Price (Dollars)
Price Range | Frequency
Next, use this frequency table to display the data in a histogram. The horizontal axis will be the Price Range and the vertical axis the Frequency. Then, draw the bars to represent the frequency of each interval.
A histogram showing the distribution of the average main dish price.
Notice that this corresponds to option C.
b It can be seen in the histogram that the tail of the distribution extends to the right and that most of the data is on the left.
Highlighting the tail of the distribution.
Therefore, the distribution is skewed right.
c Consider the appropriate measures of center and variation for a skewed and a symmetric distribution.
Distribution | Measure of Center | Measure of Variation
Symmetric | Mean | Standard deviation
Skewed | Median | Five-number summary

Because in this situation the distribution of the data is skewed, the median and the five-number summary best describe the center and variation of the data, respectively.
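
One way to see why the median is preferred for skewed data is to compare it with the mean on a small right-skewed list. The prices below are hypothetical and are not the values from the lesson's table. The sketch relies only on `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical main dish prices in dollars, NOT the data from the lesson.
# Most prices are low, and a few expensive restaurants form a right tail.
prices = [8, 9, 10, 10, 11, 12, 14, 18, 25, 40]

center_mean = mean(prices)
center_median = median(prices)

print(f"mean:   {center_mean:.2f}")
print(f"median: {center_median:.2f}")

# Count how many prices are below each measure of center.
below_mean = sum(1 for p in prices if p < center_mean)
below_median = sum(1 for p in prices if p < center_median)
print(f"{below_mean} of {len(prices)} prices are below the mean")
print(f"{below_median} of {len(prices)} prices are below the median")
```

In this made-up list, the few large prices pull the mean above most of the data, while the median still splits the list in half. This is the reasoning behind pairing skewed distributions with the median and the five-number summary.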