Statistical displays are useful for identifying the key features of a data set. When given multiple data sets, a method for comparing them is required. This lesson will discuss how to compare multiple data sets when given appropriate statistical displays.


Challenge

Is Ali Misspending His Time?

It is a Saturday morning, and Ali is lying on the couch watching a movie. Ali's mom calls from afar, "Hey! Get up and do something productive." Ali responds, "It is the weekend, and I should be free to dedicate time to my hobby: watching movies at home!"

Ali and his mother arguing about free time

Ali is certain that everyone in his class spends a lot of time on their hobbies during weekends. He wants to prove this to his mom and decides to conduct a survey of his classmates. He sends a text message to all of his classmates and uses his gathered data to draw the following box plot.

Free Time Spent on Weekends Box Plot

After seeing the box plot, Ali's mother agrees that it is common among his classmates to spend a lot of time on hobbies during weekends. She has a request: "Actually, Ali, please run another survey asking about time spent on hobbies during weekdays." Ali takes on the challenge and runs another survey. He then makes the following box plot.

Free Time Spent on Weekdays Box Plot
Is there an easy way to compare the two box plots to draw out a conclusion about the surveys?
Discussion

Comparing Two Data Sets

Inferences about two populations can be drawn by comparing their respective statistical displays. Double box plots and double dot plots are two good ways to do this. A double box plot consists of two box plots aligned on the same number line.

Double Box Plot

Likewise, a double dot plot is made from two dot plots aligned on the same number line. A second, parallel number line is sometimes drawn so that the two data sets are easier to tell apart.

Two dot plots are shown above a number line ranging from 0 to 10. The upper plot represents data set A, while the lower plot represents data set B.

Depending on whether the data sets are symmetric or not, different measures of center and variation should be used. The following table summarizes the most appropriate measures to compare two data sets.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
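The decision rule in this table is simple enough to write as a tiny function. The sketch below is only an illustration of the table; the name `choose_measures` is hypothetical and not part of any library.

```python
def choose_measures(a_is_symmetric: bool, b_is_symmetric: bool) -> tuple[str, str]:
    """Return (measure of center, measure of variation) for comparing two data sets.

    The mean and mean absolute deviation are used only when BOTH data sets
    are symmetric; in every other case the median and IQR are used.
    """
    if a_is_symmetric and b_is_symmetric:
        return ("mean", "mean absolute deviation")
    return ("median", "interquartile range")


# Data Set A is symmetric, Data Set B is not.
print(choose_measures(True, False))  # ('median', 'interquartile range')
```

Notice that the two "not both symmetric" columns of the table collapse into a single branch, since they recommend the same measures.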
Refer again to the two example plots. To judge whether a data set is symmetric, compare the two halves of its plot on either side of the center: if one half is a mirror image of the other, the set is symmetric.
Double Dot Plot
Data Set A is symmetric, while Data Set B is not. This means that the appropriate measures to compare them are the median and the interquartile range.
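For readers who want to check symmetry with a computer, here is a minimal sketch, assuming perfect symmetry is required. After sorting, the smallest value is paired with the largest, the second smallest with the second largest, and so on. Every pair must share the same midpoint. Real survey data is rarely perfectly symmetric, so in practice the judgment is usually made by eye. The function name `is_symmetric` and the sample values are hypothetical.

```python
def is_symmetric(data: list[float], tol: float = 1e-9) -> bool:
    """Check whether a data set is perfectly symmetric about its center.

    Sorted values are paired from the outside in (smallest with largest, and so on);
    the set is symmetric when every pair has the same midpoint.
    """
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    center = (values[0] + values[-1]) / 2
    for i in range(n // 2):
        midpoint = (values[i] + values[n - 1 - i]) / 2
        if abs(midpoint - center) > tol:
            return False
    return True


# Hypothetical dot plot data.
print(is_symmetric([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # True: mirrors around 3
print(is_symmetric([1, 1, 1, 2, 5]))              # False
```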
Example

Waking Hours

Like most students, Ali wakes up very early on weekdays so he can get ready for school. On the weekends, though, he loves sleeping in. After all, there are no classes on the weekends!

Ali Sleeping in until 8AM on a weekend

Ali wonders whether his classmates have the same habit, and he predicts that they do. To test this prediction, Ali runs another survey asking his classmates what time they wake up on weekdays. He then makes the following box plot.

Weekdays: Wake Up Time Box Plot

Following that question, he asks his classmates what time they wake up on weekends. He then creates another box plot.

Weekends Waking Hours
a Which measure of center is more appropriate to compare both data sets?
b Which measure of variation is more appropriate to compare both data sets?
c Is there enough information to find the mean of both data sets given just the box plots?

Hint

a Make a double box plot of the given data. Are both plots symmetric?
b Make a double box plot of the given data. Are both plots symmetric?
c Is the center of the box plot the median or the mean of the data set?

Solution

a Begin by making a double box plot out of the given box plots. This can be done by placing one on top of the other aligned on the same number line.
Weekends and Weekdays Waking Hours Double Box Plot

Notice that for each box plot, the median is in the middle of the box and both sides are mirror images of each other. This means that both box plots are symmetric. Therefore, the mean is the most appropriate measure to use when comparing the data sets, as shown in the table.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
b It was found in Part A that both data sets are symmetric, so the appropriate measure of variation is the mean absolute deviation.
c A box plot displays the minimum, the maximum, both quartiles, and the median of a data set. However, this is not enough information to find the mean: different data sets can share the same five values and still have different means. This means the answer is No.
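To see why the mean cannot be recovered, the sketch below builds two hypothetical data sets (not Ali's survey data) whose box plots are identical but whose means differ. It assumes quartiles are found as the medians of the lower and upper halves, leaving out the middle value when the count is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), the five values a box plot shows."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]          # values below the median
    upper = values[(n + 1) // 2 :]    # values above the median
    return (values[0], median(lower), median(values), median(upper), values[-1])


# Two hypothetical data sets that produce the same box plot...
set_a = [1, 2, 3, 4, 5, 6, 7]
set_b = [1, 2, 3, 4, 6, 6, 7]
print(five_number_summary(set_a))  # (1, 2, 4, 6, 7)
print(five_number_summary(set_b))  # (1, 2, 4, 6, 7)

# ...but have different means.
print(mean(set_a), mean(set_b))
```

Changing a single value inside the upper quarter (5 becomes 6) moves the mean without moving any of the five plotted values.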
Example

Dining Hours

Ali has become even more curious about his classmates' schedules. He wonders whether there is a difference between the times his classmates eat dinner on weekdays and on weekends. He runs a survey asking his classmates at what hour they eat dinner on weekdays and obtains the following dot plot.

Weekday Dining Times Dot Plot

What about weekends? He runs another survey and makes the following dot plot.

Weekend Dining Times Dot Plot
a Which measure of center is more appropriate to compare both data sets?
b Which measure of spread is more appropriate to compare both data sets?
c Is there enough information to find the interquartile range of both data sets given just the dot plots?

Hint

a Make a double dot plot of the given data. Are both plots symmetric?
b Make a double dot plot of the given data. Are both plots symmetric?
c A dot plot represents every data point of a survey.

Solution

a Begin by making a double dot plot out of the given dot plots. This can be done by stacking one above the other so that their number lines are aligned.

Note that the weekend dot plot is symmetric, but the weekday dot plot is not. Since only one of the data sets is symmetric, the appropriate measure of center for comparing the sets is the median.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
b It was found in Part A that only one data set is symmetric, so the appropriate measure of variation is the interquartile range.
c A dot plot displays every data point of a survey. Since neither data set is very large, the quartiles, and therefore the interquartile ranges, can be found by hand. This means the answer is Yes.
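The hand calculation can be sketched in Python, assuming the dot plot is recorded as a mapping from each value to its number of dots. The dinner times below are hypothetical, since the actual dot plots are not reproduced here, and the quartiles use the median-of-halves convention.

```python
from statistics import median


def expand_dot_plot(counts):
    """Turn a dot plot, given as {value: number of dots}, into a sorted list of values."""
    values = []
    for value in sorted(counts):
        values.extend([value] * counts[value])
    return values


def interquartile_range(data):
    """IQR = Q3 - Q1, using the medians of the lower and upper halves
    (the middle value is left out of both halves when the count is odd)."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[: n // 2])
    q3 = median(values[(n + 1) // 2 :])
    return q3 - q1


# Hypothetical dinner hours (6 means 6 p.m.): one dot at 6, three at 7, four at 8, one at 9.
dinner_hours = expand_dot_plot({6: 1, 7: 3, 8: 4, 9: 1})
print(dinner_hours)                       # [6, 7, 7, 7, 8, 8, 8, 8, 9]
print(interquartile_range(dinner_hours))  # 1.0
```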

Extra

Using Statistical Software

It was mentioned that the data sets were small enough to find the quartiles by hand, but what would happen if they were too large?

While it is technically possible to find the quartiles of a data set with thousands of points by hand, it is not practical. Instead, large surveys are typically analyzed with statistical software, which can calculate the key measures of a data set almost instantly.
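As one example of such software, Python's standard `statistics` module provides `quantiles`, which returns the three quartile cut points when called with `n=4`. The survey below is simulated, not real data. Note that `quantiles` uses the `'exclusive'` method by default (an `'inclusive'` method is also available), so its quartiles can differ slightly from those found by hand with the median-of-halves method.

```python
import random
from statistics import median, quantiles

# Hypothetical survey: 10,000 simulated answers (hours spent on hobbies),
# generated with a fixed seed so the run is repeatable.
rng = random.Random(42)
responses = [round(rng.uniform(0, 10), 1) for _ in range(10_000)]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(responses, n=4)
print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}")
print(f"IQR = {q3 - q1:.2f}")

# The middle cut point agrees with the ordinary median.
print(abs(q2 - median(responses)) < 1e-9)  # True
```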


Discussion

Pie Chart

A pie chart is a circular chart used to represent the relative frequencies of a data set. It is also called a circle chart. The chart is divided into slices, each representing one group of the data set. The following characteristics are typical of pie charts.

  • Each slice is drawn using a different color to differentiate the groups.
  • The central angle of each group, as well as its area, is proportional to its relative frequency.

A pie chart allows the visualization of each individual data group when compared to the whole. Alone, however, the chart does not give information about the frequency of each group.

Pie Chart with Side Labels

Pie charts might also include the relative frequency of each group written as a percentage. It is also possible to include labels to represent each group with matching colors.

Pie Chart with Side Labels
Discussion

Drawing a Pie Chart

A pie chart is an effective way of visualizing the proportions of different groups of data when compared to a whole. Pie charts are divided into slices, each of which represents a group and its relative frequency. The central angle of each slice can be found by multiplying each relative frequency by 360°.
As an example, consider a survey of a group of people's favorite ice cream flavors. Suppose that 20 people took part in the survey. Eight people responded that they prefer chocolate, six prefer vanilla, and six prefer other flavors. The following steps can be used to create a pie chart representing this survey.
1
Make a Frequency Table

Begin by identifying each group of the survey. In this case, there are three different flavor groups: chocolate, vanilla, and other. These three groups have eight, six, and six people, respectively.

| Flavor | Frequency |
|---|---|
| Chocolate | 8 |
| Vanilla | 6 |
| Other | 6 |
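If the survey answers were collected as a raw list, building the frequency table amounts to counting each answer. This sketch uses `collections.Counter` from Python's standard library; the `answers` list is a hypothetical reconstruction of the responses described above (eight chocolate, six vanilla, six other).

```python
from collections import Counter

# Hypothetical raw answers: 8 chocolate, 6 vanilla, 6 other.
answers = ["chocolate"] * 8 + ["vanilla"] * 6 + ["other"] * 6

# Counter tallies how many times each answer appears, i.e. the frequency table.
frequency = Counter(answers)
for flavor, count in frequency.items():
    print(f"{flavor:<10} {count}")
```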
2
Find the Relative Frequency of Each Group

A total of 20 people took part in the survey, so each frequency is divided by 20 to find the relative frequency of each group.

| Flavor | Frequency | Relative Frequency |
|---|---|---|
| Chocolate | 8 | 8/20 = 0.4 |
| Vanilla | 6 | 6/20 = 0.3 |
| Other | 6 | 6/20 = 0.3 |
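The same step in code: divide each frequency by the total. This sketch for the ice cream example uses `fractions.Fraction` so the relative frequencies stay exact and add up to exactly 1.

```python
from fractions import Fraction

frequency = {"Chocolate": 8, "Vanilla": 6, "Other": 6}
total = sum(frequency.values())  # 20 people

# Relative frequency = group frequency / total number of people.
relative = {flavor: Fraction(count, total) for flavor, count in frequency.items()}

for flavor, share in relative.items():
    # e.g. "Chocolate  2/5 = 0.4 (40%)"
    print(f"{flavor:<10} {share} = {float(share)} ({float(share):.0%})")
```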
3
Find the Central Angle of Each Slice
To find the central angle of each slice, multiply the relative frequency of its group by 360°. For chocolate, this gives 0.4 × 360° = 144°.
This needs to be done for every group of the survey.
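| Flavor | Frequency | Relative Frequency | Central Angle |
|---|---|---|---|
| Chocolate | 8 | 0.4 | 0.4 × 360° = 144° |
| Vanilla | 6 | 0.3 | 0.3 × 360° = 108° |
| Other | 6 | 0.3 | 0.3 × 360° = 108° |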
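A short sketch of this step, assuming the frequencies from the ice cream example: each central angle is the relative frequency multiplied by 360°, and the final check confirms that the slices fill the whole circle.

```python
from fractions import Fraction

frequency = {"Chocolate": 8, "Vanilla": 6, "Other": 6}
total = sum(frequency.values())

# Central angle = relative frequency x 360 degrees (kept exact with Fraction).
angles = {flavor: Fraction(count, total) * 360 for flavor, count in frequency.items()}

for flavor, angle in angles.items():
    print(f"{flavor:<10} {angle} degrees")

# Sanity check: the slices must add up to a full circle.
assert sum(angles.values()) == 360
```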
4
Draw the Slices

Begin by drawing a circle to represent the whole population of the survey.

Circle Representing Whole Population

Next, draw a radius to select a starting point. This can be any radius of the circle.

Circle with Starting Radius

Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is 144°.

Protractor Placed on the Starting Radius Marking 144 degrees

Draw the radius that passes through the previous mark. The slice of the first group is now ready.

First slice of the pie chart

To draw the slice of the next group, place the protractor on the radius that ends the previous slice and mark the next central angle.

Protractor Placed on the end of the previous slice marking 108 degrees

Draw the radius that passes through this mark to obtain the second slice. Repeat this process until every slice is drawn.

The pie chart divided into three parts with central angles of 108 degrees, 144 degrees, and 108 degrees.
Make sure that the sum of the central angles is equal to 360°. Here, 144° + 108° + 108° = 360°.
Be aware that the central angles are not typically shown on the final diagram.
Slices of the pie chart
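Because each slice starts where the previous one ends, the protractor marks are running totals of the central angles. The sketch below computes where each slice starts and ends, measured from the starting radius, using `itertools.accumulate`. It assumes the slices are drawn in the order chocolate, vanilla, other.

```python
from itertools import accumulate

# Central angles (in degrees) in the order the slices are drawn.
slices = [("Chocolate", 144), ("Vanilla", 108), ("Other", 108)]

# Running totals give the end of each slice; each slice starts at the previous end.
ends = list(accumulate(angle for _, angle in slices))
starts = [0] + ends[:-1]

for (name, angle), start, end in zip(slices, starts, ends):
    print(f"{name:<10} from {start:>3} to {end:>3} degrees ({angle} degrees)")
```

The last slice ending at exactly 360° is the same check as adding up the central angles.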
5
Add the Details to the Graph

Finally, color and label each slice of the graph.

The pie chart displays three categories: chocolate, vanilla, and other. Chocolate has the largest area, while vanilla and other have equal areas.

Side labels and relative frequencies written as percentages might also be added in this step.

The pie chart displays three categories: chocolate, vanilla, and other. Chocolate makes up 40 percent of the pie chart whereas both vanilla and other make up 30 percent of the pie chart.
Example

How Do You Spend Your Free Time?

Ali is really enjoying making surveys to learn about the habits of his classmates. Previously, he asked them how much time they spend doing their hobbies. Now he wants to know what those hobbies are.

| Hobby | Frequency |
|---|---|
| Watching Movies/Series | |
| Videogames | |
| Reading | |
| Sports | |
It is clear that Ali is not the only one who loves to watch series on the couch! Help Ali get a clearer visualization of the data by making a pie chart.

Answer

Pie chart showing Ali's classmates' habits. Purple represents watching movies/series with 40%, green represents playing video games with 30%, red represents reading with 20%, and yellow represents playing sports with 10%.

Hint

Add a relative frequency column to the given table. Find the central angle of each slice by multiplying the relative frequency of each group by 360°.

Solution

A table with relative frequencies is required in order to make a pie chart. Add a relative frequency column to the given table to do so.

| Hobby | Frequency | Relative Frequency |
|---|---|---|
| Watching Movies/Series | | |
| Videogames | | |
| Reading | | |
| Sports | | |
Now add all the frequencies to find the total population of the survey.
Divide each frequency by this total to find each relative frequency.
| Hobby | Frequency | Relative Frequency |
|---|---|---|
| Watching Movies/Series | | 0.4 |
| Videogames | | 0.3 |
| Reading | | 0.2 |
| Sports | | 0.1 |

Since there are four types of hobbies, the pie chart will be divided into four slices. To find the central angle of each slice, multiply the relative frequency of each group by 360°. Insert another column called Central Angle to keep the information clear and organized.

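| Hobby | Frequency | Relative Frequency | Central Angle |
|---|---|---|---|
| Watching Movies/Series | | 0.4 | 0.4 × 360° = 144° |
| Videogames | | 0.3 | 0.3 × 360° = 108° |
| Reading | | 0.2 | 0.2 × 360° = 72° |
| Sports | | 0.1 | 0.1 × 360° = 36° |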

Now that every central angle has been found, proceed to draw the graph. Make a circle using a compass and select a starting radius.

Circle and Starting Radius

Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is 144°.

Protractor Placed on the Starting Radius Marking 144 degrees

Draw a radius that passes through this mark. This slice corresponds to the people who like watching movies or series.

Repeat this process for each group. Make sure that the zero of the protractor is placed on the previously drawn radius.
Finally, add colors and label each slice of the chart.
Pie chart showing Ali's classmates' habits. Purple represents watching movies/series with 40%, green represents playing video games with 30%, red represents reading with 20%, and yellow represents playing sports with 10%.


Discussion

Selecting an Appropriate Display

Pie charts are useful for comparing each group of the data with the whole. The following table summarizes the main types of data displays and their best uses.

| Data Display | Best Uses |
|---|---|
| Bar Graph | Show the frequency of categorical data |
| Box Plot | Show the measures of variation of a data set |
| Pie Chart | Compare different groups of the data to the whole |
| Histogram | Divide numerical data into intervals and show their frequency |
| Line Graph | Show how data changes over a period of time |
| Dot Plot | Display the frequency of numerical data |
Always keep in mind that the same data can often be displayed in different ways. The best display depends on the type of data and on the information to be communicated.
Example

What Kind of Movies do You Like?

Ali is so stoked to see that so many of his classmates share the same hobby of watching movies. He makes a follow-up survey asking his classmates about their favorite movie genre and records the results in the following frequency table.

| Favorite Genre | Frequency |
|---|---|
| Action | |
| Drama | |
| Thriller | |
| Comedy | |
| Horror | |
| Fantasy | |
Ali is so grateful to his classmates for taking the time to answer his surveys that he wants to display the data in a fancy way. There are so many displays to choose from! Which of the following displays cannot be used to display this data?

Hint

Is Ali working with categorical or numerical data?

Solution

Begin by identifying the type of data Ali is working with. Since movie genres are categories used to describe movies, Ali's data is categorical. Now examine the table, which shows data displays and the type of data each one is used with.

| Data Display | Type of Data |
|---|---|
| Box Plot | Numerical |
| Pie Chart | Categorical |
| Bar Chart | Categorical |
| Dot Plot | Numerical |
| Histogram | Numerical |

Among the data display options, box plots, dot plots, and histograms are designed for representing numerical data. This means that these three types of data displays cannot be used to visualize Ali's survey data.
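The reasoning above is a lookup: find each display's data type and discard those that do not match. A minimal sketch, with the mapping copied from the table above and a hypothetical helper name `unsuitable_displays`:

```python
# Data type each display is designed for, taken from the table above.
DISPLAY_DATA_TYPE = {
    "Box Plot": "numerical",
    "Pie Chart": "categorical",
    "Bar Chart": "categorical",
    "Dot Plot": "numerical",
    "Histogram": "numerical",
}


def unsuitable_displays(data_type: str) -> list[str]:
    """Return the displays that are NOT designed for the given type of data."""
    return [display for display, kind in DISPLAY_DATA_TYPE.items() if kind != data_type]


# Movie genres are categories, so the survey data is categorical.
print(unsuitable_displays("categorical"))  # ['Box Plot', 'Dot Plot', 'Histogram']
```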


Closure

Weekends Vs. Weekdays Hobby Time

Ali ran a survey asking about the amount of time his classmates spend on their hobbies. He created the following box plot concerning weekends.

Free Time Spent on Weekends Box Plot

He made a different box plot for time spent on hobbies during weekdays.

Recall that two box plots can be compared by drawing one above the other, aligned on the same number line.

Note that the box plot corresponding to weekends is not symmetric, while the box plot for weekdays is symmetric. In other words, only one of the data sets is symmetric. Considering the following table, the median and the interquartile range are the appropriate measures for comparing the data.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |

Comparing Medians

Remember that in box plots, the vertical line within the rectangular box represents the location of the median. The median for the weekdays is then and the median for weekend is

Ali and his classmates spend about hours on their hobbies during the weekend. This is twice as much as the hours they spend during weekdays. They can spend more time on their hobbies because they do not have to go to school on weekends.

Comparing Interquartile Ranges

The interquartile range for the weekdays is hour, and for the weekends it is hours.

There is less variation in the data for weekdays. This suggests that his classmates spend more consistent amounts of time on hobbies during weekdays, possibly because they have more scheduled activities. The larger spread on weekends is a reminder that not everyone spends all their weekend time on hobbies. They might have other things to do!
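The whole comparison procedure from this lesson (check symmetry, pick the measures, then compute them) can be sketched as one function. The numbers below are invented for illustration and are not read from Ali's box plots. The quartiles use the median-of-halves convention, and the function names are hypothetical.

```python
from statistics import mean, median


def is_symmetric(data, tol=1e-9):
    """True when the sorted values mirror each other around the center."""
    values = sorted(data)
    n = len(values)
    center = (values[0] + values[-1]) / 2
    return all(
        abs((values[i] + values[n - 1 - i]) / 2 - center) <= tol for i in range(n // 2)
    )


def iqr(data):
    """Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    return median(values[(n + 1) // 2 :]) - median(values[: n // 2])


def mean_absolute_deviation(data):
    """Average distance of the values from their mean."""
    m = mean(data)
    return mean(abs(x - m) for x in data)


def summarize(a, b):
    """Pick the measures from the lesson's table, then compute them for both sets."""
    if is_symmetric(a) and is_symmetric(b):
        center, spread = mean, mean_absolute_deviation
    else:
        center, spread = median, iqr
    return {
        "center": (center.__name__, center(a), center(b)),
        "variation": (spread.__name__, spread(a), spread(b)),
    }


# Hypothetical hobby hours (invented for illustration, not Ali's data).
weekdays = [1, 1.5, 2, 2, 2, 2.5, 3]  # symmetric around 2
weekends = [1, 2, 3, 5, 5, 6, 8]      # not symmetric

print(summarize(weekdays, weekends))
```

Because only one of these hypothetical sets is symmetric, the function falls back to the median and interquartile range, exactly as in the table.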